Hi,
I had a look at how data sources are implemented in OpenMOLE, but would like
some more background information before I attempt to write my own for my REPEAT
workflow plugin. Basically what I currently have is a workflow with some IDs
identifying for instance the image dataset, the registration method, and the
set of parameters for this registration method which correspond to a row in a
CSV table. Most of the information needed by the ScalaTasks is made available
by the user via a HOCON configuration file. My plugin contains classes and
objects for easy access to the parsed configuration values. For example, for a
specific image dataset to be used, I have a class such as:
object Dataset {
  val names: Set[String]
}
class Dataset(val id: String) {
  val dir = …
  def imgCsv = ...
  def imgPath(imgId: String) = …
  // ...
}
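To make this more concrete, here is a simplified, self-contained sketch of what such a class could look like. It is not my actual plugin code: the Settings object is a hypothetical stand-in for the parsed HOCON configuration, and the dataset names, directories and file naming scheme are made up for illustration.

```scala
// Hypothetical stand-in for the parsed HOCON configuration.
// In the real plugin these values would come from com.typesafe.config.Config.
object Settings {
  val datasetDirs: Map[String, String] = Map(
    "setA" -> "/data/setA",
    "setB" -> "/data/setB"
  )
}

object Dataset {
  // IDs of all configured datasets; this is what the first exploration samples.
  val names: Set[String] = Settings.datasetDirs.keySet

  // Lets a script write Dataset(setId) without `new`.
  def apply(id: String): Dataset = new Dataset(id)
}

class Dataset(val id: String) {
  require(Dataset.names.contains(id), s"unknown dataset: $id")

  // Root directory of the dataset, looked up in the configuration.
  val dir: String = Settings.datasetDirs(id)

  // CSV table listing the image IDs (file name is an assumption).
  def imgCsv: String = s"$dir/images.csv"

  // Path of a single image (naming scheme is an assumption).
  def imgPath(imgId: String): String = s"$dir/images/$imgId.nii.gz"
}
```

With this sketch, Dataset("setA").imgPath("img01") would resolve a path below the configured directory of "setA", and an unknown ID fails early in the constructor.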
A typical workflow then starts with an ExplorationTask which samples all the
dataset IDs:
val setId = Val[String]
val exploreDataSets = ExplorationTask(setId in Dataset.names)
After the exploration transition, I want to inject a Val[Dataset] into the
workflow that is then used as input to a task which therefore has access to all
the information about the dataset via the respective Dataset class instance:
val dataSet = Val[Dataset]
val getDataSet = Capsule(
  ScalaTask("val dataSet = Dataset(setId)") set (inputs += setId, outputs += dataSet),
  strainer = true)
To avoid straining all the data through this simple ScalaTask, I tried to use
the new “Strain” pattern instead, but realised that this doesn’t work because
the newly injected dataSet variable is then only available in one branch of the
“Strain” puzzle. Maybe this is still an issue with this pattern. On the other
hand, it looks like a Source would be more suitable for what I want to do?
val imgId = Val[String]
val exploreImages = ExplorationTask(
  CSVSampling("${dataSet.imgCsv}") set (columns += ("ID", imgId)))
val processImage = ScalaTask(" … dataSet.imgPath(imgId) ") set (inputs += (dataSet, imgId))
val ex = exploreDataSets -< getDataSet -- exploreImages -< processImage start
Note: here I am using my modified (hacked) CSVSampling, which takes an
ExpandedString as its argument instead of a File.
With a custom DatasetSource, I would instead have something like:
val getDataSetUsingSource =
  Capsule(EmptyTask() set (inputs += setId, outputs += dataSet)) source DatasetSource(setId, dataSet)
Any suggestions on how best to inject the “Dataset” variable into the workflow?
Using a ScalaTask or a Source? Note that instantiating this class requires
information from a local HOCON configuration file, whose content I currently
insert as a string literal into the getDataSet ScalaTask script. The
DatasetSource instance could instead have access to the
com.typesafe.config.Config object of my loaded plugin, with the already parsed
information.
Thanks,
Andreas