[OpenMOLE-users] How to implement own data Source?

Andreas Schuh Mon, 08 Jun 2015 12:08:29 -0700

Hi,

I had a look at how data sources are implemented in OpenMOLE, but would like 
some more background information before I attempt to write my own for my REPEAT 
workflow plugin. What I currently have is a workflow with some IDs 
identifying, for instance, the image dataset, the registration method, and the 
set of parameters for this registration method, which corresponds to a row in a 
CSV table. Most of the information needed by the ScalaTasks is made available 
by the user via a HOCON configuration file. My plugin contains classes and 
objects for easy access to the parsed configuration values. For example, for a 
specific image dataset, I have a class such as


object Dataset {
  val names: Set[String] = …
}

class Dataset(val id: String) {
  val dir = …
  def imgCsv = …
  def imgPath(imgId: String) = …
  // ...
}
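
As a rough illustration only, the elided parts above could be filled in along the following lines. This is a sketch under stated assumptions, not the plugin's actual code: the `settings` map is a hypothetical stand-in for the parsed HOCON configuration, the dataset IDs, directories and the `.nii.gz` file layout are invented for the example, and the `apply` method is added so that `Dataset(id)` can be written without `new`.

```scala
object Dataset {
  // Hypothetical stand-in for the parsed HOCON configuration (dataset ID -> settings).
  // In the plugin, these values would come from the loaded Config object instead.
  private val settings: Map[String, Map[String, String]] = Map(
    "setA" -> Map("dir" -> "/data/setA", "imgCsv" -> "images.csv"),
    "setB" -> Map("dir" -> "/data/setB", "imgCsv" -> "images.csv")
  )

  // IDs of all configured datasets, i.e. the values an exploration would sample.
  val names: Set[String] = settings.keySet

  // Factory method (an assumption of this sketch) so that Dataset(id) works without `new`.
  def apply(id: String): Dataset = {
    require(names.contains(id), s"Unknown dataset: $id")
    new Dataset(id)
  }
}

class Dataset(val id: String) {
  // Settings of this dataset, looked up in the companion object's stand-in configuration.
  private val conf: Map[String, String] = Dataset.settings(id)

  // Root directory of the dataset.
  val dir: String = conf("dir")

  // Path of the CSV table listing the image IDs of this dataset.
  def imgCsv: String = {
    val csvName = conf("imgCsv")
    s"$dir/$csvName"
  }

  // Path of a single image; the one-file-per-ID layout is assumed for illustration.
  def imgPath(imgId: String): String = s"$dir/$imgId.nii.gz"
}
```

With such a companion object, a script can obtain an instance via `Dataset("setA")` and read `imgCsv` or `imgPath("img01")` from it without touching the configuration directly.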

A typical workflow then starts with an ExplorationTask which samples all the 
dataset IDs:

val setId = Val[String]
val exploreDataSets = ExplorationTask(setId in Dataset.names)

After the exploration transition, I want to inject a Val[Dataset] into the 
workflow that is then used as input to a task which therefore has access to all 
the information about the dataset via the respective Dataset class instance:

val dataSet = Val[Dataset]
val getDataSet = Capsule(
  ScalaTask("val dataSet = Dataset(setId)") set (inputs += setId, outputs += dataSet),
  strainer = true)

To avoid straining all the data through this simple ScalaTask, I 
tried to use the new “Strain” pattern instead, but realised that this doesn’t 
work because the newly injected dataSet variable is then only available in one 
branch of the “Strain” puzzle. Maybe this is still an issue with this pattern. On 
the other hand, it looks like a Source would be more suitable for what I want 
to do?

val imgId = Val[String]
val exploreImages = ExplorationTask(
  CSVSampling("${dataSet.imgCsv}") set (columns += ("ID", imgId)))
val processImage = ScalaTask(" … dataSet.imgPath(imgId) ") set (inputs += (dataSet, imgId))

val ex = exploreDataSets -< getDataSet -- exploreImages -< processImage start

Note: here I am using my modified (hacked) CSVSampling, which takes an 
ExpandedString as its argument instead of a File.


With a custom DatasetSource, I would instead have something like:

val getDataSetUsingSource =
  Capsule(EmptyTask() set (inputs += setId, outputs += dataSet)) source DatasetSource(setId, dataSet)


Any suggestions on how best to inject the “Dataset” variable into the workflow? 
Using a ScalaTask or a Source? Note that instantiating this class requires 
information from a local HOCON configuration file, whose content I currently 
insert as a string literal into the getDataSet ScalaTask script. The 
DatasetSource instance could instead have access to the com.typesafe.config.Config 
object of my loaded plugin, which holds the already parsed information.

Thanks,

Andreas