Hi Andreas,

great that you got it working. Sources are by construction executed on the submission machine, so make sure that all operations which depend on locally accessible resources (file reading, database queries, ...) are performed eagerly in the source (no lazy operations).
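For example, a minimal sketch of the eager-read rule (file name hypothetical):

import scala.io.Source

// Reading inside process on the submission machine: .toVector forces the
// read now; a bare Iterator would be lazy and might only be evaluated later
// on a remote node where the file does not exist.
val src = Source.fromFile("local-index.csv")
val lines = try src.getLines().toVector finally src.close()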
The executionContext is used to control side effects. It provides an output stream, in case you need to display something, and a file path relativizer, which is used in case some OpenMOLE execution manager wants to relocate all file paths. By default the relativizer does nothing and the output stream points to System.out. In the GUI the output stream is redirected to the display area associated with the workflow execution. We have no use for the relativizer anymore, but I kept it in case it turns out to be useful again. Down the road we could also imagine mechanisms in the execution context for rewriting database connection queries, and other mechanisms to control the context of the side effects...
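For instance, a source could report progress through the context instead of printing to System.out directly; a sketch, assuming the stream is exposed as a member named out (the name is an assumption, check the ExecutionContext API):

// "out" is an assumed name for the execution context's output stream member.
executionContext.out.println("reading dataset index on the submission machine")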
To enable the file relativizer for your source:

val expandedPath = executionContext.relativise(path.from(context))

Cheers,
Romain

On 09/06/2015 08:25, Andreas Schuh wrote:
In the meantime I've implemented some Sources for this. The one question whose answer remains unclear to me is whether I should use the ExecutionContext, and what for. My DataSetSource as outlined in the previous email is currently implemented as follows, which works fine in the LocalEnvironment; I have yet to test it in an actual distributed environment:

object DataSetSource {
  def apply(setId: Prototype[String], dataSet: Prototype[DataSet]) = new SourceBuilder {
    addInput(setId)
    addOutput(dataSet)
    def toSource = new DataSetSource(setId, dataSet) with Built
  }
}

abstract class DataSetSource(setId: Prototype[String], dataSet: Prototype[DataSet]) extends Source {
  override def process(context: Context, executionContext: ExecutionContext)(implicit rng: RandomProvider) = {
    val name = context.option(setId).get // fails if setId is missing from the context
    Variable(dataSet, DataSet(name))
  }
}

On 8 Jun 2015, at 20:06, Andreas Schuh <[email protected]> wrote:

Hi,

I had a look at how data sources are implemented in OpenMOLE, but would like some more background information before I attempt to write my own for my REPEAT workflow plugin. Basically, what I currently have is a workflow with some IDs identifying, for instance, the image dataset, the registration method, and the set of parameters for this registration method, which corresponds to a row in a CSV table. Most of the information needed by the ScalaTasks is made available by the user via a HOCON configuration file. My plugin contains classes and objects for easy access to the parsed configuration values. For example, for a specific image dataset to be used, I have a class such as

object Dataset {
  val names: Set[String]
}

class Dataset(val id: String) {
  val dir = ...
  def imgCsv = ...
  def imgPath(imgId: String) = ...
  // ...
}

A typical workflow then starts with an ExplorationTask which samples all the dataset IDs:

val setId = Val[String]
val exploreDataSets = ExplorationTask(setId in Dataset.names)

After the exploration transition, I want to inject a Val[Dataset] into the workflow that is then used as input to a task, which therefore has access to all the information about the dataset via the respective Dataset class instance:

val dataSet = Val[Dataset]
val getDataSet = Capsule(
  ScalaTask("val dataSet = Dataset(setId)") set (inputs += setId, outputs += dataSet),
  strainer = true
)

To avoid straining all the data through this simple ScalaTask, I tried to use the new "Strain" pattern instead, but realised that this doesn't work because the newly injected dataSet variable is then only available in one branch of the "Strain" puzzle. Maybe this is still an issue of this pattern... On the other hand, it looks like a Source would be more suitable for what I want to do?

val imgId = Val[String]
val exploreImages = ExplorationTask(CSVSampling("${dataSet.imgCsv}") set (columns += ("ID", imgId)))
val processImage = ScalaTask("... dataSet.imgPath(imgId) ...") set (inputs += (dataSet, imgId))
val ex = exploreDataSets -< getDataSet -- exploreImages -< processImage start

Note: I am using here my modified (hacked) CSVSampling which takes an ExpandedString as argument instead of a File. With a custom DataSetSource, I would instead have something like:

val getDataSetUsingSource = Capsule(EmptyTask() set (inputs += setId, outputs += dataSet)) source DataSetSource(setId, dataSet)

Any suggestions on how to best inject the "Dataset" variable into the workflow? Using a ScalaTask or a Source?
Note that instantiating this class requires information from a local HOCON configuration file whose content I currently insert as a string literal into the getDataSet ScalaTask script. The DataSetSource instance could instead have access to the com.typesafe.config.Config object of my loaded plugin with the already parsed information (see the sketch below).

Thanks,
Andreas
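A minimal sketch of that last idea with the Typesafe Config API (file name, keys, and path layout are all hypothetical): parse the HOCON file once when the plugin loads, and let Dataset read from the shared Config instead of inlining its content into a ScalaTask script.

import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

// Parse the HOCON configuration once when the plugin loads (path hypothetical).
object Settings {
  lazy val config: Config = ConfigFactory.parseFile(new File("repeat.conf"))
}

// Dataset reads from the already parsed Config (keys hypothetical).
class Dataset(val id: String) {
  private val conf = Settings.config.getConfig(s"dataset.$id")
  val dir = new File(conf.getString("dir"))
  def imgCsv = new File(dir, conf.getString("imgCsv"))
  def imgPath(imgId: String) = new File(dir, s"images/$imgId.nii.gz") // layout hypothetical
}

A DataSetSource built this way only needs the setId from the context; everything else comes from the plugin's configuration.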
_______________________________________________
OpenMOLE-users mailing list
[email protected]
http://fedex.iscpif.fr/mailman/listinfo/openmole-users
