Hi Andreas,

great that you got it working. Sources are by construction executed on the submission machine, so make sure that all operations which depend on locally accessible resources (file reading, database queries, ...) are performed eagerly in the source (no lazy operations).
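For example, a minimal sketch of the eager-read rule (file name hypothetical):

import scala.io.Source

// Reading inside process on the submission machine: .toVector forces the
// read now; a bare Iterator would be lazy and might only be evaluated later
// on a remote node where the file does not exist.
val src = Source.fromFile("local-index.csv")
val lines = try src.getLines().toVector finally src.close()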
The executionContext is used to control side effects. It provides an output stream, in case you need to display something, and a file path relativizer, which is used in case some OpenMOLE execution manager wants to relocate all file paths. By default the relativizer does nothing and the output stream points to System.out. In the GUI the output stream is redirected to the display area associated with the workflow execution. We have no use for the relativizer anymore, but I kept it in case it turns out to be useful again. Down the road we could also imagine mechanisms in the execution context for rewriting database connection queries, and other mechanisms to control the context of the side effects...
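For instance, a source could report progress through the context instead of printing to System.out directly; a sketch, assuming the stream is exposed as a member named out (the name is an assumption, check the ExecutionContext API):

// "out" is an assumed name for the execution context's output stream member.
executionContext.out.println("reading dataset index on the submission machine")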
To enable the file relativizer for your source:

val expandedPath = executionContext.relativise(path.from(context))

Cheers,
Romain

On 09/06/2015 08:25, Andreas Schuh wrote:
In the meantime I've implemented some Sources for this. The one question whose answer remains unclear to me is whether I should use the ExecutionContext, and what for. My DataSetSource as outlined in the previous email is currently implemented as follows, which works fine in the LocalEnvironment; I have yet to test it in an actual distributed environment:

object DataSetSource {
  def apply(setId: Prototype[String], dataSet: Prototype[DataSet]) = new SourceBuilder {
    addInput(setId)
    addOutput(dataSet)
    def toSource = new DataSetSource(setId, dataSet) with Built
  }
}

abstract class DataSetSource(setId: Prototype[String], dataSet: Prototype[DataSet]) extends Source {
  override def process(context: Context, executionContext: ExecutionContext)(implicit rng: RandomProvider) = {
    val name = context.option(setId).get // fails if setId is missing from the context
    Variable(dataSet, DataSet(name))
  }
}

On 8 Jun 2015, at 20:06, Andreas Schuh <[email protected]> wrote:

Hi,

I had a look at how data sources are implemented in OpenMOLE, but would like some more background information before I attempt to write my own for my REPEAT workflow plugin. Basically, what I currently have is a workflow with some IDs identifying, for instance, the image dataset, the registration method, and the set of parameters for this registration method, which corresponds to a row in a CSV table. Most of the information needed by the ScalaTasks is made available by the user via a HOCON configuration file. My plugin contains classes and objects for easy access to the parsed configuration values. For example, for a specific image dataset to be used, I have a class such as

object Dataset {
  val names: Set[String]
}

class Dataset(val id: String) {
  val dir = ...
  def imgCsv = ...
  def imgPath(imgId: String) = ...
  // ...
}

A typical workflow then starts with an ExplorationTask which samples all the dataset IDs:

val setId = Val[String]
val exploreDataSets = ExplorationTask(setId in Dataset.names)

After the exploration transition, I want to inject a Val[Dataset] into the workflow that is then used as input to a task, which therefore has access to all the information about the dataset via the respective Dataset class instance:

val dataSet = Val[Dataset]
val getDataSet = Capsule(
  ScalaTask("val dataSet = Dataset(setId)") set (inputs += setId, outputs += dataSet),
  strainer = true
)

To avoid straining all the data through this simple ScalaTask, I tried to use the new "Strain" pattern instead, but realised that this doesn't work because the newly injected dataSet variable is then only available in one branch of the "Strain" puzzle. Maybe this is still an issue of this pattern... On the other hand, it looks like a Source would be more suitable for what I want to do?

val imgId = Val[String]
val exploreImages = ExplorationTask(CSVSampling("${dataSet.imgCsv}") set (columns += ("ID", imgId)))
val processImage = ScalaTask("... dataSet.imgPath(imgId) ...") set (inputs += (dataSet, imgId))
val ex = exploreDataSets -< getDataSet -- exploreImages -< processImage start

Note: I am using here my modified (hacked) CSVSampling which takes an ExpandedString as argument instead of a File. With a custom DataSetSource, I would instead have something like:

val getDataSetUsingSource = Capsule(EmptyTask() set (inputs += setId, outputs += dataSet)) source DataSetSource(setId, dataSet)

Any suggestions on how to best inject the "Dataset" variable into the workflow? Using a ScalaTask or a Source?
Note that instantiating this class requires information from a local HOCON configuration file whose content I currently insert as a string literal into the getDataSet ScalaTask script. The DataSetSource instance could instead have access to the com.typesafe.config.Config object of my loaded plugin with the already parsed information (see the sketch below).

Thanks,
Andreas
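A minimal sketch of that last idea with the Typesafe Config API (file name, keys, and path layout are all hypothetical): parse the HOCON file once when the plugin loads, and let Dataset read from the shared Config instead of inlining its content into a ScalaTask script.

import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

// Parse the HOCON configuration once when the plugin loads (path hypothetical).
object Settings {
  lazy val config: Config = ConfigFactory.parseFile(new File("repeat.conf"))
}

// Dataset reads from the already parsed Config (keys hypothetical).
class Dataset(val id: String) {
  private val conf = Settings.config.getConfig(s"dataset.$id")
  val dir = new File(conf.getString("dir"))
  def imgCsv = new File(dir, conf.getString("imgCsv"))
  def imgPath(imgId: String) = new File(dir, s"images/$imgId.nii.gz") // layout hypothetical
}

A DataSetSource built this way only needs the setId from the context; everything else comes from the plugin's configuration.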
_______________________________________________
OpenMOLE-users mailing list
[email protected]
http://fedex.iscpif.fr/mailman/listinfo/openmole-users
