Thanks, Romain, for the explanations. I’d seen the use of the relativizer in 
the FileSource, but since I assumed a source would be executed on the 
submission machine, it was a bit confusing… Besides being much cleaner 
this way, my approach so far of using a ScalaTask running in the local 
environment probably had the same effect.

I also think it might be a bit more “efficient” to use pre-compiled 
sources, because the (many) ScalaTask scripts otherwise need to be 
(re-)compiled every time before the workflow executes.

Every time I extend/refactor my plugin, I learn something new and how to 
properly use OpenMOLE’s features… :D

Cheers,
Andreas

> On 9 Jun 2015, at 03:05, Romain Reuillon <[email protected]> wrote:
> 
> Hi Andreas,
> 
> great that you got it working. Sources are by construction executed on the 
> submission machine. So make sure that all the operations which depend on 
> locally accessible resources (file reading, DB queries...) are executed 
> eagerly in the source (no lazy operations).
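The pitfall can be illustrated without OpenMOLE at all: a plain Scala Iterator defers its work until it is consumed, so an unforced read would only happen on the remote node. A dependency-free sketch (readRow below is a hypothetical stand-in for any local-resource access):

```scala
object EagerVsLazy extends App {
  var reads = 0
  // Stand-in for reading a locally accessible resource (file, DB row, ...).
  def readRow(i: Int): String = { reads += 1; s"row-$i" }

  // Lazy: nothing has been read yet. If this iterator escaped the source,
  // the actual reads would happen remotely, where the resource is absent.
  val lazyRows = Iterator.tabulate(3)(readRow)
  assert(reads == 0)

  // Eager: force the reads while still on the submission machine.
  val rows = lazyRows.toList
  assert(reads == 3)
  println(rows.mkString(","))
}
```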
> 
> The executionContext is used to control the side effects. It provides an output 
> stream, in case you need to display something, and a file path relativizer 
> which is used in case some OpenMOLE execution manager wants to relocate all 
> file paths. By default the relativizer does nothing and the output stream 
> points to System.out. In the GUI the output stream is redirected to the 
> display area associated with the workflow execution. We have no use for the 
> relativizer anymore, but I kept it in case it might be of some use in the 
> future. In the future we could also imagine some mechanism in the execution 
> for rewriting database connection queries, and other mechanisms to 
> control the context of the side effects....
> 
> To enable the file relativizer for your source:
> val expandedPath = executionContext.relativise(path.from(context))
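As a rough mental model, the execution context can be pictured as a pair of hooks with the defaults described above. This is a simplified, hypothetical stand-in, not OpenMOLE's actual class:

```scala
import java.io.{File, PrintStream}

// Hypothetical mini version of the execution context described above:
// an output stream (default System.out) and a path relativizer that
// by default does nothing (identity).
case class MiniExecutionContext(
  out: PrintStream = System.out,
  relativise: File => File = p => p
)

object MiniExecutionContextDemo extends App {
  val ctx = MiniExecutionContext()
  val path = new File("/tmp/input.csv")
  // The default relativizer leaves the path untouched.
  println(ctx.relativise(path) == path)
}
```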
> 
> cheers,
> Romain
> 
> On 09/06/2015 08:25, Andreas Schuh wrote:
>> In the meantime I’ve implemented some Sources for this.
>> 
>> The one question that remains unclear to me is whether I should use the 
>> ExecutionContext, and what for.
>> 
>> My DataSetSource as outlined in the previous email is currently implemented 
>> as follows and works fine in the LocalEnvironment; I have yet to test it in 
>> an actual distributed environment:
>> 
>> object DataSetSource {
>>   def apply(setId: Prototype[String], dataSet: Prototype[DataSet]) =
>>     new SourceBuilder {
>>       addInput(setId)
>>       addOutput(dataSet)
>>       def toSource = new DataSetSource(setId, dataSet) with Built
>>     }
>> }
>> 
>> abstract class DataSetSource(setId: Prototype[String], dataSet: Prototype[DataSet]) extends Source {
>>   override def process(context: Context, executionContext: ExecutionContext)(implicit rng: RandomProvider) = {
>>     val name = context.option(setId).get
>>     Variable(dataSet, DataSet(name))
>>   }
>> }
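For what it's worth, the logic of this source can be exercised in isolation with simplified stand-ins for the OpenMOLE types (Prototype, Variable and the context map below are hypothetical mimics, not the real API):

```scala
// Hypothetical, minimal mimics of the OpenMOLE types used above.
case class Prototype[T](name: String)
case class Variable[T](prototype: Prototype[T], value: T)
case class DataSet(name: String)

class MiniDataSetSource(setId: Prototype[String], dataSet: Prototype[DataSet]) {
  // Mirrors DataSetSource.process: look up the set ID and wrap a DataSet.
  def process(context: Map[String, Any]): Variable[DataSet] = {
    val name = context(setId.name).asInstanceOf[String]
    Variable(dataSet, DataSet(name))
  }
}

object MiniDataSetSourceDemo extends App {
  val setId = Prototype[String]("setId")
  val dataSet = Prototype[DataSet]("dataSet")
  val source = new MiniDataSetSource(setId, dataSet)
  println(source.process(Map("setId" -> "repeat")).value.name)
}
```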
>> 
>>> On 8 Jun 2015, at 20:06, Andreas Schuh <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> I had a look at how data sources are implemented in OpenMOLE, but would 
>>> like some more background information before I attempt to write my own for 
>>> my REPEAT workflow plugin. Basically what I currently have is a workflow 
>>> with some IDs identifying for instance the image dataset, the registration 
>>> method, and the set of parameters for this registration method which 
>>> correspond to a row in a CSV table. Most of the information needed by the 
>>> ScalaTask’s is made available by the user via a HOCON configuration file. 
>>> My plugin contains classes and objects for easy access to the parsed 
>>> configuration values. For example, for a specific image dataset to be used, 
>>> I have a class such as
>>> 
>>> object Dataset {
>>>  val names: Set[String]
>>> }
>>> 
>>> class Dataset(val id: String) {
>>>  val dir = …
>>>  def imgCsv = ...
>>>  def imgPath(imgId: String) = …
>>>  // ...
>>> }
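Fleshed out, the class might look like the following. The dataset names, root directory and file layout below are invented purely for illustration:

```scala
// Hypothetical elaboration of the Dataset sketch above; the dataset IDs
// and directory layout are invented for illustration only.
object Dataset {
  val names: Set[String] = Set("setA", "setB")
}

class Dataset(val id: String) {
  val dir: String = s"/data/$id"
  def imgCsv: String = s"$dir/images.csv"
  def imgPath(imgId: String): String = s"$dir/images/$imgId.nii.gz"
}

object DatasetDemo extends App {
  val ds = new Dataset("setA")
  println(ds.imgPath("img01"))
}
```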
>>> 
>>> A typical workflow then starts with an ExplorationTask which samples all 
>>> the dataset IDs:
>>> 
>>> val setId = Val[String]
>>> val exploreDataSets = ExplorationTask(setId in Dataset.names)
>>> 
>>> After the exploration transition, I want to inject a Val[Dataset] into the 
>>> workflow that is then used as input to a task which therefore has access to 
>>> all the information about the dataset via the respective Dataset class 
>>> instance:
>>> 
>>> val dataSet = Val[Dataset]
>>> val getDataSet = Capsule(ScalaTask("val dataSet = Dataset(setId)") set 
>>> (inputs += setId, outputs += dataSet), strainer = true)
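The strainer behaviour I am after can be pictured as a context transformation that forwards every upstream variable and adds the new one. A plain-Scala sketch of the assumed semantics, not OpenMOLE code:

```scala
case class DataSet(name: String)

object StrainerDemo extends App {
  // In this sketch a context is just a name -> value map.
  type Context = Map[String, Any]

  // Mimics the getDataSet capsule with strainer = true: everything that
  // arrives is passed through, plus the freshly built dataSet variable.
  def getDataSet(context: Context): Context =
    context + ("dataSet" -> DataSet(context("setId").asInstanceOf[String]))

  val upstream: Context = Map("setId" -> "setA", "imgId" -> "img01")
  val out = getDataSet(upstream)
  assert(out.keySet == Set("setId", "imgId", "dataSet"))
  println(out("dataSet"))
}
```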
>>> 
>>> To avoid requiring all the data to be strained through this simple ScalaTask, 
>>> I tried to use the new “Strain” pattern instead, but realised that this 
>>> doesn’t work because the newly injected dataSet variable is then only 
>>> available in one branch of the “Strain” puzzle. Maybe this is still an 
>>> issue with this pattern… on the other hand, it looks like a Source would be 
>>> more suitable for what I want to do?
>>> 
>>> val imgId = Val[String]
>>> val exploreImages = ExplorationTask(CSVSampling("${dataSet.imgCsv}") set 
>>> (columns += ("ID", imgId)))
>>> val processImage = ScalaTask(" … dataSet.imgPath(imgId) ") set (inputs += 
>>> (dataSet, imgId))
>>> 
>>> val ex = exploreDataSets -< getDataSet -- exploreImages -< processImage start
>>> 
>>> Note: I am using here my modified (hacked) CSVSampling which takes an 
>>> ExpandedString as argument instead of File.
>>> 
>>> 
>>> With a custom DatasetSource, I would instead have something like:
>>> 
>>> val getDataSetUsingSource = Capsule(EmptyTask() set (inputs += setId, 
>>> outputs += dataSet)) source DatasetSource(setId, dataSet)
>>> 
>>> 
>>> Any suggestions on how to best inject the “Dataset” variable into the 
>>> workflow? Using a ScalaTask or a Source? Note that instantiating this 
>>> class requires information from a local HOCON configuration file whose 
>>> content I currently insert as a string literal into the getDataSet ScalaTask 
>>> script. The DatasetSource instance could have access to the 
>>> com.typesafe.config.Config object of my loaded plugin with the already 
>>> parsed information.
>>> 
>>> Thanks,
>>> 
>>> Andreas
>> 
>> _______________________________________________
>> OpenMOLE-users mailing list
>> [email protected]
>> http://fedex.iscpif.fr/mailman/listinfo/openmole-users
> 
> 

