In the meantime I’ve implemented some Sources for this.
The one question that remains unclear to me is whether I should use the
ExecutionContext, and if so, what for.
My DataSetSource as outlined in the previous email is currently implemented as
follows, which works fine in the LocalEnvironment. I have yet to test it in an
actual distributed environment:
object DataSetSource {
  def apply(setId: Prototype[String], dataSet: Prototype[DataSet]) =
    new SourceBuilder {
      addInput(setId)
      addOutput(dataSet)
      def toSource = new DataSetSource(setId, dataSet) with Built
    }
}
abstract class DataSetSource(setId: Prototype[String], dataSet: Prototype[DataSet]) extends Source {
  override def process(context: Context, executionContext: ExecutionContext)(implicit rng: RandomProvider) = {
    // look up the data set name in the workflow context and provide the
    // corresponding DataSet instance as a new variable
    val name = context.option(setId).get
    Variable(dataSet, DataSet(name))
  }
}
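
For reference, this is how the source gets attached in the workflow, mirroring
the DatasetSource line sketched in the quoted mail below:

val getDataSet = Capsule(EmptyTask() set
  (inputs += setId, outputs += dataSet)) source DataSetSource(setId, dataSet)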
> On 8 Jun 2015, at 20:06, Andreas Schuh <[email protected]> wrote:
>
> Hi,
>
> I had a look at how data sources are implemented in OpenMOLE, but would like
> some more background information before I attempt to write my own for my
> REPEAT workflow plugin. Basically, what I currently have is a workflow with
> some IDs identifying, for instance, the image dataset, the registration
> method, and the set of parameters for this registration method, which
> corresponds to a row in a CSV table. Most of the information needed by the
> ScalaTasks is made available by the user via a HOCON configuration file. My
> plugin contains classes and objects for easy access to the parsed
> configuration values. For example, for a specific image dataset to be used, I
> have a class such as
>
> object Dataset {
>   val names: Set[String]
> }
>
> class Dataset(val id: String) {
>   val dir = …
>   def imgCsv = ...
>   def imgPath(imgId: String) = …
>   // ...
> }
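>
> For concreteness, these members are backed by the parsed HOCON configuration
> roughly as follows (a simplified sketch; the actual key names and path
> conventions differ):
>
> import com.typesafe.config.{ Config, ConfigFactory }
> import scala.collection.JavaConverters._
>
> object Dataset {
>   // parsed once when the plugin is loaded
>   val config: Config = ConfigFactory.load()
>   val names: Set[String] = config.getStringList("repeat.dataset.names").asScala.toSet
> }
>
> class Dataset(val id: String) {
>   private val cfg = Dataset.config.getConfig(s"repeat.dataset.$id")
>   val dir = cfg.getString("dir")                      // illustrative key only
>   def imgCsv = s"$dir/images.csv"                     // illustrative convention
>   def imgPath(imgId: String) = s"$dir/images/$imgId"  // illustrative convention
> }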
>
> A typical workflow then starts with an ExplorationTask which samples all the
> dataset IDs:
>
> val setId = Val[String]
> val exploreDataSets = ExplorationTask(setId in Dataset.names)
>
> After the exploration transition, I want to inject a Val[Dataset] into the
> workflow, which is then used as input to a task that thereby has access to
> all the information about the dataset via the respective Dataset class
> instance:
>
> val dataSet = Val[Dataset]
> val getDataSet = Capsule(ScalaTask("val dataSet = Dataset(setId)") set
>   (inputs += setId, outputs += dataSet), strainer = true)
>
> To avoid straining all the data through this simple ScalaTask, I tried to use
> the new “Strain” pattern instead, but realised that this doesn’t work because
> the newly injected dataSet variable is then only available in one branch of
> the “Strain” puzzle. Maybe this is still an issue with this pattern… on the
> other hand, it looks like a Source would be more suitable for what I want to
> do?
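>
> What I tried was roughly the following (the exact Strain invocation is from
> memory, so it may be off):
>
> val getDataSetTask = ScalaTask("val dataSet = Dataset(setId)") set
>   (inputs += setId, outputs += dataSet)
> val ex = exploreDataSets -< Strain(getDataSetTask) -- exploreImages -< processImage start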
>
> val imgId = Val[String]
> val exploreImages = ExplorationTask(CSVSampling("${dataSet.imgCsv}") set
>   (columns += ("ID", imgId)))
> val processImage = ScalaTask(" … dataSet.imgPath(imgId) ") set
>   (inputs += (dataSet, imgId))
>
> val ex = exploreDataSets -< getDataSet -- exploreImages -< processImage start
>
> Note: I am using my modified (hacked) CSVSampling here, which takes an
> ExpandedString as argument instead of a File.
>
>
> With a custom DatasetSource, I would instead have something like:
>
> val getDataSetUsingSource = Capsule(EmptyTask() set
>   (inputs += setId, outputs += dataSet)) source DatasetSource(setId, dataSet)
>
>
> Any suggestions on how to best inject the “Dataset” variable into the
> workflow? Using a ScalaTask or a Source? Note that instantiating this class
> requires information from a local HOCON configuration file, whose content I
> currently insert as a string literal into the getDataSet ScalaTask script.
> The DatasetSource instance could have access to the
> com.typesafe.config.Config object of my loaded plugin with the already parsed
> information.
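>
> Sketched out, that could look roughly like this (just one possibility; the
> source lives in the same bundle as the parsed Config, so Dataset can read
> everything it needs from there):
>
> abstract class DatasetSource(setId: Prototype[String], dataSet: Prototype[Dataset]) extends Source {
>   override def process(context: Context, executionContext: ExecutionContext)(implicit rng: RandomProvider) = {
>     val name = context.option(setId).get
>     // Dataset(name) reads its fields from the plugin's already parsed
>     // Config, so nothing has to be pasted into a ScalaTask script
>     Variable(dataSet, new Dataset(name))
>   }
> }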
>
> Thanks,
>
> Andreas