Hi Pascal,

By default, a file produced by one task and consumed by the next one
travels through the orchestration machine, as in:

val f = Val[File]

val t1 = Task(...) set (output += f)
val t2 = Task(...) set (input += f)

val env = Cluster(...)

(t1 on env) -- (t2 on env)

In this workflow, the file f makes a round trip: from the cluster to
the submission machine, and then from the submission machine back to
the cluster.

To avoid that (in case your files are huge), there are several
optimisations. The cleanest one would be to define a task t3:

val t3 = MoleTask(t1 -- t2)

t3 on env

and then to delegate t3 to the cluster. In this case the workflow
t1 -- t2 is executed on the cluster and the file stays on the shared
file system (NFS for instance).

This works only for temporary files (not for injecting huge input
files into the dataflow, and not for storing huge output files from
the dataflow).

If you want to achieve that as well, you will need a task that accesses
the file system directly. However, you will lose workflow portability,
and the workflow will probably work only on a given cluster.
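
As a rough sketch of that last option (plain Scala, not OpenMOLE API;
the names SharedFsSketch and process and the upper-casing step are
hypothetical placeholders), the task body could receive only a path on
the shared file system and return only the path of its result, so that
the dataflow carries short strings rather than the huge files:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical helper, for illustration only: it is not part of OpenMOLE.
object SharedFsSketch {

  // Reads the input directly from the shared file system, writes the result
  // into outputDir, and returns the path of the result as a String.
  // Assumption: inputPath and outputDir are visible from every cluster node.
  def process(inputPath: String, outputDir: String): String = {
    val in  = Paths.get(inputPath)
    val dir = Paths.get(outputDir)
    val out = dir.resolve(in.getFileName.toString + ".out")
    Files.createDirectories(dir)

    val reader = Files.newBufferedReader(in, StandardCharsets.UTF_8)
    try {
      val writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)
      try {
        // Placeholder processing: stream line by line and upper-case each line,
        // so the whole file is never held in memory.
        var line = reader.readLine()
        while (line != null) {
          writer.write(line.toUpperCase)
          writer.write('\n')
          line = reader.readLine()
        }
      } finally writer.close()
    } finally reader.close()

    out.toString
  }
}
```

Only the returned path would then need to be declared as a task output,
which is also where the portability is lost: the path is meaningful
only on a cluster that mounts that file system.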

cheers,
Romain

On 19/10/2015 11:06, Pascal Gillet wrote:
> Hi,
>
> I recently discovered OpenMOLE and I have a question:
>
> *How do you manage tools that read/write files on a local file system?*
>
> We tried to adapt legacy image processing tools to work in a
> distributed environment (Hadoop/MapReduce with HDFS).
> Those tools usually follow an internal workflow and save their current
> state into files rather than in memory.
> We cut the processing into different map/reduce steps in an Oozie
> workflow, and from there, we may adopt two strategies:
>
> - The tools are developed in Java and we can adapt the code so as to
> write the file directly into HDFS instead of the local file system.
>
> - Another strategy is to push the local result file from each
> processing step into HDFS, so as to make it available to the next step
> in the workflow that might run on another node in the cluster.
>
> Regards,
>
> Pascal GILLET
>  
> _______________________________________________
> OpenMOLE-users mailing list
> [email protected]
> http://fedex.iscpif.fr/mailman/listinfo/openmole-users
