> On 1 May 2015, at 07:16, Romain Reuillon <[email protected]> wrote:
> 
> Thanks for the profiling, it is very interesting. The behaviour of OpenMOLE
> is that if a file is an output of the task, it will be copied back from the
> execution environment to the local machine. Also, if the task fails, the
> error context contains the input context and therefore the input files.

Hm, so this copying of output files is independent of the CopyFileHook, which
I basically only need in order to copy the *local* copy of an output file from
the temporary directories that exist only during workflow execution to some
other local path? (I was already wondering how the “to.copy(from)” in
CopyFileHook would work when the files are on different machines and couldn’t
find the magic… which apparently isn’t there.)
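
If that is the case, then the hook operates purely locally: OpenMOLE first
transfers the remote output file back into its temporary workspace, and the
hook then just copies that local file to the destination path. A minimal
sketch of that reading, reusing the dof and dofPath names from the workflow
quoted below:

    // my reading: `dof` already refers to the local copy that OpenMOLE has
    // transferred back from the execution environment, and the hook merely
    // copies that file to the local destination path `dofPath`
    val reg = regTask hook CopyFileHook(dof, dofPath)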

Then what is the purpose of the “link” option of “addResource” and
“addInputFile” of ExternalTaskBuilder? I thought I could use it to prevent any
copying of input and output files by OpenMOLE and instead instruct it to use
symbolic links to my files, which I know are located on a shared NFS drive, as
we also discussed a few days ago here: [OpenMOLE-users] CARE and SystemExecTask
<http://fedex.iscpif.fr/pipermail/openmole-users/2015-April/000647.html>.
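
In other words, what I had expected is that with the link flag set, the files
are symlinked into the task’s working directory rather than packed into an
archive and uploaded; that behaviour is only my assumption, not something I
have verified in the OpenMOLE sources. A sketch of that expectation, reusing
the declarations from my workflow below:

    // expected (my assumption, not verified in OpenMOLE itself): with the link
    // flag set, the shared directory and input files would be symlinked into
    // the job's working directory instead of being copied and uploaded
    regTask.addResource(Workspace.rootFS, "rootfs", link = true, inWorkDir = true)
    inputFiles += (refIm, refId + refSuf, true)  // third argument = link flag, as used below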

It seems to me now that no matter what I do, OpenMOLE will copy the input files
to a temporary directory within ~/.openmole and also copy the output files from
the task working directory to this temporary directory. Is there really no way
to force it to use the original file paths instead? In image processing, the
data files to be processed easily add up to several hundred GB. That is quite a
lot of unnecessary traffic and an unpleasant runtime overhead of copying image
data around (even my ~/.openmole directory is located on an NFS drive that is
accessible by all SLURM/Condor compute nodes).

> 
> On 01/05/2015 03:12, Andreas Schuh wrote:
>> Hi,
>> 
>> I am trying to set up a workflow for execution on a cluster where each 
>> compute node has access to the shared data directory for input and output 
>> files via NFS. When running on Condor, I noticed the following files in the 
>> .openmole directory:
>> 
>> total 23M
>> -rw-r--r-- 1 as12312 vip  511 May  1 01:48 f14f6be2-ea76-41aa-b714-f04766a2781b.condor
>> -rw-r--r-- 1 as12312 vip  39K May  1 01:49 f14f6be2-ea76-41aa-b714-f04766a2781b.err
>> -rw-r--r-- 1 as12312 vip    0 May  1 01:48 f14f6be2-ea76-41aa-b714-f04766a2781b.out
>> -rw-r--r-- 1 as12312 vip 2.5K May  1 01:48 job_2d5f861f-430f-4ee3-9ae1-cd1f435c1c7d.in
>> -rw-r--r-- 1 as12312 vip 9.9K May  1 01:48 job_6f09ff72-1707-46bf-b54b-eb5a7d79c298.tgz
>> -rw-r--r-- 1 as12312 vip 1.8K May  1 01:50 output_a6476ae9-fa21-4695-8ba3-81f034388077.txt
>> -rw-r--r-- 1 as12312 vip  557 May  1 01:50 result_2d220ac0-38ec-4213-9d9b-366fc50a01b0.xml.gz
>> -rw-r--r-- 1 as12312 vip 1.5K May  1 01:48 run_09ccc83c-3695-4720-b295-b6d55d627ff7.sh
>> -rw-r--r-- 1 as12312 vip  23M May  1 01:50 uplodedTar_5a736889-01e4-4ea7-bf0a-3225c8ebd659.tgz
>> 
>> As can be seen, the uploadedTar_[…].tgz file is rather large considering 
>> that all input/output files are accessible via NFS. Looking at the contents 
>> of the archive (files/filesInfo.xml), it appears to contain the 3D NIfTI 
>> volume image files.
>> 
>> Why are these files even archived and uploaded to the remote environment 
>> when I use the “link = true” option of “inputFiles”?
>> 
>> Andreas
>> 
>> 
>> P.S.: For reference, here is the semi-complete workflow:
>> 
>>     val dofPath = join(dofRig, dofPre + refId + s",$${${srcId.name}}" + dofSuf).getAbsolutePath
>>     val logPath = join(logDir, dofRig.getName, refId + s",$${${srcId.name}}" + logSuf).getAbsolutePath
>> 
>>     val dofRelPath = relativize(Workspace.rootFS, dofPath)
>>     val logRelPath = relativize(Workspace.rootFS, logPath)
>> 
>>     val begin = EmptyTask() set (
>>         name    := "ComputeRigidTemplateDofsBegin",
>>         inputs  += (refIm, srcId, srcIm),
>>         outputs += (refIm, srcId, srcIm, dof)
>>       ) source FileSource(dofPath, dof)
>> 
>>     val regTask = ScalaTask(
>>       s"""
>>         | Config.parse(\"\"\"${Config()}\"\"\", "${Config().base}")
>>         | val ${refIm.name} = FileUtil.join(workDir, "$refId$refSuf")
>>         | val ${srcIm.name} = FileUtil.join(workDir, "$imgPre" + srcId + "$imgSuf")
>>         | val ${dof.name}   = FileUtil.join(workDir, "rootfs", s"$dofRelPath")
>>         | val ${log.name}   = FileUtil.join(workDir, "rootfs", s"$logRelPath")
>>         | IRTK.ireg(${refIm.name}, ${srcIm.name}, None, ${dof.name}, Some(${log.name}),
>>         |   "Transformation model" -> "Rigid",
>>         |   "Background value" -> $bgVal
>>         | )
>>       """.stripMargin) set (
>>         name        := "ComputeRigidTemplateDofs",
>>         imports     += ("com.andreasschuh.repeat.core.{Config, FileUtil, IRTK}", "sys.process._"),
>>         usedClasses += (Config.getClass, FileUtil.getClass, IRTK.getClass),
>>         inputs      += srcId,
>>         inputFiles  += (refIm, refId + refSuf, Workspace.shared),
>>         inputFiles  += (srcIm, imgPre + "${srcId}" + imgSuf, Workspace.shared),
>>         outputs     += (refIm, srcId, srcIm),
>>         outputFiles += (join("rootfs", dofRelPath), dof),
>>         outputFiles += (join("rootfs", logRelPath), log)
>>       )
>> 
>>     // If the workspace is accessible by the compute nodes, read/write files directly without copying
>>     if (Workspace.shared) {
>>       Workspace.rootFS.mkdirs()
>>       regTask.addResource(Workspace.rootFS, "rootfs", link = true, inWorkDir = true)
>>     }
>> 
>>     // Otherwise, output files have to be copied back to the local workspace
>>     val reg = regTask hook (
>>         CopyFileHook(dof, dofPath),
>>         CopyFileHook(log, logPath)
>>       )
>> 
>>     val cond1 = s"${dof.name}.lastModified() > ${refIm.name}.lastModified()"
>>     val cond2 = s"${dof.name}.lastModified() > ${srcIm.name}.lastModified()"
>>     begin -- Skip(reg on Env.short by 10, cond1 + " && " + cond2)
> 
> 

_______________________________________________
OpenMOLE-users mailing list
[email protected]
http://fedex.iscpif.fr/mailman/listinfo/openmole-users
