> On 1 May 2015, at 11:08, Romain Reuillon <[email protected]> wrote:
>
> I think it is easy enough to implement. I'll give it a shot today. However I
> won't be able to test/debug it because I have no access to such an
> environment. Could you take care of that?
That would be great, as I was hoping to finally be able to run my tasks and get actual results… it's been a month now of developing the OpenMOLE workflow :( I'll be happy to test it in our environment. I have access to our lab's dedicated SLURM cluster and the department HTCondor setup. I could also try it on our college HPC, which uses SGE and shared storage. I also agree that these options should be part of the environment specification.

> I basically agree with you about the file in ~/.openmole: files are transferred
> to the node through the shared FS, so they have to be copied there. What could
> be optimized is the location of the temporary directory in which a task is
> executed. It is also created in this folder, and therefore on the shared FS,
> which is not actually required. This work dir could optionally be relocated
> somewhere else using an environment parameter.
>

Not sure if I follow this solution outline, but I'm sure you have a better idea of how things work right now and what needs to be modified. Why do files have to be copied to ~/.openmole when the original input files to the workflow (exploration SelectFileDomain) are already located on a shared FS? Being able to configure the location of the local and remote temporary directories via an environment parameter would solve the second issue of where temporary files such as wrapper scripts and remote resources are placed. The first issue is how to deal with input and output files of tasks which are already located on a shared FS and thus should not require a copy to the temporary directories.
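To make this concrete, from a user's perspective I would imagine specifying it roughly like this. This is a sketch only: sharedStorage and workDirectory are made-up parameter names illustrating the two options discussed above (they are not existing OpenMOLE parameters), and the hostname is a placeholder:

// Sketch only: sharedStorage and workDirectory are hypothetical parameters,
// not existing OpenMOLE API; they show where such options would plug in.
val cluster = SLURMEnvironment(
  "as12312",              // user on the submission host
  "cluster.example.org",  // placeholder hostname
  sharedStorage = true,   // hypothetical: link files on the shared FS instead of copying them over ssh
  workDirectory = "/tmp"  // hypothetical: node-local location for the task's temporary execution dir
)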
> On 01/05/2015 10:50, Andreas Schuh wrote:
>>
>>> On 1 May 2015, at 09:31, Romain Reuillon <[email protected]> wrote:
>>>
>>> Do you submit your jobs on a machine which is on the same shared NFS
>>> storage as the cluster?
>>
>> Yes, all machines in our department have access to the same NFS storage.
>>
>>>
>>> If yes, I would advise implementing an option for the cluster
>>> environments which makes them use a local storage that creates logical
>>> links instead of copying the whole files through ssh. Do you think that
>>> would be an acceptable solution? I think this solution would be easy to
>>> implement; the only tricky part would be to correctly handle the temporary
>>> files which are created by OpenMOLE and then deleted.
>>
>> That would be great: an option to let OpenMOLE know not to copy the
>> files from the user input path to the temporary directory. As it is at the
>> moment, it's a real deal breaker for most common cluster environments. What
>> OpenMOLE does seems only needed for a grid or a simple SSH-based setup.
>>
>> Temporary files should probably still be written to the ~/.openmole
>> directory and cleaned up there. Actually, it would be good if OpenMOLE could
>> be configured to write such files to a /tmp directory on the local
>> machine/compute node only. In the labs I have worked in so far, the home
>> directory was always shared via NFS. Too much NFS traffic in this
>> SGE environment would cause serious trouble for the NFS servers when many
>> people are running thousands of jobs with heavy I/O. In that case it should
>> still be an option to write output files to a local non-NFS directory and
>> copy only the final results. … Just something to keep in mind, but probably
>> a different issue.
>>
>>>
>>> If not, I think you should transmit to your execution task a string
>>> containing the path of the file on the NFS storage it should process,
>>> instead of a File object.
>>
>> I was thinking of doing that in the beginning, but I wanted a solution that
>> works for both cluster and grid environments.
>>
>> My understanding of OpenMOLE was that it allows me to specify the data flow
>> and task dependencies in an abstract way, independent of the particular
>> execution environment. If I have to specialise my workflow towards a
>> specific environment again, it's not as abstract any more as I was hoping
>> for, and it requires me to explicitly deal with different environments
>> myself.
>>
>> It would be much better if OpenMOLE were smart enough to know how to deal
>> most efficiently with each environment, or could be configured by the user.
>> For my workflow, for example, I let the user specify in a configuration file
>> whether the input dataset is located in a shared directory, whether the
>> workspace is, and whether the binaries are.
>>
>> In summary:
>>
>> If the input dataset is not shared, files are copied to the shared workspace
>> (if it is shared).
>> If the workspace (i.e., the storage for output files) is not shared, data
>> files are copied to and from the compute nodes.
>> If the binaries are shared, they are called directly from the shared
>> workspace directory or installation. Otherwise, I would pack them using CARE
>> into the shared workspace. If the workspace is not shared, pack them with
>> CARE and copy them to the compute nodes…
>>
>> I wrote down my intention of how to realise my REPEAT workflow a couple of
>> days ago, as follows.
>>
>> if Dataset.shared || Environment.localOnly
>>   if Workspace.shared || Environment.localOnly
>>     if Software.shared || Environment.localOnly
>>       Task executes binary file(s) of shared installation
>>     else // !Software.shared
>>       All binary files are "packed" using CARE into workspace.dir
>>       Task executes binary file(s) of shared CARE archive using PRoot
>>     Dataset files are read directly from dataset.dir
>>     Other input files are read directly from workspace.dir/rootfs
>>     Output files are written directly to workspace.dir/rootfs
>>     Task workDir/rootfs is a symbolic link resource to workspace.dir/rootfs
>>     File input paths are made relative to workspace.dir/rootfs
>>     Task accesses files with absolute path workDir/rootfs/relpath
>>   else // !Workspace.shared
>>     if Software.shared || Environment.localOnly
>>       Task executes binary file(s) of shared installation
>>     else // !Software.shared
>>       Binary file(s) used by each task are packed into individual CARE archives
>>       CARE archive is a resource inWorkDir of the task
>>       Task executes binary file(s) of unpacked CARE archive using PRoot
>>     Dataset files are read directly from dataset.dir
>>     Other input files are copied from workspace.dir/rootfs to workDir/rootfs
>>     Output files are copied from workDir/rootfs to workspace.dir/rootfs
>>     File input paths are made relative to rootfs
>>     Task accesses files with absolute path /relpath
>> else // !Dataset.shared
>>   if Workspace.shared || Environment.localOnly
>>     if Software.shared || Environment.localOnly
>>       Task executes binary file(s) of shared installation
>>     else // !Software.shared
>>       All binary files are "packed" using CARE into workspace.dir
>>       Task executes binary file(s) of shared CARE archive using PRoot
>>     Dataset files are copied to workspace.dir/rootfs
>>     Output files are written to / read from workspace.dir/rootfs
>>     Task workDir/rootfs is a symbolic link resource to workspace.dir/rootfs
>>     File input paths are made relative to workspace.dir/rootfs
>>     Task accesses files with absolute path workDir/rootfs/relpath
>>   else // !Workspace.shared
>>     [ … ]
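To restate the quoted decision tree a bit more compactly, here is a rough plain-Scala condensation; the type and method names are mine (none of this is OpenMOLE API) and it glosses over where exactly the copies land:

// Rough condensation of the decision tree above, for illustration only.
case class Layout(datasetShared: Boolean, workspaceShared: Boolean,
                  softwareShared: Boolean, localOnly: Boolean)

sealed trait FileAccess
case object InPlace      extends FileAccess // use the shared path directly (e.g. via a symlinked rootfs)
case object CopyInAndOut extends FileAccess // stage files through the work dir or shared workspace

sealed trait BinaryAccess
case object SharedInstall extends BinaryAccess // call the shared installation directly
case object CareArchive   extends BinaryAccess // pack with CARE, execute through PRoot

def binaries(l: Layout): BinaryAccess =
  if (l.softwareShared || l.localOnly) SharedInstall else CareArchive

def datasetFiles(l: Layout): FileAccess =
  if (l.datasetShared || l.localOnly) InPlace else CopyInAndOut

// other task inputs and all outputs
def workspaceFiles(l: Layout): FileAccess =
  if (l.workspaceShared || l.localOnly) InPlace else CopyInAndOut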
>>
>>>
>>> On 01/05/2015 09:59, Andreas Schuh wrote:
>>>>
>>>>> On 1 May 2015, at 07:16, Romain Reuillon <[email protected]> wrote:
>>>>>
>>>>> Thanks for the profiling, it is very interesting. The behaviour of
>>>>> OpenMOLE is that if a file is an output of the task, it will be copied
>>>>> back from the execution environment to the local machine. Also, if the
>>>>> task fails, the error context contains the input context and therefore
>>>>> the input files.
>>>>
>>>> Hm, so this copying of output files is then independent of the
>>>> CopyFileHook, which I basically only need in order to copy the *local*
>>>> copy of the output file from the temporary directories that only exist
>>>> during workflow execution to some other local path? (I was already
>>>> wondering how the "to.copy(from)" in CopyFileHook would work when the
>>>> files are on different machines; I couldn't find the magic… which
>>>> apparently isn't there.)
>>>>
>>>> Then what is the purpose of the "link" option of "addResource" and
>>>> "addInputFile" of ExternalTaskBuilder? I thought I could use it to
>>>> prevent OpenMOLE from copying input and output files at all and instead
>>>> instruct it to use symbolic links to my files, which I know are located
>>>> on a shared NFS drive, as we also discussed some days ago here:
>>>> [OpenMOLE-users] CARE and SystemExecTask
>>>> <http://fedex.iscpif.fr/pipermail/openmole-users/2015-April/000647.html>.
>>>>
>>>> It seems to me now that no matter what I do, OpenMOLE will copy the input
>>>> files to a temporary directory within ~/.openmole, and also the output
>>>> files from the task working directory to this temporary directory. Is
>>>> there really no way to force it to use the original file paths instead?
>>>> In image processing, the sum of data files processed easily adds up to
>>>> several 100 GB. That's quite a lot of unnecessary traffic and an
>>>> unpleasant runtime overhead from copying image data around (even my
>>>> ~/.openmole directory is located on an NFS drive that is accessible by
>>>> all SLURM/Condor compute nodes).
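For completeness, the fallback Romain suggests above (pass the path on the shared FS as a plain string, so that OpenMOLE has nothing to copy) would presumably look something like the following sketch; imgPath and lsTask are placeholder names, not part of my actual workflow:

// Sketch of the string-path workaround: the task receives a path on the
// shared NFS storage as a String, so no File object is transferred by OpenMOLE.
val imgPath = Prototype[String]("imgPath")
val lsTask = ScalaTask("""val listing = ("ls -l " + imgPath).!!""") set (
  name    := "ListSharedImage",
  imports += "sys.process._",
  inputs  += imgPath,
  outputs += imgPath
)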
>>>>>
>>>>> On 01/05/2015 03:12, Andreas Schuh wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to set up a workflow for execution on a cluster where each
>>>>>> compute node has access to the shared data directory for input and
>>>>>> output files via NFS. When running on Condor, I noticed the following
>>>>>> files in the .openmole directory:
>>>>>>
>>>>>> total 23M
>>>>>> -rw-r--r-- 1 as12312 vip  511 May  1 01:48 f14f6be2-ea76-41aa-b714-f04766a2781b.condor
>>>>>> -rw-r--r-- 1 as12312 vip  39K May  1 01:49 f14f6be2-ea76-41aa-b714-f04766a2781b.err
>>>>>> -rw-r--r-- 1 as12312 vip    0 May  1 01:48 f14f6be2-ea76-41aa-b714-f04766a2781b.out
>>>>>> -rw-r--r-- 1 as12312 vip 2.5K May  1 01:48 job_2d5f861f-430f-4ee3-9ae1-cd1f435c1c7d.in
>>>>>> -rw-r--r-- 1 as12312 vip 9.9K May  1 01:48 job_6f09ff72-1707-46bf-b54b-eb5a7d79c298.tgz
>>>>>> -rw-r--r-- 1 as12312 vip 1.8K May  1 01:50 output_a6476ae9-fa21-4695-8ba3-81f034388077.txt
>>>>>> -rw-r--r-- 1 as12312 vip  557 May  1 01:50 result_2d220ac0-38ec-4213-9d9b-366fc50a01b0.xml.gz
>>>>>> -rw-r--r-- 1 as12312 vip 1.5K May  1 01:48 run_09ccc83c-3695-4720-b295-b6d55d627ff7.sh
>>>>>> -rw-r--r-- 1 as12312 vip  23M May  1 01:50 uplodedTar_5a736889-01e4-4ea7-bf0a-3225c8ebd659.tgz
>>>>>>
>>>>>> As can be seen, the uplodedTar_[…].tgz file is rather large considering
>>>>>> that all input/output files are accessible via NFS. Looking at the
>>>>>> content of the archive (files/filesInfo.xml) suggests that it contains
>>>>>> the 3D NIfTI volume image files.
>>>>>>
>>>>>> Why are these files even archived and uploaded to the remote when I use
>>>>>> the "link = true" option of "inputFiles"?
>>>>>>
>>>>>> Andreas
>>>>>>
>>>>>>
>>>>>> P.S.: For reference, here is the semi-complete workflow:
>>>>>>
>>>>>> val dofPath = join(dofRig, dofPre + refId + s",$${${srcId.name}}" + dofSuf).getAbsolutePath
>>>>>> val logPath = join(logDir, dofRig.getName, refId + s",$${${srcId.name}}" + logSuf).getAbsolutePath
>>>>>>
>>>>>> val dofRelPath = relativize(Workspace.rootFS, dofPath)
>>>>>> val logRelPath = relativize(Workspace.rootFS, logPath)
>>>>>>
>>>>>> val begin = EmptyTask() set (
>>>>>>   name := "ComputeRigidTemplateDofsBegin",
>>>>>>   inputs += (refIm, srcId, srcIm),
>>>>>>   outputs += (refIm, srcId, srcIm, dof)
>>>>>> ) source FileSource(dofPath, dof)
>>>>>>
>>>>>> val regTask = ScalaTask(
>>>>>>   s"""
>>>>>>   | Config.parse(\"\"\"${Config()}\"\"\", "${Config().base}")
>>>>>>   | val ${refIm.name} = FileUtil.join(workDir, "$refId$refSuf")
>>>>>>   | val ${srcIm.name} = FileUtil.join(workDir, "$imgPre" + srcId + "$imgSuf")
>>>>>>   | val ${dof.name} = FileUtil.join(workDir, "rootfs", s"$dofRelPath")
>>>>>>   | val ${log.name} = FileUtil.join(workDir, "rootfs", s"$logRelPath")
>>>>>>   | IRTK.ireg(${refIm.name}, ${srcIm.name}, None, ${dof.name}, Some(${log.name}),
>>>>>>   |   "Transformation model" -> "Rigid",
>>>>>>   |   "Background value" -> $bgVal
>>>>>>   | )
>>>>>>   """.stripMargin) set (
>>>>>>   name := "ComputeRigidTemplateDofs",
>>>>>>   imports += ("com.andreasschuh.repeat.core.{Config, FileUtil, IRTK}", "sys.process._"),
>>>>>>   usedClasses += (Config.getClass, FileUtil.getClass, IRTK.getClass),
>>>>>>   inputs += srcId,
>>>>>>   inputFiles += (refIm, refId + refSuf, Workspace.shared),
>>>>>>   inputFiles += (srcIm, imgPre + "${srcId}" + imgSuf, Workspace.shared),
>>>>>>   outputs += (refIm, srcId, srcIm),
>>>>>>   outputFiles += (join("rootfs", dofRelPath), dof),
>>>>>>   outputFiles += (join("rootfs", logRelPath), log)
>>>>>> )
>>>>>>
>>>>>> // If the workspace is accessible by the compute node, read/write files directly without copy
>>>>>> if (Workspace.shared) {
>>>>>>   Workspace.rootFS.mkdirs()
>>>>>>   regTask.addResource(Workspace.rootFS, "rootfs", link = true, inWorkDir = true)
>>>>>> }
>>>>>>
>>>>>> // Otherwise, output files have to be copied to the local workspace if not shared
>>>>>> val reg = regTask hook (
>>>>>>   CopyFileHook(dof, dofPath),
>>>>>>   CopyFileHook(log, logPath)
>>>>>> )
>>>>>>
>>>>>> val cond1 = s"${dof.name}.lastModified() > ${refIm.name}.lastModified()"
>>>>>> val cond2 = s"${dof.name}.lastModified() > ${srcIm.name}.lastModified()"
>>>>>> begin -- Skip(reg on Env.short by 10, cond1 + " && " + cond2)
_______________________________________________
OpenMOLE-users mailing list
[email protected]
http://fedex.iscpif.fr/mailman/listinfo/openmole-users
