Hi,

On 03.01.2012 at 19:20, Fernanda Foertter wrote:

>       I have this complicated job sequence that requires the "spawning" of 
> many other jobs related to the original one.  It's a bunch of serial jobs 
> spread over a large cluster, but for accounting purposes, I'd like to keep 
> track of them all.
> 
>       The basic sequence is:
> 
> Main Job [spawns]
> |
> \/
> (18)Layer_1_jobs [each_spawn]---->(20)Iterative_jobs [collect all results]
> ---->(1)Serial_job [sends 20 results back to Layer_1_Job]
> |
> \/
> (1) serial_job [final calcs]
> |
> \/
> Ends Main Job after collecting all 18 datasets.
> 
> The reason for this complicated zoo is that even if one of the iterative jobs 
> dies, we can still proceed with the whole process... so robustness 
> (completeness) is the key here.
> 
> So I wonder if I shouldn't just code a controller using DRMAA instead of 
> doing Job arrays?

Where are you using arrays above?

Why not submit one job, and then submit the 20 jobs with a hold on the job 
number/job name of the initial one? For the final job you again use a hold by 
job number/job name, this time on the 20 jobs. SGE considers a job completed as 
soon as it has left the system; whether it was successful or not doesn't 
matter. So everything can be submitted on the master node as usual.
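
Because the hold is released even when the jobs it waits for have failed, the 
collecting job starts with whatever results exist and has to cope with gaps 
itself. A minimal Python sketch of such a tolerant collector, assuming 
(hypothetically) that iterative job j writes its result to it_<j>.out in a 
shared directory; the function name and file layout are illustrations only:

```python
from pathlib import Path
from typing import Dict, List, Tuple


def collect_results(result_dir: Path, n_iter: int = 20) -> Tuple[Dict[int, str], List[int]]:
    """Gather outputs of iterative jobs 1..n_iter, tolerating missing ones.

    Hypothetical layout: job j writes its result to result_dir/it_<j>.out.
    A job that died before writing leaves no file; partially written
    files are NOT detected by this sketch.
    """
    found: Dict[int, str] = {}
    missing: List[int] = []
    for j in range(1, n_iter + 1):
        path = result_dir / f"it_{j}.out"
        if path.is_file():
            found[j] = path.read_text()
        else:
            # The job left SGE (so the hold was released) without a result.
            missing.append(j)
    return found, missing
```

The collector can then pass on the results it has and log the missing 
indices, so a dead iterative job doesn't stop the run.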

It's just necessary to have unique job names for each workflow. The 20 jobs 
you could name job_a1, job_a2, ... and wait for them with: -hold_jid "job_a*" 
They could also be a single array job, so there is just one job number which 
you have to wait for.
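
A minimal Python sketch of this scheme for the full layout (18 Layer_1 jobs, 
each followed by 20 iterative jobs and one collector, then one final job), 
all submitted up front from the master node. The run prefix (wf42), the 
job-name layout and the script names are hypothetical; the qsub options used 
are -N, -hold_jid and, for the array variant, -t. It assumes -hold_jid 
accepts name patterns with "*" as described above:

```python
import shlex
import subprocess
from typing import List


def build_workflow(run_id: str, n_layers: int = 18, n_iter: int = 20,
                   use_array: bool = False) -> List[List[str]]:
    """Return qsub argument lists for one workflow run, in submission order.

    Every job name starts with run_id, so several runs can be queued at the
    same time without their -hold_jid patterns matching each other's jobs.
    Script names (layer1.sh, iterate.sh, collect.sh, final.sh) are
    hypothetical placeholders.
    """
    cmds: List[List[str]] = []
    for i in range(1, n_layers + 1):
        layer = f"{run_id}_L{i}"
        # Layer_1 job: no dependency.
        cmds.append(["qsub", "-N", layer, "layer1.sh", str(i)])
        if use_array:
            # One array job (tasks 1..n_iter, index in $SGE_TASK_ID),
            # so the collector waits for a single job name.
            cmds.append(["qsub", "-N", f"{layer}_it", "-t", f"1-{n_iter}",
                         "-hold_jid", layer, "iterate.sh", str(i)])
            hold = f"{layer}_it"
        else:
            # n_iter separate jobs, each held until the Layer_1 job left SGE.
            for j in range(1, n_iter + 1):
                cmds.append(["qsub", "-N", f"{layer}_it{j}", "-hold_jid", layer,
                             "iterate.sh", str(i), str(j)])
            # The "_" after the layer number keeps ..._L1_it* from
            # also matching ..._L10_it1.
            hold = f"{layer}_it*"
        # Per-layer collector: released once all matching jobs have left
        # the system, whether they succeeded or failed.
        cmds.append(["qsub", "-N", f"{layer}_collect", "-hold_jid", hold,
                     "collect.sh", str(i)])
    # Final serial job: waits only for this run's collectors.
    cmds.append(["qsub", "-N", f"{run_id}_final",
                 "-hold_jid", f"{run_id}_L*_collect", "final.sh"])
    return cmds


def submit(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or submit them in order."""
    for argv in cmds:
        if dry_run:
            # shlex.join quotes the "*" patterns as you would at a shell prompt.
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```

submit(build_workflow("wf42")) only prints the 397 qsub lines; pass 
dry_run=False to submit them. Since everything goes in from the master node, 
no compute node has to be a submit host.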


>  For development the easy hack was to make a couple of compute nodes 
> submit hosts so that the 18 jobs go to those two nodes and from there each one 
> spawns its 20-job load to the rest of the cluster.  This was fine for 
> testing, but now there'll be multiple runs and I don't want to make my nodes 
> submit nodes.

I fear DRMAA (version 1) doesn't offer much in this area; DRMAA v2 has a 
JobSession to which you could reconnect. With v1, if your "workflow 
supervisor" crashes, the workflow can't be restarted.

==

Another option could be Wildfire:

http://wildfire.bii.a-star.edu.sg/screens.php

Although the project looks dead, it can still be used and lets you create a 
workflow. It also works without the GUI, just by supplying a file with the 
dependencies, loops and cases.

-- Reuti


> I welcome your input.
> 
> Fernanda
> 
> 
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
