Hi all,

        I have this complicated job sequence that requires the "spawning" of 
many other jobs related to the original one.  It's a bunch of serial jobs 
spread over a large cluster, but for accounting purposes, I'd like to keep 
track of them all.

        The basic sequence is:

Main Job [spawns]
 |
\/
(18) Layer_1_jobs [each spawns]----> (20) Iterative_jobs [collect all 
results]----> (1) Serial_job [sends 20 results back to Layer_1_job]
 |
\/
(1) Serial_job [final calcs]
 |
\/
Main Job ends after collecting all 18 datasets.

The reason for this complicated zoo is that even if one of the iterative jobs 
dies, we can still proceed with the whole process... so robustness 
(completeness) is the key here.

So I wonder if I shouldn't just code a controller using DRMAA instead of using 
job arrays?  For development, the easy hack was to make a couple of compute 
nodes submit hosts, so that the 18 jobs go to those two nodes and from there 
each one spawns its 20-job load to the rest of the cluster.  This was fine for 
testing, but now there'll be multiple runs and I don't want to make my compute 
nodes submit nodes.
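
To make the question concrete, here's roughly the controller I have in mind: 
a minimal sketch using the Python drmaa bindings.  The script paths and the 
18/20 fan-out are placeholders for my real jobs, so treat it as a sketch of 
the idea rather than working code:

import drmaa

def submit(session, script, args):
    # one job template per submission; script paths and args are placeholders
    jt = session.createJobTemplate()
    jt.remoteCommand = script
    jt.args = args
    jobid = session.runJob(jt)
    session.deleteJobTemplate(jt)
    return jobid

def wait_ok(session, jobids):
    # wait for each job and return the ids that exited cleanly;
    # a dead or aborted job is skipped so it can't sink the whole run
    ok = []
    for jid in jobids:
        try:
            info = session.wait(jid, drmaa.Session.TIMEOUT_WAIT_FOREVER)
            if info.hasExited and info.exitStatus == 0:
                ok.append(jid)
        except drmaa.errors.DrmaaException:
            pass  # log it in the real thing and carry on
    return ok

s = drmaa.Session()
s.initialize()
try:
    # fan out all 18 x 20 iterative jobs up front so they run in parallel
    batches = {}
    for unit in range(18):
        batches[unit] = [submit(s, '/path/to/iterative_job.sh',
                                [str(unit), str(i)]) for i in range(20)]
    # as each unit's batch drains, run its serial collect step
    collects = []
    for unit, jobids in batches.items():
        wait_ok(s, jobids)
        collects.append(submit(s, '/path/to/serial_collect.sh', [str(unit)]))
    wait_ok(s, collects)
    # final serial calcs over all 18 datasets, then the main run ends
    wait_ok(s, [submit(s, '/path/to/final_calcs.sh', [])])
finally:
    s.exit()

Since everything runs in one process on a single proper submit host, the 
compute nodes never need to be submit hosts, and every job id passes through 
the controller, which should also take care of the accounting side.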

I welcome your input.

Fernanda


