Hi,

> On 06.12.2018 at 18:36, Dan Whitehouse <d.whiteho...@qmul.ac.uk> wrote:
> 
> Hi,
> I've been running some MPI jobs and I expected that when the job started 
> a $TMPDIR would be created on all of the nodes, however with our (UGE) 
> configuration that does not appear to be the case.
> 
> It appears that while on the "master" node a $TMPDIR is created and 
> persists for the duration of the job, for "slave" execution hosts, the 
> directory is only created when MPI processes run and is immediately 
> reaped when they exit. Is there a way to change this behaviour such that 
> the directory persists for the entire duration of the job?

Your observations are correct. I saw a need for it some time ago: 
https://arc.liv.ac.uk/trac/SGE/ticket/1290

One can create persistent scratch directories e.g. in a job prolog (just make 
the list of nodes unique and issue `qrsh -inherit ...` for each node with 
`mkdir $TMPDIR-persistent`. Curly braces are optional here, as the dash can't 
be a character in the name of an environment variable).
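
A minimal sketch of such a prolog step, not the exact script I used. It assumes 
that $PE_HOSTFILE and $TMPDIR are set where it runs and that `qrsh -inherit` 
is allowed for the job. The function names and the QRSH override (for a dry 
run) are made up for illustration.

```shell
#!/bin/sh
# Sketch of a prolog helper. Assumes $PE_HOSTFILE and $TMPDIR are set and
# that `qrsh -inherit` is permitted for this job.

# Print each host of the job once (first column of the PE hostfile).
unique_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}

# Create the given directory on every host of the job.
# QRSH is a hypothetical override to allow a dry run without Grid Engine.
create_persistent_dirs() {
    hostfile=$1
    dir=$2
    for host in $(unique_hosts "$hostfile"); do
        ${QRSH:-qrsh} -inherit "$host" mkdir -p "$dir" || return 1
    done
}

# Only act when started inside a parallel job.
if [ -n "${PE_HOSTFILE:-}" ] && [ -n "${TMPDIR:-}" ]; then
    create_persistent_dirs "$PE_HOSTFILE" "$TMPDIR-persistent"
fi
```

The job script can then use "$TMPDIR-persistent" on any node, as the name is 
derived from the $TMPDIR of the master task.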

There is one pitfall: in case of a job abort one can't issue `qrsh -inherit 
...` in the epilog any longer to remove all the directories on the nodes in 
turn, as the job was already canceled. My solution was to submit a "cleaner.sh" 
job in the prolog too, one for each node (hence they run as serial jobs), which 
gets the name of the directory it should remove as an argument after the script 
name (this is known in the prolog). The jobs were supposed to run only in a 
dedicated cleaner.q with no limits regarding slots (hence they started as soon 
as they were eligible to start), but got a job hold on the actual job which 
submitted them, so they waited until it finished.
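
Roughly, the submission could look like the sketch below. The names cleaner.q 
and cleaner.sh are the ones mentioned above. Pinning each job to its node with 
`-q cleaner.q@<host>`, the function names and the QSUB override (for a dry 
run) are assumptions for illustration, not necessarily how my setup did it.

```shell
#!/bin/sh
# Sketch only: cleaner.q and cleaner.sh are the names used in the text;
# QSUB is a hypothetical override for dry runs without Grid Engine.

# Submit one cleaner job per host, held until the parallel job has finished.
submit_cleaners() {
    hostfile=$1   # normally $PE_HOSTFILE
    job_id=$2     # normally $JOB_ID
    dir=$3        # directory to remove, e.g. "$TMPDIR-persistent"
    for host in $(awk '{ print $1 }' "$hostfile" | sort -u); do
        ${QSUB:-qsub} -hold_jid "$job_id" -q "cleaner.q@$host" \
            cleaner.sh "$dir" || return 1
    done
}

# Possible body of cleaner.sh: remove the directory given as first argument,
# refusing anything that does not look like one of our scratch directories.
clean_dir() {
    case $1 in
        /*-persistent) rm -rf -- "$1" ;;
        *) echo "refusing to remove '$1'" >&2; return 1 ;;
    esac
}
```

As the hold is on the job ID and not on its exit status, the cleaners are 
released whether the job finished normally or was aborted.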

-- Reuti


> 
> -- 
> Dan Whitehouse
> Research Systems Administrator, IT Services
> Queen Mary University of London
> Mile End
> E1 4NS
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

