On 28 February 2012 11:02, Stefano Bridi <[email protected]> wrote:
> Hi list, I have a problem on an SGE setup where the home directories are
> shared through glusterfs, and some jobs fail to start because of
> latency in filesystem propagation between the login node and the
> compute node.
> What happens is that a script creates a workdir with some support files,
> does a "cd" into it, and then qsubs a job script. Sometimes the job script
> starts running on the compute node too quickly, and the "workdir" is not
> yet visible on that node. I know it is a glusterfs problem that must be
> resolved elsewhere, but in the meantime, where can I put a "sleep"?
> Is there a prerun hook that I can use for that? For other uses
> (copying files around and cleanup), is there a similar postrun hook?
>
Had another thought.
Set up a load sensor for a complex with the >= relation that reports
the current time (seconds since 1970). Then add a request for that
complex to the qsub (via a JSV if you don't want to make the submission
process more complex), with a value greater than now plus a fudge factor.
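
A minimal sketch of such a load sensor, in Python, in case it helps. It
assumes the usual load sensor handshake (execd writes a line to the sensor's
stdin each load interval, the sensor answers with a begin/end framed report,
and "quit" tells it to exit). The complex name "epoch_now" and the helper
names are made up for illustration, and socket.gethostname() is assumed to
return the host name as the cluster knows it, which may not hold on every
site.

```python
#!/usr/bin/env python3
"""Sketch of a load sensor that reports the current epoch time."""
import socket
import sys
import time

COMPLEX_NAME = "epoch_now"  # hypothetical complex name, not from the thread


def format_report(host, name, value):
    """Build one load report in begin / host:name:value / end framing."""
    return "begin\n{}:{}:{}\nend\n".format(host, name, value)


def serve(stdin, stdout, host, clock=time.time):
    """Answer every line read from stdin with a report; stop on 'quit' or EOF."""
    for line in iter(stdin.readline, ""):
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, COMPLEX_NAME, int(clock())))
        stdout.flush()  # the report must reach execd immediately


def request_value(fudge_seconds, clock=time.time):
    """Value to request at submit time: now plus a fudge factor, in seconds."""
    return int(clock()) + fudge_seconds


if __name__ == "__main__":
    serve(sys.stdin, sys.stdout, socket.gethostname())
```

On the submit side, the value from request_value(30) would go into a resource
request such as "qsub -l epoch_now=<value> job.sh" (or be added by the JSV).
Whether the job is then held back as intended depends on how the complex and
its relation are defined, so treat the whole thing as untested.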

William

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users