On Tue, 7 Jan 2014, Joshua Baker-LePain wrote:
...
We're running OGS/GE 2011.11p1 on top of fully updated CentOS 6 on a
cluster with ~650 nodes.  Spool directories are local to the nodes.  Our
jobs are primarily serial, but with some parallel usage.  One user has
been having issues with random tasks of parallel array jobs failing, and
I'm having trouble tracking it down.
...

Does this sound like your problem?

  http://gridengine.org/pipermail/dev/2011-December/000081.html

There's a patch posted in that thread, although Univa later improved it. That improvement can be found in Univa's public git repo here:

  https://github.com/gridengine/gridengine

Alternatively, the fix was integrated into Son of Grid Engine some time ago.

All the best,

Mark
--
-----------------------------------------------------------------
Mark Dixon                       Email    : [email protected]
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------