On Tue, 7 Jan 2014, Joshua Baker-LePain wrote:
...
> We're running OGS/GE 2011.11p1 on top of fully updated CentOS 6 on a
> cluster with ~650 nodes. Spool directories are local to the nodes. Our
> jobs are primarily serial, but with some parallel usage. One user has
> been having issues with random tasks of parallel array jobs failing, and
> I'm having trouble tracking it down.
...
Does this sound like your problem?
http://gridengine.org/pipermail/dev/2011-December/000081.html
There's a patch posted in that thread, although Univa later improved it.
That improvement can be found in Univa's public git repo here:
https://github.com/gridengine/gridengine
Alternatively, it was integrated into Son of Grid Engine some time ago.
All the best,
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------