"HEREDIA GENESTAR, JOSE MARIA" <[email protected]> writes:

> I have searched the archives and found a similar issue from 2010
> (http://arc.liv.ac.uk/pipermail/gridengine-users/2010-March/029785.html and
> other reports). It seems to be a qmaster bug that should be fixed
> (http://markmail.org/thread/njkqj4byiqvye67i#query:+page:1+mid:njkqj4byiqvye67i+state:results).

If it's that bug <https://arc.liv.ac.uk/trac/SGE/ticket/789>, several
responses from me should have shown up...

> Nobody in those threads mentions a previous oom-killer crash before the
> repeating segfaults, but some report that this happens when
> submitting/running/finishing tightly integrated parallel jobs. We usually
> don't run that kind of job in our cluster, but, precisely today, one of
> our users is running them. Even so, it seems that some of those jobs
> finished correctly before our first crash, so I don't know if these are
> related issues or if it is just a coincidence.
>
> How can we fix this? We are using SGE 6.2u5, the default binaries from
> rocks-cluster 6-0. Should I install a brand new version?

If you install http://arc.liv.ac.uk/downloads/SGE/releases/8.1.2/ you'll get
a fix for that and hundreds of other improvements.

> Is there any sge_qmaster binary that fixes this and is compatible with
> 6.2u5?

I don't see much point in just replacing the qmaster and not upgrading,
but http://arc.liv.ac.uk/downloads/SGE/packages/RH5/ was what I was
running at one time, particularly to fix that.

-- 
Community Grid Engine: http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
