Some sort of limit is being reached. Do you have an RQS (resource quota set) of any kind ('qconf -srqs')? We see this when a job exhausts RAM, whether the limit was requested by the job or set by the system (the OOM killer, as mentioned; 'dmesg -T' on the compute nodes is often useful), as well as when time limits are reached. What is the whole output from 'qacct -j JOBID'?
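For what it's worth, the exit_status of 137 in the quoted qacct output fits the "signal KILL (9)" line in the messages file: by the usual shell convention, a status above 128 means the process was terminated by signal (status - 128). A minimal sketch for decoding such a status, assuming a POSIX shell; the function name decode_exit_status is only illustrative and is not part of gridengine:

```shell
#!/bin/sh
# Hypothetical helper (not a gridengine tool): decode a shell-style exit
# status such as the one qacct reports in its exit_status field.
# Assumption: statuses above 128 mean "terminated by signal (status - 128)".
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # 'kill -l N' prints the name of signal number N
        echo "signal $sig ($(kill -l "$sig"))"
    else
        echo "exit code $status"
    fi
}

decode_exit_status 137
```

This only tells you which signal ended the job, not who sent it; for that the RQS, OOM and time limit checks above are still needed.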
Cheers,
-Hugh

-----Original Message-----
From: users-boun...@gridengine.org <users-boun...@gridengine.org> On Behalf Of hiller
Sent: Tuesday, May 14, 2019 9:02 AM
To: users@gridengine.org
Subject: Re: [gridengine users] jobs randomly die

Hi,
nope, there are no OOM messages in the journal.
Regards, ulrich

On 5/14/19 12:49 PM, Arnau wrote:
> Hi,
>
> _maybe_ the OOM killer killed the job? A look at the messages will give you an
> answer (I've seen this in my cluster).
>
> HTH,
> Arnau
>
> On Tue, 14 May 2019 at 12:37, hiller (<hil...@mpia-hd.mpg.de
> <mailto:hil...@mpia-hd.mpg.de>>) wrote:
>
>     Dear all,
>     I have a problem that jobs sent to gridengine randomly die.
>     The gridengine version is 8.1.9
>     The OS is opensuse 15.0
>     The gridengine messages file says:
>     05/13/2019 18:31:45|worker|karun|E|master task of job 635659.1 failed - killing job
>     05/13/2019 18:31:46|worker|karun|W|job 635659.1 failed on host karun10 assumedly after job because: job 635659.1 died through signal KILL (9)
>
>     qacct -j 635659 says:
>     failed       100 : assumedly after job
>     exit_status  137 (Killed)
>
>     There was no kill triggered by the user. Also, there are no other
>     limitations, neither ulimit nor in the gridengine queue.
>     The 'qconf -sq all.q' command gives:
>     s_rt                  INFINITY
>     h_rt                  INFINITY
>     s_cpu                 INFINITY
>     h_cpu                 INFINITY
>     s_fsize               INFINITY
>     h_fsize               INFINITY
>     s_data                INFINITY
>     h_data                INFINITY
>     s_stack               INFINITY
>     h_stack               INFINITY
>     s_core                INFINITY
>     h_core                INFINITY
>     s_rss                 INFINITY
>     h_rss                 INFINITY
>     s_vmem                INFINITY
>     h_vmem                INFINITY
>
>     Years ago there were some threads about the same issue, but I did not
>     find a solution.
>
>     Does somebody have a hint what I can do or check/debug?
>
>     With kind regards and many thanks for any help, ulrich
> _______________________________________________
> users mailing list
> users@gridengine.org <mailto:users@gridengine.org>
> https://gridengine.org/mailman/listinfo/users