Hi,

On 31.03.2014 at 18:22, Eric Kaufmann wrote:
> We are using ge 6.2u5 with CentOS 6.4.
>
> I have jobs that are randomly being killed. Here is the log entry. The jobs
> that are getting killed are getting an exit status of 127 or 137. I did check
> /var/log/messages on the nodes and didn't see anything out of the ordinary.
>
> 03/31/2014 09:55:30|worker|kepler|W|job 33393.1 failed on host
> research029.cm.cluster assumedly after job because: job 33393.1 died through
> signal KILL (9)
>
> 03/31/2014 09:55:34|worker|kepler|W|job 33394.1 failed on host
> research026.cm.cluster assumedly after job because: job 33394.1 died through
> signal KILL (9)

Did you request any limit during job submission?

The lines above are in the messages file of the qmaster - is there anything in the messages file of SGE on the nodes (you checked the system one on the nodes)?

-- 
Reuti

> qacct -j 33394
>
> qname        std
> hostname     research026.cm.cluster
> group        justinchem
> owner        justinchem
> project      NONE
> department   defaultdepartment
> jobname      runCHO-C6H5-Cs_opt.24081
> jobnumber    33394
> taskid       undefined
> account      sge
> priority     0
> qsub_time    Mon Mar 31 09:54:53 2014
> start_time   Mon Mar 31 09:55:10 2014
> end_time     Mon Mar 31 09:55:33 2014
> granted_pe   gauss
> slots        4
> failed       100 : assumedly after job
> exit_status  137
> ru_wallclock 23
> ru_utime     0.003
> ru_stime     0.008
> ru_maxrss    1380
> ru_ixrss     0
> ru_ismrss    0
> ru_idrss     0
> ru_isrss     0
> ru_minflt    1957
> ru_majflt    5
> ru_nswap     0
> ru_inblock   584
> ru_oublock   40
> ru_msgsnd    0
> ru_msgrcv    0
> ru_nsignals  0
> ru_nvcsw     58
> ru_nivcsw    6
> cpu          82.570
> mem          452.669
> io           0.084
> iow          0.000
> maxvmem      5.710G
> arid         undefined
>
> Thanks,
>
> Eric
>
> --
> Eric Kaufmann | Application Support Analyst - Advanced Technology Group |
> Saint Louis University | 314-977-2257 | [email protected]
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
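As an illustration (not part of the original thread), here is a small Python sketch that reads the key/value lines of the qacct output quoted above and interprets `exit_status` using the common POSIX shell convention: a status above 128 means the process died from signal (status - 128), so 137 = 128 + 9 = SIGKILL. That matches the "died through signal KILL (9)" lines in the qmaster log. By the same convention, 127 is what a shell returns for "command not found". The helper names, the trimmed `QACCT_OUTPUT` excerpt, the 1024-based reading of the `G` suffix, and the 4G limit are all assumptions made for this sketch. The limit is a stand-in for whatever limit, if any, was requested at submission.

```python
import signal

# Trimmed excerpt of the `qacct -j 33394` output quoted in the thread
# (only the fields this sketch looks at).
QACCT_OUTPUT = """\
jobnumber    33394
granted_pe   gauss
slots        4
failed       100 : assumedly after job
exit_status  137
maxvmem      5.710G
"""


def parse_qacct(text):
    """Turn 'key   value' lines of qacct output into a dict of strings."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
    return record


def describe_exit_status(status):
    """Interpret an exit status using common POSIX shell conventions."""
    if status > 128:
        # A shell reports "killed by signal N" as exit status 128 + N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    if status == 127:
        return "command not found (shell convention)"
    return f"exited with status {status}"


def to_bytes(size):
    """Convert a size such as '5.710G' to bytes.

    Assumption: uppercase K/M/G are 1024-based multipliers and a bare
    number is already in bytes; other suffixes are not handled.
    """
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if size[-1] in units:
        return int(float(size[:-1]) * units[size[-1]])
    return int(float(size))


if __name__ == "__main__":
    record = parse_qacct(QACCT_OUTPUT)
    print(describe_exit_status(int(record["exit_status"])))
    # HYPOTHETICAL_LIMIT is an illustrative value, not taken from the thread.
    HYPOTHETICAL_LIMIT = "4G"
    over = to_bytes(record["maxvmem"]) > to_bytes(HYPOTHETICAL_LIMIT)
    print(f"maxvmem {record['maxvmem']} above {HYPOTHETICAL_LIMIT}: {over}")
```

Run as a script, it prints `killed by signal 9 (SIGKILL)` and then `maxvmem 5.710G above 4G: True`. Comparing `maxvmem` against a limit like this is only a rough first check of whether a memory limit could explain the KILL. How Grid Engine actually applies a per-slot limit to a 4-slot `gauss` job is not something this sketch models.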
