Hello,

We're using SGE 8.1.9 and, at random, some jobs finish successfully (our job logs confirm this) but the master is not notified. On the compute node, all the folders related to such a job are still there and correctly filled:
trace file:
...
04/04/2018 21:50:13 [300:38328]: now running with uid=300, euid=300
04/04/2018 21:50:13 [300:38328]: execvlp(/bin/ksh, "-ksh" "/gridware/sge/gridname/spool/server/job_scripts/1376090")
04/04/2018 21:50:23 [300:38327]: wait3 returned 38328 (status: 0; WIFSIGNALED: 0, WIFEXITED: 1, WEXITSTATUS: 0)
04/04/2018 21:50:23 [300:38327]: job exited with exit status 0
04/04/2018 21:50:23 [300:38327]: reaped "job" with pid 38328
04/04/2018 21:50:23 [300:38327]: job exited not due to signal
04/04/2018 21:50:23 [300:38327]: job exited with status 0
04/04/2018 21:50:23 [300:38327]: now sending signal KILL to pid -38328
04/04/2018 21:50:23 [300:38327]: pdc_kill_addgrpid: 20075 9
04/04/2018 21:50:23 [300:38327]: writing usage file to "usage"
04/04/2018 21:50:23 [300:38327]: no epilog script to start

exit_status: 0
error: (empty)

but the process no longer appears in the 'ps' output.

On the master, 'qstat -j 1376090' still returns the job, so to get rid of it we run 'qdel -f 1376090'.

This happens 3 or 4 times a day (we submit more than 100k jobs per day), on different exec hosts.

Do you know what could be the cause of this behavior?

Thanks,
Paul.
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
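
In case it helps anyone reproduce the check, here is a minimal Python sketch of how the trace file could be scanned for the "job exited with exit status" message, so that jobs the shepherd considers finished can be compared against what 'qstat -j' still reports. The line format is assumed purely from the excerpt above, and the function name job_exit_status is hypothetical (it is not part of SGE).

```python
import re

# Assumed trace line layout, taken from the excerpt above:
# "MM/DD/YYYY HH:MM:SS [uid:pid]: message"
TRACE_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \[(\d+):(\d+)\]: (.*)$"
)
# Message the shepherd wrote in the excerpt when the job process ended.
EXIT_MSG = re.compile(r"^job exited with exit status (\d+)$")


def job_exit_status(trace_text):
    """Return the exit status recorded in the trace text, or None if absent.

    Hypothetical helper: it only looks for the message seen in the excerpt.
    """
    for line in trace_text.splitlines():
        m = TRACE_LINE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        exit_match = EXIT_MSG.match(m.group(4))
        if exit_match:
            return int(exit_match.group(1))
    return None


sample = """\
04/04/2018 21:50:23 [300:38327]: wait3 returned 38328 (status: 0; WIFSIGNALED: 0, WIFEXITED: 1, WEXITSTATUS: 0)
04/04/2018 21:50:23 [300:38327]: job exited with exit status 0
04/04/2018 21:50:23 [300:38327]: reaped "job" with pid 38328
"""
print(job_exit_status(sample))  # prints 0
```

A job whose trace yields an exit status while the master still lists it would be a candidate for the situation described above; where the trace file lives on a given exec host is left out here because it depends on the local spool configuration.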