Well, in both cases the job is killed, of course. You could set loglevel to log_info and search the qmaster's messages file for entries like:
04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370 rescheduling because: manual/auto rescheduling
04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1
04/04/2012 17:03:46|worker|pc15370|I|reuti has deleted job 396

Then you can act on this.

Do you often need to reschedule a job? I wonder whether using a checkpointing environment would help (even if you don't intend to use any checkpointing at all). There you can define a procedure for migration in migr_command.

-- Reuti

On 04.04.2012 at 16:33, Lars van der bijl wrote:

> is there a way to tell the difference?
>
> if i reschedule a job i get these values in the usage file in the epilog
>
> wait_status=3727362
> exit_status=137
> signal=9
> start_time=1333549517
> end_time=1333549565
> ru_wallclock=48
> ru_utime=0.226965
> ru_stime=0.306953
> ru_maxrss=5408
> ru_ixrss=0
> ru_idrss=0
> ru_isrss=0
> ru_minflt=40792
> ru_majflt=5
> ru_nswap=0
> ru_inblock=7992
> ru_oublock=232
> ru_msgsnd=0
> ru_msgrcv=0
> ru_nsignals=0
> ru_nvcsw=3489
> ru_nivcsw=113
>
> if i kill the job I get this.
>
> wait_status=3727362
> exit_status=137
> signal=9
> start_time=1333549704
> end_time=1333549719
> ru_wallclock=15
> ru_utime=0.196970
> ru_stime=0.196970
> ru_maxrss=5412
> ru_ixrss=0
> ru_idrss=0
> ru_isrss=0
> ru_minflt=40459
> ru_majflt=0
> ru_nswap=0
> ru_inblock=0
> ru_oublock=232
> ru_msgsnd=0
> ru_msgrcv=0
> ru_nsignals=0
> ru_nvcsw=705
> ru_nivcsw=149
>
> anyone know of a way to tell the difference from the epilog?
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
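
As a rough illustration of the log search suggested in the reply, here is a minimal Python sketch that scans qmaster messages lines and reports whether a given job was rescheduled or deleted. The function name classify_job and the two regular expressions are assumptions, modelled only on the sample entries quoted in this thread; the exact message wording may differ between Grid Engine versions, so treat this as a starting point rather than a definitive parser.

```python
import re

# Patterns modelled on the sample log entries in this thread (assumption:
# other Grid Engine versions may word these messages differently).
RESCHEDULE_RE = re.compile(r"rescheduling job (\d+)\.(\d+)")
DELETE_RE = re.compile(r"(\S+) has deleted job (\d+)")


def classify_job(lines, job_id):
    """Return the set of events ('rescheduled', 'deleted') logged for job_id."""
    events = set()
    for line in lines:
        # Expected layout: timestamp|thread|host|level|message
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # skip lines that do not match the expected layout
        message = parts[4]

        match = RESCHEDULE_RE.search(message)
        if match and int(match.group(1)) == job_id:
            events.add("rescheduled")

        match = DELETE_RE.search(message)
        if match and int(match.group(2)) == job_id:
            events.add("deleted")
    return events
```

It could be called with an open file handle on the qmaster messages file, for example classify_job(open(path), 3963), where path is wherever your installation keeps that file. Since the epilog sees the same exit_status and signal in both cases, a lookup like this has to run against the qmaster log rather than the usage file.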
