Well, in both cases the job is killed, of course. You could set loglevel to
log_info and search the messages file of the qmaster for entries like:

04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370 rescheduling because: manual/auto rescheduling
04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1
04/04/2012 17:03:46|worker|pc15370|I|reuti has deleted job 396

Then you can act on this. Do you often need to reschedule a job? I wonder
whether using a checkpointing environment would help (even if you don't intend
to use any checkpointing at all). There you can define a procedure for
migration in migr_command.
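
As a rough illustration of the log search, here is a minimal Python sketch that scans qmaster messages lines and reports whether a given job was rescheduled or deleted. The patterns are modeled only on the sample entries above, and the exact wording may differ between Grid Engine versions. The function name classify_job_events and the second sample line (job 4000) are made up for the example.

```python
import re

# Patterns modeled on the sample entries above (assumption: the message
# wording is stable in your Grid Engine version).
RESCHEDULE_RE = re.compile(r"\|rescheduling job (\d+)\.(\d+)\s*$")
DELETE_RE = re.compile(r"\|([^|\s]+) has deleted job (\d+)\b")


def classify_job_events(lines, job_id):
    """Return 'rescheduled' / 'deleted' events logged for job_id, in order."""
    events = []
    for line in lines:
        match = RESCHEDULE_RE.search(line)
        if match and int(match.group(1)) == job_id:
            events.append("rescheduled")
            continue
        match = DELETE_RE.search(line)
        if match and int(match.group(2)) == job_id:
            events.append("deleted")
    return events


# The first two entries come from the log excerpt above; the last one is a
# hypothetical entry with an invented job number, for illustration only.
SAMPLE = [
    "04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370 "
    "rescheduling because: manual/auto rescheduling",
    "04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1",
    "04/04/2012 17:10:00|worker|pc15370|I|reuti has deleted job 4000",
]

if __name__ == "__main__":
    print(classify_job_events(SAMPLE, 3963))
    print(classify_job_events(SAMPLE, 4000))
```

In practice you would pass the open messages file instead of SAMPLE. Its location depends on how the qmaster spool directory is set up in your installation.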

-- Reuti


On 04.04.2012 at 16:33, Lars van der bijl wrote:

> is there a way to tell the difference?
> 
> If I reschedule a job, I get these values in the usage file in the epilog:
> 
> wait_status=3727362
> exit_status=137
> signal=9
> start_time=1333549517
> end_time=1333549565
> ru_wallclock=48
> ru_utime=0.226965
> ru_stime=0.306953
> ru_maxrss=5408
> ru_ixrss=0
> ru_idrss=0
> ru_isrss=0
> ru_minflt=40792
> ru_majflt=5
> ru_nswap=0
> ru_inblock=7992
> ru_oublock=232
> ru_msgsnd=0
> ru_msgrcv=0
> ru_nsignals=0
> ru_nvcsw=3489
> ru_nivcsw=113
> 
> If I kill the job, I get this:
> 
> wait_status=3727362
> exit_status=137
> signal=9
> start_time=1333549704
> end_time=1333549719
> ru_wallclock=15
> ru_utime=0.196970
> ru_stime=0.196970
> ru_maxrss=5412
> ru_ixrss=0
> ru_idrss=0
> ru_isrss=0
> ru_minflt=40459
> ru_majflt=0
> ru_nswap=0
> ru_inblock=0
> ru_oublock=232
> ru_msgsnd=0
> ru_msgrcv=0
> ru_nsignals=0
> ru_nvcsw=705
> ru_nivcsw=149
> 
> anyone know of a way to tell the difference from the epilog?
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

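To see why the usage file alone cannot answer the question, here is a small Python sketch that parses the key=value format quoted above and compares the termination fields of both runs. The helper names are invented for the example, and only a subset of the quoted values is used. The 128 + signal reading of exit_status is the usual shell convention, which matches signal=9 here, but treat it as an assumption.

```python
TERMINATION_KEYS = ("wait_status", "exit_status", "signal")


def parse_usage(text):
    """Parse 'key=value' lines, as found in the usage file, into a dict."""
    usage = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            usage[key.strip()] = value.strip()
    return usage


def termination_fields(usage):
    """Keep only the fields that describe how the job ended."""
    return {key: usage.get(key) for key in TERMINATION_KEYS}


def signal_from_exit_status(usage):
    """Assumed convention: exit_status above 128 means 128 + signal number."""
    code = int(usage["exit_status"])
    return code - 128 if code > 128 else None


# Subsets of the two usage files quoted above.
RESCHEDULED = "wait_status=3727362\nexit_status=137\nsignal=9\nru_wallclock=48\n"
KILLED = "wait_status=3727362\nexit_status=137\nsignal=9\nru_wallclock=15\n"

if __name__ == "__main__":
    a = parse_usage(RESCHEDULED)
    b = parse_usage(KILLED)
    # Both runs end with the same termination fields, so they look alike here.
    print(termination_fields(a) == termination_fields(b))
    print(signal_from_exit_status(a))
```

Only the timing and resource counters differ between the two runs, which is why the qmaster messages file is the better place to look.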
