Hey Reuti,

On 4 April 2012 17:14, Reuti <[email protected]> wrote:
> Well, in both cases it is killed of course. You could set loglevel to
> log_info and search the messages file of the qmaster for entries like:
>
> 04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370 rescheduling because: manual/auto rescheduling
> 04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1
> 04/04/2012 17:03:46|worker|pc15370|I|reuti has deleted job 396
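A minimal Python sketch of the log search Reuti describes, assuming the qmaster messages file keeps the pipe-separated layout shown above (date time|thread|host|level|message) and that a rescheduled task leaves a `W|rescheduling job <job>.<task>` entry. The function name `was_rescheduled` is hypothetical, and so is the default task number of 1 for non-array jobs (it matches the `3963.1` in the excerpt). If you call it from the epilog, the job and task numbers would presumably come from the job's `JOB_ID` and `SGE_TASK_ID` environment variables (the latter reads `undefined` for non-array jobs); treat that, and read access to the qmaster's messages file from the execution host, as assumptions to check on your installation.

```python
import re
from typing import Iterable

# Matches the message field "rescheduling job <job>.<task>" seen in the
# qmaster messages file; captures job and task numbers separately.
RESCHEDULE_RE = re.compile(r"rescheduling job (\d+)\.(\d+)")


def was_rescheduled(lines: Iterable[str], job_id: str, task_id: str = "1") -> bool:
    """Return True if a qmaster log line records that job_id.task_id was rescheduled.

    Assumes entries look like "date time|thread|host|level|message", as in
    the excerpt from the messages file; lines in any other shape are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # not a log entry in the expected layout
        level, message = fields[3], fields[4].strip()
        match = RESCHEDULE_RE.fullmatch(message)
        # fullmatch plus exact group comparison keeps "3963.1" from matching "39631.1"
        if level == "W" and match and match.groups() == (job_id, task_id):
            return True
    return False


if __name__ == "__main__":
    sample = [
        "04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host "
        "pc15370 rescheduling because: manual/auto rescheduling",
        "04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1",
    ]
    print(was_rescheduled(sample, "3963"))  # True
    print(was_rescheduled(sample, "396"))   # False
```

Because the function accepts any iterable of lines, a large messages file can be streamed with `with open(path, errors="replace") as log: was_rescheduled(log, job, task)` instead of being read into memory at once.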
I might have to rotate the file before I try something like that; it's currently 117 MB.

>
> Then you can act on this. Does it happen often that you want to reschedule
> a job? I wonder whether using a checkpointing environment would help (even if
> we don't intend to use any checkpointing at all). There you can have a
> procedure for migration in migr_command.

No, it's not something I want to happen often, but it does happen. One thing I'm still struggling with on a related note is that a task keeps running even after it is rescheduled, which makes both of the outputs useless. Using a checkpointing environment, would we be able to make sure the task and its child processes are killed with kill -9 when it's rescheduled?

>
> -- Reuti
>
>
> On 04.04.2012 at 16:33, Lars van der bijl wrote:
>
>> Is there a way to tell the difference?
>>
>> If I reschedule a job, I get these values in the usage file in the epilog:
>>
>> wait_status=3727362
>> exit_status=137
>> signal=9
>> start_time=1333549517
>> end_time=1333549565
>> ru_wallclock=48
>> ru_utime=0.226965
>> ru_stime=0.306953
>> ru_maxrss=5408
>> ru_ixrss=0
>> ru_idrss=0
>> ru_isrss=0
>> ru_minflt=40792
>> ru_majflt=5
>> ru_nswap=0
>> ru_inblock=7992
>> ru_oublock=232
>> ru_msgsnd=0
>> ru_msgrcv=0
>> ru_nsignals=0
>> ru_nvcsw=3489
>> ru_nivcsw=113
>>
>> If I kill the job, I get this:
>>
>> wait_status=3727362
>> exit_status=137
>> signal=9
>> start_time=1333549704
>> end_time=1333549719
>> ru_wallclock=15
>> ru_utime=0.196970
>> ru_stime=0.196970
>> ru_maxrss=5412
>> ru_ixrss=0
>> ru_idrss=0
>> ru_isrss=0
>> ru_minflt=40459
>> ru_majflt=0
>> ru_nswap=0
>> ru_inblock=0
>> ru_oublock=232
>> ru_msgsnd=0
>> ru_msgrcv=0
>> ru_nsignals=0
>> ru_nvcsw=705
>> ru_nivcsw=149
>>
>> Does anyone know of a way to tell the difference from the epilog?
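The two usage records in the original question agree on every field that describes how the job ended: wait_status=3727362, exit_status=137 and signal=9 in both, where 137 = 128 + 9, the usual encoding for a process terminated by signal 9 (SIGKILL). Only the timing and resource-usage counters differ, and those change from run to run anyway. A small Python sketch that parses the `key=value` usage format and compares the termination fields makes this explicit; the helper names are hypothetical and not part of Grid Engine.

```python
from typing import Dict, Mapping

# Fields in the usage file that describe how the job ended.
TERMINATION_FIELDS = ("wait_status", "exit_status", "signal")


def parse_usage(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines, as in the epilog's usage file, into a dict."""
    usage: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank or malformed lines
            usage[key] = value
    return usage


def signal_from_exit_status(usage: Mapping[str, str]) -> int:
    """Return N when exit_status is 128 + N (killed by signal N), else 0."""
    status = int(usage.get("exit_status", "0"))
    return status - 128 if status > 128 else 0


def same_termination(a: Mapping[str, str], b: Mapping[str, str]) -> bool:
    """True when both records agree on every termination field."""
    return all(a.get(k) == b.get(k) for k in TERMINATION_FIELDS)


if __name__ == "__main__":
    rescheduled = parse_usage("wait_status=3727362\nexit_status=137\nsignal=9\nru_wallclock=48")
    killed = parse_usage("wait_status=3727362\nexit_status=137\nsignal=9\nru_wallclock=15")
    print(signal_from_exit_status(rescheduled))   # 9
    print(same_termination(rescheduled, killed))  # True
```

With these two records, the usage file alone gives the epilog nothing to branch on, which is why a side channel such as the qmaster messages file comes into play.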
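On the follow-up about making sure a rescheduled task and all its child processes really die: Reuti points at a checkpointing environment with a migr_command. The Python sketch below only illustrates what such a script might do on a Linux execution host, assuming it is handed the job's top-level PID; how that PID reaches the script is an assumption to check against the checkpoint(5) man page. It takes one snapshot of /proc, collects every descendant of that PID, and sends SIGKILL to all of them. The helper names are hypothetical; this is not a Grid Engine API.

```python
import os
import signal
import sys
from collections import defaultdict
from typing import Dict, List


def parse_ppid(stat_line: str) -> int:
    """Extract the parent PID from a /proc/<pid>/stat line.

    The command name (field 2) is in parentheses and may itself contain
    spaces or ')', so split after the *last* ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 2:]
    return int(after_comm.split()[1])  # remaining fields: state ppid pgrp ...


def read_ppids() -> Dict[int, int]:
    """Map every visible PID to its parent PID (Linux /proc only)."""
    ppids: Dict[int, int] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                ppids[int(entry)] = parse_ppid(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
    return ppids


def descendants(root: int, ppids: Dict[int, int]) -> List[int]:
    """Return all PIDs below root in the process tree, breadth-first."""
    children = defaultdict(list)
    for pid, ppid in ppids.items():
        children[ppid].append(pid)
    found: List[int] = []
    queue = [root]
    while queue:
        for child in children[queue.pop(0)]:
            found.append(child)
            queue.append(child)
    return found


def kill_tree(root: int) -> None:
    """SIGKILL root and every descendant found in a single /proc snapshot."""
    for pid in [root] + descendants(root, read_ppids()):
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone


if __name__ == "__main__" and len(sys.argv) == 2:
    kill_tree(int(sys.argv[1]))
```

The tree is collected before anything is killed, so children that get re-parented when their parent dies are still in the list. A process that forks after the snapshot would escape, so a more careful script might repeat the scan until no descendants remain.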
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
