Hey Reuti

On 4 April 2012 17:14, Reuti <[email protected]> wrote:
> Well, in both cases it is killed of course. You could set loglevel to 
> log_info and search the messages file of the qmaster for entries like:
>
> 04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370 
> rescheduling because: manual/auto rescheduling
> 04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1
> 04/04/2012 17:03:46|worker|pc15370|I|reuti has deleted job 396

I might have to rotate the file before I try something like that;
it's currently 117 MB.
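
The log search suggested above could be sketched roughly like this. It is only a sketch: the pipe-separated layout and the two message wordings are assumed from the sample lines quoted above, `classify_job` is a made-up helper name, and the deletion line for job 4001 is a hypothetical example, not real log output.

```python
import re

# Message shapes assumed from the quoted sample lines only; a real
# messages file may contain other variants.
RESCHEDULE_RE = re.compile(r"rescheduling job (\d+)\.(\d+)")
DELETE_RE = re.compile(r"(\S+) has deleted job (\d+)")


def classify_job(lines, job_id):
    """Return 'rescheduled', 'deleted' or 'unknown' for job_id.

    The last matching message in the log wins.
    """
    verdict = "unknown"
    for line in lines:
        # Assumed layout: timestamp|thread|host|level|message
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue
        message = parts[4]
        match = RESCHEDULE_RE.search(message)
        if match and int(match.group(1)) == job_id:
            verdict = "rescheduled"
            continue
        match = DELETE_RE.search(message)
        if match and int(match.group(2)) == job_id:
            verdict = "deleted"
    return verdict


sample = [
    "04/04/2012 17:03:07|worker|pc15370|W|job 3963.1 failed on host pc15370 "
    "rescheduling because: manual/auto rescheduling",
    "04/04/2012 17:03:07|worker|pc15370|W|rescheduling job 3963.1",
    # Hypothetical line, modelled on the quoted "has deleted job" message.
    "04/04/2012 17:03:46|worker|pc15370|I|reuti has deleted job 4001",
]

print(classify_job(sample, 3963))
print(classify_job(sample, 4001))
```

Passing an open file handle instead of a list reads the messages file line by line, so its size should matter less for a one-off check.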

>
> Then you can act on this. Do you often need to reschedule a job? I wonder 
> whether using a checkpointing environment would help (even if we don't 
> intend to use any checkpointing at all). There you can have a procedure 
> for migration in migr_command.

No, it's not something I want to happen often, but it happens. One thing
I'm still struggling with, on a related note, is that a task will keep
running even after it is rescheduled, making both of the outputs
useless.

Would we be able to make sure the task (and its child processes) is
kill -9'd if it's rescheduled using a checkpointing environment?

>
> -- Reuti
>
>
> On 04.04.2012 at 16:33, Lars van der bijl wrote:
>
>> Is there a way to tell the difference?
>>
>> If I reschedule a job, I get these values in the usage file in the epilog:
>>
>> wait_status=3727362
>> exit_status=137
>> signal=9
>> start_time=1333549517
>> end_time=1333549565
>> ru_wallclock=48
>> ru_utime=0.226965
>> ru_stime=0.306953
>> ru_maxrss=5408
>> ru_ixrss=0
>> ru_idrss=0
>> ru_isrss=0
>> ru_minflt=40792
>> ru_majflt=5
>> ru_nswap=0
>> ru_inblock=7992
>> ru_oublock=232
>> ru_msgsnd=0
>> ru_msgrcv=0
>> ru_nsignals=0
>> ru_nvcsw=3489
>> ru_nivcsw=113
>>
>> If I kill the job, I get this:
>>
>> wait_status=3727362
>> exit_status=137
>> signal=9
>> start_time=1333549704
>> end_time=1333549719
>> ru_wallclock=15
>> ru_utime=0.196970
>> ru_stime=0.196970
>> ru_maxrss=5412
>> ru_ixrss=0
>> ru_idrss=0
>> ru_isrss=0
>> ru_minflt=40459
>> ru_majflt=0
>> ru_nswap=0
>> ru_inblock=0
>> ru_oublock=232
>> ru_msgsnd=0
>> ru_msgrcv=0
>> ru_nsignals=0
>> ru_nvcsw=705
>> ru_nivcsw=149
>>
>> Does anyone know of a way to tell the difference from the epilog?
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>
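
A minimal sketch of why the two usage dumps quoted above cannot be told apart from the epilog: it parses the key=value lines and compares only the fields that describe how the job ended. The helper names `parse_usage` and `termination_fields` are made up for illustration, and the inputs are abbreviated copies of the values quoted above.

```python
def parse_usage(text):
    """Parse 'key=value' lines into a dict of strings."""
    usage = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip lines without '='
            usage[key] = value
    return usage


# Fields that describe how the job terminated.
TERMINATION_KEYS = ("wait_status", "exit_status", "signal")


def termination_fields(usage):
    """Return only the termination-related entries of a usage dict."""
    return {key: usage.get(key) for key in TERMINATION_KEYS}


# Abbreviated copies of the two dumps from the thread.
rescheduled = parse_usage("""\
wait_status=3727362
exit_status=137
signal=9
ru_wallclock=48
""")

killed = parse_usage("""\
wait_status=3727362
exit_status=137
signal=9
ru_wallclock=15
""")

# Both cases report the same termination data.
print(termination_fields(rescheduled) == termination_fields(killed))

# exit_status follows the usual 128 + signal number convention.
print(int(killed["exit_status"]) - 128 == int(killed["signal"]))
```

In both cases the job died from signal 9, so only the resource counters (wallclock, page faults, context switches) differ, and those say nothing about who sent the signal.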
