On 23 March 2012 13:03, Reuti <[email protected]> wrote:
> On 23.03.2012 at 11:55, Lars van der bijl wrote:
>
>> On 23 March 2012 11:46, Reuti <[email protected]> wrote:
>>> Hi,
>>>
>>> On 23.03.2012 at 10:46, Lars van der bijl wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I have a small script.
>>>>
>>>> #!/bin/bash
>>>>
>>>> echo "start"
>>>>
>>>> NUMBER=$[ ( $RANDOM % 100 ) + 1 ]
>>>>
>>>> path="/production/people/lars/sge-test/output.$NUMBER.txt"
>>>>
>>>> for I in {1..20}; do
>>>> echo "----$I----" >> $path;
>>>> sleep 1;
>>>> done
>>>> echo "end"
>>>>
>>>> I submit it to SGE (6.2u5):
>>>>
>>>> qsub -r y -ckpt realise-checkpoint -q test.q@@atoms -o
>>>> /production/people/lars/sge-test -e /production/people/lars/sge-test
>>>> /tmp/test.sh
>>>>
>>>> The checkpoint environment is pretty much default:
>>>>
>>>> On shutdown of execd
>>>> On Job Suspend
>>>> On Reschedule Job
>>>
>>> There is no setting "On Reschedule Job". What is your ckpt definition in
>>> detail (qconf -sckpt ...)?
>>
>> ckpt_name realise-checkpoint
>> interface USERDEFINED
>> ckpt_command NONE
>> migr_command NONE
>> restart_command NONE
>> clean_command NONE
>> ckpt_dir /tmp
>> signal NONE
>> when xsr
>>
>> It's called Reschedule Job (without the "On").
>
> This is the action, but the condition is the unknown state of the exechost.
>
>
>>>
>>>> Halfway through its run on a host I hit reschedule in qmon.
>>>>
>>>> It removes it from the running list and puts it on a different host. All fine.
>>>
>>> Where does the checkpointing environment come into play here when you
>>> rescheduled it by hand already? I.e. you are doing something like
>>> "qmod -rj ..." if I get you right, just in the GUI.
>>
>> Yes, just in the GUI.
>>
>> Regardless of a checkpoint, shouldn't SGE kill the task immediately on
>> the old host?
>
> After a delay: yes.
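For reference, a minimal sketch of the same test job written as a bash function. It uses `$(( ))` instead of the deprecated `$[ ]` arithmetic and tags each output file with the host and `JOB_ID` (which Grid Engine sets inside a job), so two copies of a rescheduled job can be told apart in the output directory. The function name `write_ticks` and its three arguments are hypothetical; the original script hard-codes the path, 20 iterations and a 1-second sleep.

```shell
#!/bin/bash
# Sketch of the thread's test job as a reusable function.
# write_ticks is a hypothetical name; the original script inlines this logic.
# Arguments: output directory, number of ticks, seconds to sleep between ticks.
# Prints the path of the file it wrote to stdout.
write_ticks() {
    local outdir=$1 count=$2 pause=$3
    local number path i

    # $(( )) replaces the deprecated $[ ] form; yields a number in 1..100.
    number=$(( (RANDOM % 100) + 1 ))
    path="$outdir/output.$number.txt"

    # Record where this copy ran. JOB_ID is set by Grid Engine inside a job;
    # outside a job it falls back to "none".
    echo "host=${HOSTNAME:-unknown} job=${JOB_ID:-none}" >> "$path"

    for (( i = 1; i <= count; i++ )); do
        echo "----$i----" >> "$path"
        sleep "$pause"
    done

    printf '%s\n' "$path"
}
```

Inside the job script, the original behaviour corresponds to calling `write_ticks /production/people/lars/sge-test 20 1`.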
Is it possible to change the time of the delay?

>
> There was issue 1521, which was fixed in 6.2u3, so it shouldn't be there any
> longer.
>
> http://permalink.gmane.org/gmane.comp.clustering.gridengine.users/20388
>
> IMO the use of a checkpointing environment doesn't change anything here.
>
> -- Reuti
>
>
>>>>
>>>> I then look at the output of my random output paths.
>>>>
>>>> $ cat output.22.txt
>>>> ----1----
>>>> ----2----
>>>> ...
>>>> ----19----
>>>> ----20----
>>>> fluffy production ~/sge-test
>>>> $ cat output.81.txt
>>>> ----1----
>>>> ----2----
>>>> ...
>>>> ----19----
>>>> ----20----
>>>>
>>>> Looks like SGE didn't kill the task on the first host.
>>>>
>>>> If I do the same submission with 100 seconds and reschedule it a few
>>>> seconds into the run, the first task stops at around 70.
>>>
>>> Do you specify anything like -notify?
>>
>> No, I do not.
>>
>>>
>>> -- Reuti
>>>
>>>
>>>> Is this expected behaviour?
>>>>
>>>> Lars
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users
>>>
>>
>
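To check the symptom above without reading each file by eye, one option is to print the last tick recorded in every output file: if more than one file reaches the final count, more than one copy of the job ran to completion. This is a minimal sketch that assumes the `output.<N>.txt` naming and `----N----` lines of the test script; `last_ticks` is a hypothetical helper, not part of Grid Engine.

```shell
#!/bin/bash
# Sketch: report the last "----N----" tick written to each output.*.txt file.
# last_ticks is a hypothetical helper; the thread does this check with cat.
# Argument: the directory holding the output files.
# Prints "<file name> <last tick>" per file (0 if no tick lines were found).
last_ticks() {
    local dir=$1 f last
    for f in "$dir"/output.*.txt; do
        [ -e "$f" ] || continue   # glob matched nothing
        # Keep only tick lines, take the last one, then strip the dashes.
        last=$(grep -E '^----[0-9]+----$' "$f" | tail -n 1)
        last=${last//-/}
        printf '%s %s\n' "${f##*/}" "${last:-0}"
    done
}
```

Running `last_ticks /production/people/lars/sge-test` on the two files shown above would list both `output.22.txt` and `output.81.txt` at tick 20, which is the duplicate-run symptom reported in the thread.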
