On 23 March 2012 11:46, Reuti <[email protected]> wrote:
> Hi,
>
> On 23.03.2012 at 10:46, Lars van der bijl wrote:
>
>> Hey everyone,
>>
>> I have a small script.
>>
>> #!/bin/bash
>>
>> echo "start"
>>
>> NUMBER=$(( (RANDOM % 100) + 1 ))
>>
>> path="/production/people/lars/sge-test/output.$NUMBER.txt"
>>
>> for I in {1..20}; do
>>  echo "----$I----" >> "$path";
>>  sleep 1;
>> done
>> echo "end"
>>
>> I submit it to SGE (6.2u5):
>>
>> qsub -r y -ckpt realise-checkpoint -q test.q@@atoms \
>>   -o /production/people/lars/sge-test -e /production/people/lars/sge-test \
>>   /tmp/test.sh
>>
>> The checkpoint environment is pretty much the default:
>>
>> On shutdown of execd
>> On Job Suspend
>> On Reschedule Job
>
> There is no setting "On Reschedule Job". What is your ckpt definition in
> detail (qconf -sckpt ...)?

ckpt_name          realise-checkpoint
interface          USERDEFINED
ckpt_command       NONE
migr_command       NONE
restart_command    NONE
clean_command      NONE
ckpt_dir           /tmp
signal             NONE
when               xsr

The setting is called "Reschedule Job" (without the "On").
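
For reference, the "when" string in the definition above can be read letter by letter. The sketch below is only an illustration: decode_when is a hypothetical helper, and the descriptions paraphrase the checkpoint(5) man page, which is an assumption to verify against the local installation. On that reading, "r" covers rescheduling when the host goes into an unknown state, which is not necessarily the same thing as a manual reschedule from qmon.

```shell
#!/bin/bash
# decode_when: hypothetical helper that expands each letter of a
# checkpoint "when" string. The descriptions are paraphrased from
# checkpoint(5) and are an assumption; check the man page on your system.
decode_when() {
    local when="$1" i letter
    for ((i = 0; i < ${#when}; i++)); do
        letter="${when:i:1}"
        case "$letter" in
            s) echo "s: checkpoint on shutdown of execd" ;;
            m) echo "m: checkpoint periodically (min_cpu_interval)" ;;
            x) echo "x: checkpoint on job suspend" ;;
            r) echo "r: reschedule when the host becomes unknown" ;;
            *) echo "$letter: not a recognised letter" ;;
        esac
    done
}

# The value from the qconf -sckpt output above.
decode_when "xsr"
```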

>
>> Halfway through its run on a host, I hit "Reschedule" in qmon.
>>
>> It removes the job from the running list and puts it on a different host. All fine.
>
> Where does the checkpointing environment come into play here when you
> rescheduled it by hand already? I.e., if I understand you correctly, you are
> doing the equivalent of "qmod -rj ...", just in the GUI.

Yes, just in the GUI.

Regardless of the checkpoint, shouldn't SGE kill the task immediately on
the old host?

>
>
>> I then look at the output of my random output paths.
>>
>> $ cat output.22.txt
>> ----1----
>> ----2----
>> ...
>> ----19----
>> ----20----
>> fluffy  production ~/sge-test
>> $ cat output.81.txt
>> ----1----
>> ----2----
>> ...
>> ----19----
>> ----20----
>>
>> It looks like SGE didn't kill the task on the first host.
>>
>> If I do the same submission with 100 seconds and reschedule it a few
>> seconds into the run, the first task stops at around 70.
>
> Do you specify anything like -notify?

No, I do not.
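
Regarding -notify: according to the qsub man page, with that flag the job receives SIGUSR1 shortly before a SIGSTOP and SIGUSR2 shortly before a SIGKILL. A variant of the test script could record whether any such warning reaches the job on the old host. The following is a minimal sketch under assumptions, not a fix: LOG_DIR, STEPS and note are illustrative names that are not in the original script, and LOG_DIR would have to point at a shared filesystem to compare both hosts. SIGKILL itself cannot be trapped, so only the warning signals and SIGTERM show up.

```shell
#!/bin/bash
# Diagnostic sketch: log every step and any trappable signal, tagged
# with the host name, so a rescheduled job can be traced across hosts.

# Assumptions: LOG_DIR and STEPS are illustrative knobs; JOB_ID is set
# by SGE inside a job and falls back to "nojob" when run by hand.
LOG_DIR="${LOG_DIR:-/tmp}"
STEPS="${STEPS:-3}"
log="$LOG_DIR/sge-signal-test.${JOB_ID:-nojob}.log"

note() {
    # Append a timestamped line; HOSTNAME is set by bash itself.
    echo "$(date '+%H:%M:%S') ${HOSTNAME:-unknown} $*" >> "$log"
}

# With -notify, USR1 announces a pending SIGSTOP, USR2 a pending SIGKILL.
trap 'note "got SIGUSR1"' USR1
trap 'note "got SIGUSR2"; exit 1' USR2
trap 'note "got SIGTERM"; exit 1' TERM

note "start"
for ((i = 1; i <= STEPS; i++)); do
    note "step $i"
    # Sleep in the background and wait on it: wait is interrupted by
    # signals, so the traps run promptly instead of after the sleep.
    sleep 1 &
    wait $!
done
note "end"
```

Submitting it with the same qsub options plus -notify, and with STEPS raised to 100 or so, should show in the log whether the first host ever delivers a warning signal before the job stops.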

>
> -- Reuti
>
>
>> Is this expected behaviour?
>>
>> Lars
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>
