Hey everyone,

I have a small test script:

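#!/bin/bash

echo "start"

# Pick a random number from 1 to 100 so each run writes to its own file
NUMBER=$(( RANDOM % 100 + 1 ))

path="/production/people/lars/sge-test/output.$NUMBER.txt"

# Append one counter line per second, 20 times
for I in {1..20}; do
  echo "----$I----" >> "$path"
  sleep 1
done
echo "end"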

I submit it to SGE (6.2u5) with:

qsub -r y -ckpt realise-checkpoint -q test.q@@atoms \
    -o /production/people/lars/sge-test \
    -e /production/people/lars/sge-test \
    /tmp/test.sh

The checkpoint object is pretty much default, with these options set:

On shutdown of execd
On Job Suspend
On Reschedule Job
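
For reference, these options are stored as letters in the checkpoint object's "when" field. A small sketch that spells such a string out, using the meanings paraphrased from the checkpoint(5) man page; the function name describe_ckpt_when is made up, and the qconf/awk line in the comment assumes qconf -sckpt prints a "when <flags>" line:

```bash
#!/bin/bash
# Hypothetical helper: print one line per flag in a checkpoint "when" string.
# Meanings paraphrased from checkpoint(5); check them against your SGE version.
describe_ckpt_when() {
  local when=$1 c i
  for (( i = 0; i < ${#when}; i++ )); do
    c=${when:i:1}
    case $c in
      s) echo "s: checkpoint when sge_execd is shut down" ;;
      m) echo "m: checkpoint periodically, at the queue's min_cpu_interval" ;;
      x) echo "x: checkpoint when the job is suspended" ;;
      r) echo "r: reschedule (not checkpoint) when the job's host goes into unknown state" ;;
      n) echo "n: no checkpoint" ;;
      *) echo "$c: not a known flag" ;;
    esac
  done
}

# Example (assumes a "when <flags>" line in the qconf output):
#   describe_ckpt_when "$(qconf -sckpt realise-checkpoint | awk '$1 == "when" {print $2}')"
```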

Halfway through its run on one host, I hit Reschedule in qmon.

The job is removed from the running list and put on a different host. All fine so far.

I then look at the files written to the random output paths:

$ cat output.22.txt
----1----
----2----
...
----19----
----20----
fluffy production ~/sge-test
$ cat output.81.txt
----1----
----2----
...
----19----
----20----

It looks like SGE didn't kill the task on the first host: both output files run all the way to 20.
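
One way to confirm that both copies really ran at the same time is to tag every counter line with the host, job ID, process ID and time that wrote it. A minimal sketch, assuming bash; tagged_line and run_counter are made-up names, and it relies on SGE setting JOB_ID and RESTARTED in the job environment (both fall back to "unset" outside SGE):

```bash
#!/bin/bash
# Hypothetical helper: print one counter line tagged with who wrote it.
# JOB_ID and RESTARTED are expected from SGE inside a job; "unset" otherwise.
tagged_line() {
  printf -- '----%s---- host=%s job=%s restarted=%s pid=%s time=%s\n' \
    "$1" "$(uname -n)" "${JOB_ID:-unset}" "${RESTARTED:-unset}" "$$" "$(date +%T)"
}

# Hypothetical helper: append COUNT tagged lines to FILE, pausing DELAY
# seconds after each one (defaults match the original test: 20 x 1 s).
run_counter() {
  local file=$1 count=${2:-20} delay=${3:-1} i
  for (( i = 1; i <= count; i++ )); do
    tagged_line "$i" >> "$file"
    sleep "$delay"
  done
}

# In the job script, the original loop would become e.g.:
#   run_counter "/production/people/lars/sge-test/output.$NUMBER.txt"
```

Since a rescheduled job should keep its job ID, naming the file output.$JOB_ID.txt instead of using a random number would put both copies into one file, where interleaved lines from two different hosts would show the overlap directly.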

If I do the same submission with a 100-second loop and reschedule it a few seconds after it starts, the first task stops at around 70.
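
To read off where each copy stopped without paging through the files, a small sketch like this prints the last counter found in each output file. The function names are made up, and the default directory is simply the output path from the submission above:

```bash
#!/bin/bash
# Hypothetical helper: print the number from the last "----N----" marker
# in a file (empty if there is none). -e keeps grep from reading the
# leading dashes of the pattern as options.
last_counter() {
  local marker
  marker=$(grep -o -e '----[0-9][0-9]*----' "$1" | tail -n 1)
  printf '%s\n' "${marker//-/}"
}

# Hypothetical helper: report how far every output.*.txt file in DIR got.
report_progress() {
  local dir=${1:-/production/people/lars/sge-test} f n
  for f in "$dir"/output.*.txt; do
    [ -e "$f" ] || continue   # the glob matched nothing
    n=$(last_counter "$f")
    printf '%s: last counter %s\n' "${f##*/}" "${n:-none}"
  done
}
```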

Is this expected behaviour?

Lars

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
