Hey everyone,
I have a small test script, /tmp/test.sh:
#!/bin/bash
echo "start"
NUMBER=$(( (RANDOM % 100) + 1 ))
path="/production/people/lars/sge-test/output.$NUMBER.txt"
for I in {1..20}; do
    echo "----$I----" >> "$path"
    sleep 1
done
echo "end"
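To tell two copies of the job apart, the loop can tag every line with the host and process that wrote it. This is a minimal sketch, assuming bash. probe_loop is a hypothetical name. JOB_ID and RESTARTED are environment variables SGE sets for the job, and the sketch falls back to "none" outside SGE. The traps only log signals that are actually delivered: SIGUSR1/SIGUSR2 warnings should only arrive if the job is submitted with -notify, and SIGKILL cannot be trapped at all, so a copy killed that way just stops writing.

```bash
#!/bin/bash
# Sketch: drop-in replacement for the loop in test.sh (probe_loop is a
# hypothetical name). Each line records which host/process wrote it, so two
# copies of the job writing to shared storage can be told apart.
probe_loop() {
    local outfile=$1 iterations=$2
    # JOB_ID and RESTARTED are set by SGE; "none" when run outside SGE.
    local tag="host=$HOSTNAME pid=$$ job=${JOB_ID:-none} restarted=${RESTARTED:-none}"

    # Log catchable signals. SIGKILL cannot be trapped. Bash runs a trap
    # only after the current foreground command (sleep 1) returns.
    trap 'echo "caught SIGTERM ($tag)" >> "$outfile"; exit 143' TERM
    trap 'echo "caught SIGUSR1 ($tag)" >> "$outfile"' USR1
    trap 'echo "caught SIGUSR2 ($tag)" >> "$outfile"' USR2

    local i
    for ((i = 1; i <= iterations; i++)); do
        echo "----$i---- $tag" >> "$outfile"
        sleep 1
    done

    # Remove the traps again: they refer to this function's local variables.
    trap - TERM USR1 USR2
}
```

In test.sh, define the function and replace the for loop with probe_loop "$path" 20 (or 100 for the longer run). If both hosts keep writing, their lines will carry different host= and pid= values.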
I submit it to SGE (6.2u5) with:
qsub -r y -ckpt realise-checkpoint -q test.q@@atoms \
    -o /production/people/lars/sge-test \
    -e /production/people/lars/sge-test \
    /tmp/test.sh
The checkpoint environment is pretty much the default, with these "when" options enabled:
- On shutdown of execd
- On job suspend
- On reschedule job
Halfway through the run, I hit Reschedule in qmon. SGE removes the job from the running list and starts it on a different host. All fine so far.
I then look at the randomly named output files:
$ cat output.22.txt
----1----
----2----
...
----19----
----20----
$ cat output.81.txt
----1----
----2----
...
----19----
----20----
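Both files count all the way to 20 (the rescheduled copy picked a new random number, hence a second file), so it looks like SGE didn't kill the job on the first host.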
I did the same submission with a 100-second loop and rescheduled it a few seconds after it started running; the first copy kept going and stopped at around 70.
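To see exactly where each copy stopped, a small helper can print the last counter found in each output file. A sketch only; last_tick and report_ticks are hypothetical helper names, and the only assumption is that counter lines start with the ----N---- marker the script writes.

```bash
#!/bin/bash
# Sketch: print the last "----N----" counter in each output file, i.e. how
# far each copy of the job got. Text after the marker (e.g. host/PID tags)
# and any other lines are ignored.
last_tick() {
    local last
    last=$(grep -o '^----[0-9]*----' "$1" | tail -n 1)
    # Strip the dashes, leaving only the number.
    printf '%s\n' "${last//-/}"
}

report_ticks() {
    local f
    for f in "$@"; do
        printf '%s: %s\n' "$f" "$(last_tick "$f")"
    done
}
```

Run from the sge-test directory as report_ticks output.*.txt to get one "file: last counter" line per copy.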
Is this expected behaviour?
Lars
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users