Greetings.

We have a queue defined with the following soft and hard wall-clock limits:

qconf -sq free64 | egrep "_rt|notify"
notify                00:05:00
s_rt                  48:00:00
h_rt                  48:05:00
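
To make the gap between the two limits explicit, here is a minimal sketch that
converts the values above to seconds and prints the window between s_rt and
h_rt. The helper name to_seconds is hypothetical (not part of Grid Engine), and
the limits are simply copied from the qconf output above:

```shell
#!/bin/sh
# Hypothetical helper: convert an hh:mm:ss limit, as printed by qconf, to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

s_rt=$(to_seconds 48:00:00)   # soft wall-clock limit from the queue above
h_rt=$(to_seconds 48:05:00)   # hard wall-clock limit from the queue above

# Seconds between reaching the soft limit and reaching the hard limit.
echo "window: $((h_rt - s_rt)) seconds"
```

With these values the window is 300 seconds, the same length as the notify
setting, so anything that should happen at s_rt has to finish within 5 minutes.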

Jobs are killed correctly after 2 days of wall-clock run time. We now have Grid
Engine checkpointing set up and would like jobs to be sent the suspend signal,
so that the checkpoint takes over, instead of being killed.

After reading about the queue "suspend_method" and running some tests with it,
I am not sure I am on the right track.

So what is the proper way to do this, that is, to *not* have jobs killed but to
have the checkpoint process take over when s_rt is reached?

Joseph

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
