Well, I'm not sure I understand this...

> When these jobs have some kind of checkpointing built in, it can be set up
> in SGE to reschedule the job.

They certainly don't have checkpointing built in - these are proprietary
binaries, and I can't change their operation. I haven't finished reading
up on checkpointing yet, so I don't know whether any other approach is
appropriate (kernel-level checkpointing seems like a good fix, but I've yet
to find enough detail to tell whether it will work for me).

> For the queue setup: one queue per group, inside the own machine's
> hostgroup should get a lower sequence number than the other group's
> machines (or a soft request for the own machines).

This is the bit I don't understand; aren't the sequence numbers set on a
per-queue basis? If so, we're still left with one queue subordinate to the
other. That's not going to fly; we need the subordination relationship to
be one way round for one hostgroup, and the other way round for the other
hostgroup.

I'm beginning to wonder whether this is possible...

Vic.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users