On 06.04.2011 at 12:34, Vic wrote:

> Well, I'm not sure I understand this...
> 
>> When these jobs have some kind of checkpointing built in, it can be set up
>> in SGE to reschedule the job.
> 
> They certainly don't have checkpointing built in - these are proprietary
> binaries, and I can't change their operation. I haven't finished reading
> up on checkpointing yet, so I don't know if anything else is appropriate
> (Kernel-level checkpointing seems like a good fix, but I've yet to find
> enough detail on whether it will work for me).

Okay.


>> For the queue setup: use one queue per group. Within each queue, the
>> group's own hostgroup should get a lower sequence number than the other
>> group's machines (or use a soft request for the group's own machines).
> 
> This is the bit I don't understand; aren't the sequence numbers set on a
> per-queue basis?

No. They can be set at the queue instance level (here: per hostgroup).


> If so, we're still left with one queue subordinate to the
> other. That's not going to fly; we need the subordination relationship to
> be one way round for one hostgroup, and the other way round for the other
> hostgroup.

Yep. No problem.

$ qconf -sq group1.q
...
seq_no 0,[@group1.hgrp=10],[@group2.hgrp=20]
...
user_lists group1_users
...
subordinate none,[@group1.hgrp=group2.q=1]


$ qconf -sq group2.q
...
seq_no 0,[@group2.hgrp=10],[@group1.hgrp=20]
...
user_lists group2_users
...
subordinate none,[@group2.hgrp=group1.q=1]
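To make the "default,[@hostgroup=value]" notation above concrete, here is a minimal Python sketch of how such a value resolves for a given hostgroup. It is only an illustration, not SGE code: the function names parse_attr and resolve are made up for this example, and it assumes that no override contains a comma (real configurations may).

```python
import re


def parse_attr(value):
    """Split a 'default,[@hgrp=value],...' string into (default, overrides).

    Illustrative only: assumes no override itself contains a comma.
    """
    default, *rest = value.split(",")
    overrides = {}
    for item in rest:
        # Each override looks like [@hostgroup=value]; the value may itself
        # contain '=' (e.g. group2.q=1), so split on the first '=' only.
        m = re.fullmatch(r"\[(@?[^=\]]+)=(.+)\]", item.strip())
        if m is None:
            raise ValueError(f"unrecognised override: {item!r}")
        overrides[m.group(1)] = m.group(2)
    return default.strip(), overrides


def resolve(value, hostgroup):
    """Return the override for hostgroup, or the default if there is none."""
    default, overrides = parse_attr(value)
    return overrides.get(hostgroup, default)


# Values copied from the group1.q listing above.
GROUP1_SEQ_NO = "0,[@group1.hgrp=10],[@group2.hgrp=20]"
GROUP1_SUBORDINATE = "none,[@group1.hgrp=group2.q=1]"

print(resolve(GROUP1_SEQ_NO, "@group1.hgrp"))       # 10
print(resolve(GROUP1_SEQ_NO, "@group2.hgrp"))       # 20
print(resolve(GROUP1_SUBORDINATE, "@group1.hgrp"))  # group2.q=1
print(resolve(GROUP1_SUBORDINATE, "@group2.hgrp"))  # none
```

So group1.q subordinates group2.q only on @group1.hgrp, and group2.q does the reverse on @group2.hgrp; the relationship is not fixed per queue.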


The idea is:

a) schedule to the group's own machines first
b) allow access only for the respective group
c) on a group's own machines, suspend the other group's queue instances
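The three points above can be sketched as a toy model in Python. This is only a rough illustration under assumptions, not how the scheduler is implemented: the QUEUES table and both function names are hypothetical, and point a) assumes the scheduler actually sorts queue instances by sequence number (queue_sort_method set to seqno in the scheduler configuration).

```python
# Toy model of the two queue configurations shown earlier (hypothetical
# data structure; the values are taken from the qconf listings).
QUEUES = {
    "group1.q": {
        "seq_no": {"@group1.hgrp": 10, "@group2.hgrp": 20},
        "user_list": "group1_users",
        "subordinate": {"@group1.hgrp": "group2.q"},
    },
    "group2.q": {
        "seq_no": {"@group2.hgrp": 10, "@group1.hgrp": 20},
        "user_list": "group2_users",
        "subordinate": {"@group2.hgrp": "group1.q"},
    },
}


def placement_order(user_list):
    """Points a) and b): instances the user may use, lowest seq_no first."""
    instances = [
        (cfg["seq_no"][hgrp], f"{queue}@{hgrp}")
        for queue, cfg in QUEUES.items()
        if cfg["user_list"] == user_list  # b) access only for the own group
        for hgrp in cfg["seq_no"]
    ]
    return [name for _, name in sorted(instances)]  # a) own machines first


def suspended_when_busy(queue, hostgroup):
    """Point c): the queue suspended on hostgroup once `queue` runs a job there."""
    return QUEUES[queue]["subordinate"].get(hostgroup)


print(placement_order("group1_users"))
print(suspended_when_busy("group1.q", "@group1.hgrp"))
print(suspended_when_busy("group1.q", "@group2.hgrp"))
```

A group1 user is offered group1.q@@group1.hgrp before group1.q@@group2.hgrp; a job in group1.q on group1's machines suspends group2.q there, while on group2's machines it suspends nothing and can itself be suspended by group2.q.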

-- Reuti


> I'm beginning to wonder whether this is possible...
> 
> Vic.
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
