On 06.04.2011 at 12:34, Vic wrote:

> Well, I'm not sure I understand this...
>
>> When these jobs have some kind of checkpointing built in, it can be set up
>> in SGE to reschedule the job.
>
> They certainly don't have checkpointing built in - these are proprietary
> binaries, and I can't change their operation. I haven't finished reading
> up on checkpointing yet, so I don't know if anything else is appropriate
> (Kernel-level checkpointing seems like a good fix, but I've yet to find
> enough detail on whether it will work for me).
Okay.

>> For the queue setup: one queue per group, inside the own machine's
>> hostgroup should get a lower sequence number than the other group's
>> machines (or a soft request for the own machines).
>
> This is the bit I don't understand; aren't the sequence numbers set on a
> per-queue basis?

No, they can be set at the queue instance level.

> If so, we're still left with one queue subordinate to the
> other. That's not going to fly; we need the subordination relationship to
> be one way round for one hostgroup, and the other way round for the other
> hostgroup.

Yep. No problem.

$ qconf -sq group1.q
...
seq_no                0,[@group1.hgrp=10],[@group2.hgrp=20]
...
user_lists            group1_users
...
subordinate_list      none,[@group1.hgrp=group2.q=1]

$ qconf -sq group2.q
...
seq_no                0,[@group2.hgrp=10],[@group1.hgrp=20]
...
user_lists            group2_users
...
subordinate_list      none,[@group2.hgrp=group1.q=1]

The idea is:

a) schedule to the group's own machines first
b) grant access only to the respective group
c) on my own machines, I suspend foreign queue instances

-- Reuti

> I'm beginning to wonder whether this is possible...
>
> Vic.
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
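
To make the "queue instance level" point concrete, here is a minimal Python sketch of how the seq_no and subordinate settings shown above resolve to a different value on each hostgroup. It is only a simplified model of the `default,[@hostgroup=value]` syntax, not SGE's actual parser: the function name `effective_value` is hypothetical, and per-host overrides and SGE's real precedence rules are ignored.

```python
import re


def effective_value(attr, hostgroups):
    """Return the value a queue instance would get for a cluster-queue attribute.

    attr:       attribute string, e.g. "0,[@group1.hgrp=10],[@group2.hgrp=20]"
    hostgroups: set of hostgroups the instance's host belongs to
    Simplified model (assumption): the first matching hostgroup override wins,
    otherwise the leading default applies.
    """
    # Split only on commas that start a new "[...]" override.
    default, *overrides = re.split(r",(?=\[)", attr)
    for item in overrides:
        # "[@group1.hgrp=10]" -> target "@group1.hgrp", value "10"
        target, _, value = item.strip()[1:-1].partition("=")
        if target in hostgroups:
            return value
    return default


# Values copied from the two queue configurations in this mail.
SEQ_NO = {
    "group1.q": "0,[@group1.hgrp=10],[@group2.hgrp=20]",
    "group2.q": "0,[@group2.hgrp=10],[@group1.hgrp=20]",
}
SUSPENDS = {
    "group1.q": "none,[@group1.hgrp=group2.q=1]",
    "group2.q": "none,[@group2.hgrp=group1.q=1]",
}

if __name__ == "__main__":
    for queue in ("group1.q", "group2.q"):
        for hgrp in ("@group1.hgrp", "@group2.hgrp"):
            seq = effective_value(SEQ_NO[queue], {hgrp})
            sub = effective_value(SUSPENDS[queue], {hgrp})
            print(f"{queue} on {hgrp}: seq_no={seq} suspends={sub}")
```

On group1's machines, group1.q gets the lower sequence number and suspends group2.q; on group2's machines the relationship is reversed. Note that seq_no only influences placement when the scheduler configuration uses `queue_sort_method seqno`.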
