Hi. I'm trying to move from load-based to sequence-based scheduling, and I have a problem. First, a little about my setup:

I have two sets of machines - 176 'fast' cores in 16-core nodes, and 90 'slow' cores in 2-core nodes. I have two corresponding queues - slow.q and fast.q. The queues are non-requestable. fast.q looks at the @fast host group, which contains only the names of the fast nodes, and slow.q looks at the @slow host group, which contains only the names of the slow nodes. In fast.q, I have slots = 16 and processors = 16, while in slow.q I have slots = 2 and processors = 2. Finally, slow.q is seq_no 1 and fast.q is seq_no 2.

Here's the problem: if I submit a 120-processor job (so it's too large to fit on the 90 slow cores), it still gets assigned to slow.q. This in itself is bad - I want such a job to go directly to fast.q. It gets worse, though - because there aren't enough machines in slow.q, the remaining 30 threads end up on nodes in fast.q! I don't understand how this second part is possible. I've run qstat -f, and my 'fast' compute nodes definitely aren't listed as being members of slow.q.

Any suggestions? Thank you.
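
P.S. In case the numbers help, here is a small Python sketch of the split I'm seeing. It is only a naive model that hands out slots queue by queue in ascending seq_no order, using the slot totals from my setup above. The function name fill_by_seq_no is made up for illustration, and I'm not claiming this is how the scheduler actually works internally.

```python
# Slot totals taken from the setup described in this post.
queues = [
    ("slow.q", 1, 90),   # (queue name, seq_no, total slots)
    ("fast.q", 2, 176),
]

def fill_by_seq_no(requested, queues):
    """Naive model: hand out slots queue by queue in ascending seq_no order.

    This is an assumption made for illustration, not the real scheduler logic.
    """
    placement = {}
    remaining = requested
    for name, _seq_no, slots in sorted(queues, key=lambda q: q[1]):
        take = min(remaining, slots)
        if take:
            placement[name] = take
        remaining -= take
    return placement, remaining

placement, unplaced = fill_by_seq_no(120, queues)
print(placement, unplaced)
```

For the 120-slot job this puts 90 slots in slow.q and the remaining 30 in fast.q, which matches the split I observe. What I want instead is for all 120 to land in fast.q.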
