Re: [gridengine users] Job runs on nodes that are not part of queue!

Jesse Becker Mon, 23 Jan 2012 13:09:14 -0800

On Mon, Jan 23, 2012 at 03:55:57PM -0500, Andrew Pearson wrote:

Thanks, Reuti.


OK - I made duplicates of all of my parallel environments, so that the slow 
queue has a different PE list than the fast queue.  The submitted job now runs 
on the correct queue.
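A minimal sketch of that duplication step, assuming an existing PE named `mpi` and new copies named `mpi_fast` / `mpi_slow` (all hypothetical names). The function only rewrites the `pe_name` line of a PE definition read on stdin, so it can be fed the output of `qconf -sp`:

```shell
#!/bin/sh
# clone_pe_def NEW_NAME
# Reads a PE definition (as printed by `qconf -sp <pe>`) on stdin and writes
# it to stdout with the pe_name line replaced by NEW_NAME. Every other
# attribute (slots, allocation_rule, ...) is copied unchanged.
# Assumes NEW_NAME contains no characters special to sed ('/', '&', '\').
clone_pe_def() {
    new_name=$1
    sed "s/^pe_name[[:space:]].*/pe_name $new_name/"
}
```

Usage, assuming admin rights: `qconf -sp mpi | clone_pe_def mpi_fast > /tmp/mpi_fast && qconf -Ap /tmp/mpi_fast`, then `qconf -rattr queue pe_list mpi_fast fast.q` so that fast.q lists only its own copy (and likewise `mpi_slow` for slow.q).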

However, in some sense I'm back to square one.  The reason I created two queues 
and made them non-requestable is that I wanted to assign resources to users, 
rather than have them choose them.  Now, the user can effectively choose which 
queue to be in by choosing the correct parallel environment.  I can't see a way 
to make the parallel environments non-requestable.  Even if this were possible, 
however, if the user doesn't include a -pe line in their submission script, I 
don't see how they would specify the number of processors they need.

Sorry for my basic questions.  I'd appreciate any comments you have.



Unsure if this helps, but you can request PEs using wildcards.  So if
you have two PEs named "PE-one" and "PE-two", I think you could submit
a job like this:

        qsub -pe 'PE-*' 120 wrapper.sh

It will go to one or the other.

Two other thoughts:

1) Write some sort of qsub wrapper.
2) Edit the global sge_request file to include a PE request option.
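A minimal sketch of option 1, assuming PEs named `mpi_slow` and `mpi_fast` (hypothetical names) and the 90-core slow pool from Andrew's description. Users pass only a slot count and the wrapper picks the PE, so they never choose a queue directly. The threshold is a hard-coded assumption, not something SGE computes:

```shell
#!/bin/sh
# Hypothetical qsub wrapper, invoked as:  myqsub SLOTS script.sh [qsub args...]
# Users give only a slot count; the wrapper chooses the PE for them.

SLOW_CAPACITY=90   # total slow cores, from the thread; adjust for your site

# pick_pe SLOTS: print the PE name (or wildcard) to request.
pick_pe() {
    if [ "$1" -gt "$SLOW_CAPACITY" ]; then
        # Too big for the slow pool: only the fast nodes can run it.
        echo 'mpi_fast'
    else
        # Fits either pool: let the scheduler choose among the matching PEs.
        echo 'mpi_*'
    fi
}

main() {
    slots=$1
    shift
    # Reject empty, non-numeric, or zero slot counts before calling qsub.
    case $slots in
        ''|*[!0-9]*|0) echo "usage: myqsub SLOTS script [qsub args...]" >&2; exit 2 ;;
    esac
    # Quoted so the shell does not glob 'mpi_*'; SGE matches it against PE names.
    exec qsub -pe "$(pick_pe "$slots")" "$slots" "$@"
}

# Run only when given arguments, so the functions can also be sourced for testing.
if [ $# -gt 0 ]; then
    main "$@"
fi
```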

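For option 2, the cluster-wide default request file lives at `$SGE_ROOT/$SGE_CELL/common/sge_request`, and each line holds qsub options applied to every submission. A sketch, assuming the duplicated PEs share a common prefix such as `mpi_` (hypothetical) and that a one-slot default is acceptable for jobs that don't ask for more:

```
# $SGE_ROOT/$SGE_CELL/common/sge_request (cluster-wide qsub defaults)
# Hypothetical PE prefix; the wildcard matches both mpi_slow and mpi_fast.
-pe mpi_* 1
```

Options given explicitly on the qsub command line should take precedence over these defaults (see sge_request(5)), so users can still raise the slot count. Note that the default applies to every job, including ones that would otherwise run without a PE.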



On Mon, Jan 23, 2012 at 2:57 PM, Reuti <[email protected]> wrote:
On 23.01.2012 at 20:34, Andrew Pearson wrote:

Hi.  I'm trying to move from load-based to sequence-based scheduling, and I 
have a problem.  First, a little something about my setup:

I have two sets of machines - 176 'fast' cores in 16-core nodes, and 90 'slow' 
cores in 2-core nodes.  I have two corresponding queues - slow.q and fast.q.  
The queues are non-requestable.  fast.q looks at the @fast host group, which 
contains only the names of the fast nodes, and slow.q looks at the @slow host 
group, which contains only the names of the slow nodes.  In fast.q, I have 
slots = 16 and processors = 16, while in slow.q I have slots = 2 and processors 
= 2.  Finally, slow.q is seq_no 1 and fast.q is seq_no 2.
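The setup described above corresponds roughly to the following configuration excerpts. The shared PE name `mpi` is an assumption (the message doesn't name its PEs), but a PE listed in both queues is what Reuti's reply identifies as the cause:

```
# qconf -sq slow.q (excerpt; pe_list value is a hypothetical name)
qname                 slow.q
hostlist              @slow
seq_no                1
pe_list               mpi
slots                 2
processors            2

# qconf -sq fast.q (excerpt)
qname                 fast.q
hostlist              @fast
seq_no                2
pe_list               mpi
slots                 16
processors            16

# qconf -ssconf (excerpt): order queues by seq_no instead of load
queue_sort_method     seqno
```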

Here's the problem:  If I submit a 120-processor job (so it's too large to fit 
on the slow cores), it still gets assigned to slow.q.  This in itself is bad - 
I want such a job to go directly to fast.q.  It gets worse, though: because 
there aren't enough cores in slow.q, the remaining 30 threads end up on 
nodes in fast.q!  I don't understand how this second part is possible.  I've 
done qstat -f, and my 'fast' compute nodes definitely aren't listed as being 
members of slow.q.

Any suggestions?  Thank you.


If the same PE is attached to more than one queue, it can collect slots from 
any of them:
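A toy model of that behaviour, assuming a single PE shared by both queues and a scheduler that simply fills queues in seq_no order. This is a deliberate simplification (the real SGE scheduler also weighs allocation_rule, load and other factors), but it reproduces the 90 + 30 split Andrew saw for a 120-slot job:

```shell
#!/bin/sh
# allocate SLOTS QUEUE:FREE [QUEUE:FREE ...]
# Queues must be listed in seq_no order (lowest first), i.e. all queues that
# carry the PE. Prints "queue:n" for each queue that contributes slots, or
# returns 1 if those queues together cannot supply SLOTS.
allocate() {
    need=$1
    shift
    result=""
    for q in "$@"; do
        [ "$need" -eq 0 ] && break
        name=${q%%:*}     # part before the colon
        free=${q#*:}      # part after the colon
        take=$free
        [ "$take" -gt "$need" ] && take=$need
        if [ "$take" -gt 0 ]; then
            result="$result $name:$take"
            need=$((need - take))
        fi
    done
    [ "$need" -eq 0 ] || return 1
    printf '%s\n' "${result# }"
}
```

With one PE per queue (the fix Andrew applied), the slow PE would only "see" slow.q, so `allocate 120 slow.q:90` fails instead of spilling onto the fast nodes.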

http://gridengine.org/pipermail/users/2012-January/002526.html

-- Reuti

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users



--
Jesse Becker
NHGRI Linux support (Digicon Contractor)