Hi Rayson & Ron.
Thank you both for responding.
We do a lot of parallel runs with our cluster. Here is more info on what we
currently have and I will keep this example down to 3 queues and 6 nodes for
simplicity.
With our current Torque setup, I have 6 64-core nodes: 3 nodes belong to the
math group and 3 nodes to the bio group. We set up our Queues as: 1 Queue being
a Preemptee, 2 being Preemptors.
When I create an account, the account is set up to belong to either the 'math'
group or the 'bio' group.
Our current nodes and Queues are as follows:
3 nodes have the properties "math" and "free", with 64 cores each.
3 nodes have the properties "bio" and "free", with 64 cores each.
The "math" Queue looks for nodes with the "math" property and runs jobs only on
"math" nodes. The math Q is a Preemptor.
The "bio" Queue looks for nodes with the "bio" property and runs jobs only on
the "bio" nodes. The bio Q is a Preemptor.
The "free" Queue looks for nodes with the "free" property and runs jobs on
any node, BUT only as a Preemptee job.
The idea here is that the free Q allows everyone to use the "free" nodes as
long as the owners (math or bio) are not using them. The free Q is set up as a
Preemptee Q; the math & bio Q's are set up as Preemptor Q's.
When the math users submit a job to the math Q, any free job running on the
math nodes gets suspended.
When the bio users submit a job to the bio Q, any free job on the bio nodes
also gets suspended.
Suspended jobs automatically resume when the node owners are done using their
nodes (no jobs on node).
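(For anyone following along: the suspend/resume behaviour described above maps
fairly directly onto Grid Engine's subordinate queues. A rough sketch, where
the names math.q, free.q and @mathhosts are my own placeholders, not anything
from the actual config:)

```shell
# Hypothetical sketch: put the 3 math nodes into a host group,
# then make free.q subordinate to math.q on those hosts.
qconf -ahgrp @mathhosts     # add a host group listing the 3 math nodes

# In math.q's definition (edit with: qconf -mq math.q):
#   hostlist           @mathhosts
#   slots              64
#   subordinate_list   free.q=1
# If I remember the semantics correctly, "free.q" alone suspends the free.q
# instance on a host once math.q is full there; "free.q=1" suspends it as
# soon as a single math.q slot on that host is in use. Suspended free.q jobs
# resume automatically when math.q drains.
```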
With Torque, math users can request from 1 to 3 math nodes and from 1 to 64
cores on each node. For example, a math user can request 2 math nodes at 32
cores each in interactive mode with:
qsub -I -q math -l nodes=2:ppn=32
If the user does not belong to the 'math' group, they are prevented from
running on the math Q. Same for the bio users.
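(For comparison, a rough Grid Engine equivalent of that interactive request
and of the group restriction might look like the following; the PE name "mpi"
and the ACL name "mathusers" are assumptions on my part, not real config:)

```shell
# Interactive job asking for 64 slots laid out 32 per host, via a
# parallel environment whose allocation_rule is fixed at 32:
qrsh -q math.q -pe mpi 64

# Restrict math.q to members of an access list (hypothetical ACL name):
qconf -au jdoe mathusers     # add user jdoe to the "mathusers" ACL
qconf -mq math.q             # then set in the queue:  user_lists  mathusers
```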
I will stop here as I have more requirements, but this is the main set of
functions I am looking for in OGE.
Thank you again for your generous efforts in helping.
Joseph
On 4/20/2012 9:01 PM, Rayson Ho wrote:
Hi Joseph,
"Queues" in Grid Engine (and Open Grid Scheduler/Grid Engine) and the
ones in Torque/Maui have slightly different meanings.
In Grid Engine, jobs are not submitted to "queues"; rather, jobs
are submitted to a global waiting area. The scheduler then picks
"queue instances" (queue instances roughly = hosts, yet each host can
have more than 1 queue instance) that satisfy the resource
requirements of each job, and at that point the jobs are bound to the
queues.
We also have global queues called "cluster queues", but they are
abstractions over the queue instances.
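(To make that concrete: a cluster queue spans hosts, and each host contributes
one queue instance named queue@host. `qstat -f` lists them individually; the
host names below are made up for illustration:)

```shell
$ qstat -f
queuename                    qtype resv/used/tot. load_avg arch     states
--------------------------------------------------------------------------
math.q@node01                BIP   0/0/64         0.01     lx-amd64
math.q@node02                BIP   0/0/64         0.03     lx-amd64
```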
So what does that all mean??
In LSF or Torque, some clusters have debug queues, short queues, long
queues, etc. Those can be migrated to Grid Engine cluster queues with
some work (i.e. it is relatively easy).
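(For instance, a Torque-style "short" queue with a wall-clock cap could be
modelled as a cluster queue whose h_rt limit bounds runtime; the names and
values below are placeholders:)

```shell
qconf -aq short.q      # create the cluster queue, then set in its definition:
#   hostlist   @allhosts
#   slots      64
#   h_rt       1:00:00     # hard wall-clock limit of one hour per job
```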
If you want queue-level user-based fairshare or queue-based fairshare
as in LSF (e.g. the users in each queue get a different priority), then
it can be harder to implement or model in Grid Engine. (I have not
looked at Maui for a while, so I am not sure whether it has this
feature.)
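(For what it's worth, the fairshare Grid Engine does have lives in the
scheduler policies rather than in individual queues; a crude global
approximation uses the functional policy. The values below are illustrative
only:)

```shell
qconf -msconf        # in the scheduler configuration, set e.g.:
#   weight_tickets_functional   10000
qconf -muser alice   # then give each user functional shares, e.g.:  fshare 100
```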
If you let us know a bit more about your setup, then we can provide
further help.
Rayson
On Fri, Apr 20, 2012 at 11:42 PM, Joseph A. Farran<[email protected]> wrote:
Hi All.
I am a long-time Torque/Maui admin running an HPC cluster, looking to
transition to Open Grid Engine. I am a newbie with OGE, however.
Are there any links and or helpful tips on moving to OGE from an admin point
of view? How to convert Torque qmgr queues, nodes, resource limits to the
equivalent in OGE?
Thanks,
Joseph
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users