On 21.04.2012 at 20:53, Joseph A. Farran wrote:

> Hi Rayson & Ron.
>
> Thank you both for responding.
>
> We do a lot of parallel runs with our cluster. Here is more info on what we
> currently have; I will keep this example down to 3 queues and 6 nodes for
> simplicity.
>
> With our current Torque setup, I have 6 64-core nodes: 3 nodes belong to
> the math group, 3 nodes to the bio group. We set up our queues as 1 queue
> being a preemptee and 2 being preemptors.
>
> When I create an account, the account is set up to belong to the 'math'
> group or to the 'bio' group.
>
> Our current nodes and queues are as follows:
>
> 3 nodes have the properties "math", "free" and "64" cores.
> 3 nodes have the properties "bio", "free" and "64" cores.
>
> The "math" queue looks for nodes with the "math" property and runs jobs
> only on "math" nodes. The math Q is a preemptor.
> The "bio" queue looks for nodes with the "bio" property and runs jobs
> only on the "bio" nodes. The bio Q is a preemptor.
> The "free" queue looks for nodes with the "free" property and runs jobs
> on any node, BUT only as preemptee jobs.
By default you specify resource requests and SGE will select an appropriate
queue for your job, as Rayson laid out. For your setup I suggest:

- define one ACL for "math" with its members
- define one ACL for "bio" with its members
- define one hostgroup "@math" for the math machines
- define one hostgroup "@bio" for the bio machines
- then you can limit a group's access to certain nodes either:
  --> on a queue-instance level
  --> with an RQS
  (--> on a host level, but not in your setup due to the preempt queue;
  mentioned just to be complete)

Let's go with the queue-instance level:

$ qconf -sq normal.q
hostlist              @math,@bio
...
user_lists            NONE,[@math=math],[@bio=bio]

where @math = the "math" hostgroup and math = the ACL with the math users.

For the second queue you can configure preemption either slotwise, or so
that the subordinate queue is suspended as soon as one slot on the node in
question is used by the owning group:

$ qconf -sq free.q
...
user_lists            NONE,[@math=bio],[@bio=math]
...
subordinate_list      normal.q=1

(The user_lists entry assumes no one wants to submit to his own machines
via the preempt queue; otherwise leave it out.)

Although you could submit jobs to either queue by specifying "-q normal.q"
resp. "-q free.q", I suggest creating a boolean complex with the FORCED
attribute and attaching it only to free.q:

$ qconf -sq free.q
...
complex_values        free=TRUE

The advantage is that normal jobs can be submitted with a plain `qsub
job.sh` and won't end up in free.q. Jobs that should run on the volunteered
nodes then need to request this complex: `qsub -l free job.sh`.

-- Reuti

NB: Suspended jobs will still use memory and other requested resources.

> The idea here is that the free Q allows everyone to use the "free" nodes
> as long as the owners (math or bio) are not using them. The free Q is set
> up as a preemptee Q; the math & bio Q's are set up as preemptor Q's.
>
> When the math users submit a job to the math Q, any free job running on
> the math nodes gets suspended.
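[Editor's note: Reuti's recipe presupposes that the ACLs, hostgroups, and the FORCED complex already exist. A minimal sketch of creating them non-interactively follows; the user names (alice, bob) and node names (node01..node06) are placeholders for this example.]

```shell
# Add users to the access lists (qconf creates an ACL if it
# does not exist yet). "alice" and "bob" are hypothetical users.
qconf -au alice math
qconf -au bob bio

# Create the hostgroups from files ("qconf -ahgrp @math" would
# open an editor instead). Node names are placeholders.
cat > math.hgrp <<'EOF'
group_name @math
hostlist node01 node02 node03
EOF
qconf -Ahgrp math.hgrp

cat > bio.hgrp <<'EOF'
group_name @bio
hostlist node04 node05 node06
EOF
qconf -Ahgrp bio.hgrp

# The boolean complex is added to the complex table via "qconf -mc"
# (interactive). The row would look roughly like:
#   name  shortcut  type  relop  requestable  consumable  default  urgency
#   free  free      BOOL  ==     FORCED       NO          FALSE    0
# Afterwards attach it to free.q via complex_values as shown above.
```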
>
> When the bio users submit a job to the bio Q, any free job on the bio
> nodes also gets suspended.
>
> Suspended jobs automatically resume when the node owners are done using
> their nodes (no jobs on the node).
>
> With Torque, math users can request from 1 to 3 math nodes and from 1 to
> 64 cores on each node. For example, a math user can request 2 math nodes
> at 32 cores each in interactive mode with:
>
> qsub -I -q math nodes=2:ppn=32
>
> If a user does not belong to the 'math' group, they are prevented from
> running on the math Q. Same for the bio users.
>
> I will stop here as I have more requirements, but this is the main set of
> functions I am looking for in OGE.
>
> Thank you again for your generous efforts in helping.
>
> Joseph
>
>
> On 4/20/2012 9:01 PM, Rayson Ho wrote:
>> Hi Joseph,
>>
>> "Queues" in Grid Engine (and Open Grid Scheduler/Grid Engine) and the
>> ones in Torque/Maui have slightly different meanings.
>>
>> In Grid Engine, jobs are not submitted to "queues"; rather, jobs are
>> submitted to a global waiting area. The scheduler then picks "queue
>> instances" (roughly, queue instance = host, yet each host can have
>> more than one queue instance) that satisfy the resource requirements
>> of each job, and at that point the jobs are bound to those queues.
>>
>> We also have global queues called "cluster queues", but they are
>> abstractions of the queue instances.
>>
>> So what does that all mean?
>>
>> In LSF or Torque, some clusters have debug queues, short queues, long
>> queues, etc. Those can be migrated to Grid Engine cluster queues with
>> some work (i.e. relatively easily).
>>
>> If you want queue-level user-based fairshare or queue-based fairshare
>> as in LSF (e.g. users in each queue get a different priority) - I have
>> not looked at Maui for a while, so I am not sure if it has this
>> feature - then it can be harder to implement or model in Grid Engine.
>>
>> If you let us know a bit more about your setup, then we can provide
>> further help.
>>
>> Rayson
>>
>>
>>
>> On Fri, Apr 20, 2012 at 11:42 PM, Joseph A. Farran <[email protected]> wrote:
>>> Hi All.
>>>
>>> I am a long-time Torque/Maui admin running an HPC cluster, looking to
>>> transition to Open Grid Engine. I am a newbie with OGE, however.
>>>
>>> Are there any links and/or helpful tips on moving to OGE from an admin
>>> point of view? How do I convert Torque qmgr queues, nodes, and
>>> resource limits to the equivalent in OGE?
>>>
>>> Thanks,
>>> Joseph

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
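[Editor's note: the interactive Torque request quoted above (qsub -I -q math nodes=2:ppn=32) maps in SGE to an interactive qrsh with a parallel environment. A sketch, assuming a PE named "mpi32" with a fixed allocation rule of 32 slots per host; the PE name and slot counts are placeholders, and the PE must also be listed in the queue's pe_list.]

```shell
# Hypothetical PE definition, loaded with "qconf -Ap mpi32.pe".
# allocation_rule 32 forces exactly 32 slots per host, so a
# 64-slot request spans exactly 2 hosts.
cat > mpi32.pe <<'EOF'
pe_name            mpi32
slots              9999
allocation_rule    32
control_slaves     TRUE
job_is_first_task  FALSE
EOF
qconf -Ap mpi32.pe

# Interactive session: 64 slots = 2 hosts x 32 cores. With the
# FORCED complex attached only to free.q, a plain request like
# this stays in normal.q, restricted by the user_lists above.
qrsh -pe mpi32 64
```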
