Re: [gridengine users] Requesting multiple cores on one node

2014-01-16 Thread Reuti
Am 16.01.2014 um 00:24 schrieb Joseph Farran:

 Allison,
 
 I love Grid Engine but this is the one feature I truly miss from Torque:
 
 -l nodes=x:ppn=[count]

An RFE could be to support the setting allocation_rule $user_defined in a PE, which 
would still need to be requested during job submission.


 [snip]
 The MPI job will NOT suspend jobs on the free64 queue. The job waits 
 until free64 jobs are done and then the job runs and grabs the entire nodes 
 correctly using the exclusive consumable.
 
 Is there a fix for this, so that jobs on free64 ARE suspended when using -l 
 exclusive=true and PE mpi on our pub64 queue?

The suspension is the result of a job having started on this particular node: 
for a fraction of a second the node is oversubscribed. This is also the reason 
why you may need more slots defined at the exechost level, as otherwise the 
limited slot count wouldn't allow the superordinated job to start.

What you can try: as the jobs in free64 are suspended anyway, attach the 
exclusive attribute at the queue level, i.e. attach it to pub64 instead of to the 
exechost. The job will then start in pub64 and suspend the free64 jobs.
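A sketch of what that change could look like (the hostname node01 is an invented example; the exact complex_values edits depend on the local setup):

```shell
# Sketch: move the "exclusive" consumable from the exechost level to the
# pub64 queue, so pub64's subordinate-queue suspension can take effect.

# 1. Remove "exclusive=1" from each exechost's complex_values:
qconf -me node01      # delete exclusive=1 from the complex_values line

# 2. Attach it to the pub64 queue instead:
qconf -mq pub64       # add/extend:  complex_values exclusive=1
```

Jobs would still request `-l exclusive=true` at submission time as before.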


 Using other PEs like openmp works just fine and jobs are suspended correctly. 
 So it's only with this combination.

Interesting - which version of SGE are you running?

-- Reuti


 Joseph
 
 


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Requesting multiple cores on one node

2014-01-16 Thread Reuti
Am 16.01.2014 um 05:24 schrieb tele...@shaw.ca:

 Sorry - you're correct, I meant  -l nodes=1:ppn=[count] .  :-)
 
 Hmmm...we've had some requests from clients specifically to support SGE, but 
 this is a pretty key part of our functionality.  Currently we can submit, but 
 without the way to specify cores, the clients won't get the timing results 
 they expect at all.  Out of curiosity, does anyone have any good references 
 for *why* this isn't the paradigm

I would say the idea was that the admin of the cluster should provide all the 
necessary setup and scripts to achieve a tight integration of a parallel 
application into the cluster. This is nothing the end user of a cluster should 
have to deal with. AFAIK there is no start_proc_args or control_slaves in Torque 
to reformat the generated list of machines or to catch a plain rsh/ssh call made 
by a user's application.

Ten years ago, LAM/MPI and MPICH(1) didn't run out of the box in a cluster in 
a tightly integrated way, and there the PE definition was a great help (it still 
is for others like Linda or IBM's Platform MPI). Sure, this changed over time, as 
nowadays Open MPI and MPICH2/3 support SGE's `qrsh` and Torque's TM 
(Task Manager) directly.

Another reason may have been to limit certain applications (i.e. PEs) to a 
subset of nodes, and to control how many slots are used by a particular 
application. This is nothing that can be controlled in Torque, I think.

On the other hand: if you have an application which needs 4 cores on every 
machine (i.e. allocation_rule 4, resp. -l nodes=5:ppn=4), how can you prevent 
a user from requesting -l nodes=2:ppn=4+6:ppn=2 in Torque for a 20-core job?


 (it's certainly the atypical choice, from all the schedulers I've worked 
 with), with more detail on how/why this system works in its place?  That 
 might give me some insights on how to proceed.

Taking Open MPI and MPICH2/3 into account, an RFE could be that a PE called 
smp is always defined by default (and it's not even possible to remove it), 
but it's up to the admin to attach it to certain machines/hostgroups. 
Nevertheless, as the necessary settings in such a PE are the ones which SGE 
offers by default when you create a new PE, defining it is quite easy anyway.
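For reference, such a PE needs little beyond SGE's defaults. A minimal sketch (the name smp and the slot count are conventions chosen here, not anything SGE creates for you, and all.q is just an example queue):

```shell
# Minimal single-node PE definition, saved in a file smp.pe:
#   pe_name            smp
#   slots              9999
#   allocation_rule    $pe_slots     # all requested slots on one host
#   control_slaves     FALSE
#   job_is_first_task  TRUE
qconf -Ap smp.pe                      # add the PE from the file
qconf -aattr queue pe_list smp all.q  # attach it to a queue

# A user then requests N cores on one node with:
qsub -pe smp 8 job.sh
```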


 The only thing I can think of right now is some sort of script that they can 
 run as part of installing our program that automatically sets up the 
 necessary pe's for them, but I'm still foggy enough on pe's that I'm not sure 
 if that's possible without knowing the details of their hardware.

The hardware doesn't matter much for such a PE; what matters more are the 
various defined queues: to which one will you attach the newly created PE?

During installation you can check whether such a PE exists (scan all PEs and 
look for allocation_rule $pe_slots with a proper slot count therein, then use 
that name for the PE request) and whether it's attached to any queue. Well, 
this doesn't prevent the admin of the cluster from changing the setting after 
installation of your software, though.
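Such an installation-time check could be sketched roughly as follows (a best-effort scan; the qconf options are standard SGE, but the heuristic itself is an assumption, and note that long pe_list values may wrap across lines in qconf output):

```shell
#!/bin/sh
# Look for a PE usable for single-node multi-core jobs:
# one whose allocation_rule is $pe_slots and which is attached to a queue.
for pe in $(qconf -spl); do
    if qconf -sp "$pe" | grep -q '^allocation_rule[[:space:]]*\$pe_slots'; then
        for q in $(qconf -sql); do
            if qconf -sq "$q" | grep -q "pe_list.*$pe"; then
                echo "usable PE: $pe (attached to queue $q)"
            fi
        done
    fi
done
```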

-- Reuti


  Thoughts?
 
 Thank you!
 
 -Allison
 
 
 [snip]

Re: [gridengine users] Requesting multiple cores on one node

2014-01-16 Thread William Hay

On 15/01/14 22:28, Allison Walters wrote:
 We have OpenMP jobs that need a user-defined (usually more than one
 but less than all) number of cores on a single node for each job.
 In addition to running these jobs, our program has an interface to
 the cluster so they can submit jobs through a custom GUI (and we
 build the qsub command in the background for the submission).  I'm
 trying to find a way for the job to request those multiple cores
 that does not depend on the cluster to be configured a certain way,
 since we have no control as to whether the client has a parallel
 environment created, how it's named, etc...
 
 Basically, I'm just looking for the equivalent of -l nodes=[count]
 in PBS/Torque, or -n [count] in LSF, etc...  The program will use
 the correct number of cores we pass to it, but we need to pass that
 parameter to the cluster as well to ensure it only gets sent to a
 node with the correct amount of cores available.  This works fine
 in the other clusters we support but I'm completely at a loss as to
 how to do it in Grid Engine.  I feel like I must be missing
 something!  :-)
 
 Thank you.
I think the difference is the underlying philosophy. Grid Engine:

1) Gives a lot of flexibility to the administrator. This makes it
easier for the admin to do things the designers didn't anticipate. It
is less a batch scheduler and more a batch-scheduler construction kit
with a reasonable sample configuration. Despite attempts to port it to
Windows, Grid Engine is basically a Unix batch scheduler and follows
the Unix philosophy.

2) For the most part you request the resources you need rather than
telling the scheduler how to allocate jobs to nodes. Which resources
you can request is up to the admin.


Your actual problem isn't soluble in a configuration-agnostic way, but
on many SGE clusters there is a PE called smp with an allocation rule
of $pe_slots, and there is also commonly a resource called exclusive
that can be requested to get exclusive access to a host.

However, this is just a common configuration and wouldn't work on the
cluster I admin, for example. You might choose to assume that anyone
who doesn't use the configuration outlined above is familiar with
their configuration and can adapt your job submission mechanism if
you let them.

If you want the more general ppn behaviour, then multiple PEs with
allocation rules 1, 2, 3, ... up to the largest number of cores on a
single node should do it (never tried this, just going by the man page).
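That suggestion could be scripted roughly like this (untested, per William's caveat; the mcores naming, the slot total, and the per-node maximum of 16 are invented for the sketch):

```shell
#!/bin/sh
# Create one PE per per-node core count: mcores1 ... mcores16,
# each with a fixed integer allocation_rule.
for n in $(seq 1 16); do
    cat > /tmp/mcores$n.pe <<EOF
pe_name            mcores$n
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $n
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
EOF
    qconf -Ap /tmp/mcores$n.pe
done
# Torque's -l nodes=2:ppn=4 would then map to something like:
#   qsub -pe mcores4 8 job.sh
```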

William




Re: [gridengine users] Requesting multiple cores on one node

2014-01-16 Thread Reuti
Am 16.01.2014 um 09:54 schrieb Reuti:

 [snip]
 On the other hand: if you have an application which needs 4 cores on every 
 machine (i.e. allocation_rule 4 resp. -l nodes=5:ppn=4), how can you 
 prevent a user from requesting -l nodes=2:ppn=4+6:ppn=2 in Torque for a 20 
 core job?

NB: I just recall that I once faced the issue in Torque where -l 
nodes=5:ppn=4 gave me 8 cores on one machine; a (cluster-wide) setting 
somewhere in Torque/Maui allowed bunches of 4 to be allocated more than once 
on a machine, which we didn't want (SGE gives you the 4 only once on each 
machine). Different queuing systems, different features, different constraints.

-- Reuti


Re: [gridengine users] Requesting multiple cores on one node

2014-01-16 Thread Allison Walters
- Original Message -
From: Reuti re...@staff.uni-marburg.de

 The hardware doesn't matter much for such a PE, more the various defined 
 queues: to which one you will attach the newly created PE?

 During installation you can check whether there is such a PE (scan all PEs 
 and look for allocation_rule $pe_slots and a proper slot  count therein = 
 use this name for the PE request) and whether it's attached to any queue. 
 Well, this doesn't prevent the admin of the  cluster to change the setting 
 after installation of your software though.

I'm not too worried about whether they change things afterwards - then it's 
something unsupported they've done.  :-)  I'm mostly worried about them having 
to do as little as possible in the first place to get it to work.

Thank you for the information!  I'm going to take some more time to digest, and 
then see what I can come up with.

-Allison


Re: [gridengine users] Requesting multiple cores on one node

2014-01-16 Thread Allison Walters
Great info - thank you!


-Allison


- Original Message -
From: William Hay w@ucl.ac.uk
To: Allison Walters tele...@shaw.ca, users@gridengine.org
Sent: Thursday, January 16, 2014 2:22:04 AM
Subject: Re: [gridengine users] Requesting multiple cores on one node

[snip]



Re: [gridengine users] Requesting multiple cores on one node

2014-01-15 Thread Reuti
Am 15.01.2014 um 23:28 schrieb Allison Walters:

 We have OpenMP jobs that need a user-defined (usually more than one but less 
 than all) number of cores on a single node for each job.  In addition to 
 running these jobs, our program has an interface to the cluster so they can 
 submit jobs through a custom GUI (and we build the qsub command in the 
 background for the submission).  I'm trying to find a way for the job to 
 request those multiple cores that does not depend on the cluster to be 
 configured a certain way, since we have no control as to whether the client 
 has a parallel environment created, how it's named, etc...

This is not in the paradigm of SGE. You can only create a consumable complex, 
attach it to each exechost and request the correct amount for each job, even 
for serial ones (with a default of 1). But in this case, the memory requests (or 
others) won't be multiplied, as SGE always thinks it's a serial job. In effect 
you replace the custom PE with a custom complex.
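A sketch of that consumable-complex approach (the complex name cores, the hostname node01, and the capacity of 64 are invented examples):

```shell
# 1. Add a consumable integer complex "cores" (append a line via qconf -mc):
#    cores  c  INT  <=  YES  YES  1  0
# 2. Give every exechost a capacity, e.g. on a 64-core node:
qconf -aattr exechost complex_values cores=64 node01
# 3. Request it per job (serial jobs consume the default of 1):
qsub -l cores=8 job.sh
```

Note the caveat above: SGE still treats such a job as serial, so per-slot requests like memory are not multiplied.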


 Basically, I'm just looking for the equivalent of -l nodes=[count]

Wouldn't it be: -l nodes=1:ppn=[count]?

For -l nodes=[count] it's like SGE's allocation_rule $round_robin or $fill_up, 
depending on a setting somewhere in Torque (i.e. the same rule is applied to all 
types of job all the time). It could span more than one node in either case.

-- Reuti


 in PBS/Torque, or -n [count] in LSF, etc...  The program will use the correct 
 number of cores we pass to it, but we need to pass that parameter to the 
 cluster as well to ensure it only gets sent to a node with the correct amount 
 of cores available.  This works fine in the other clusters we support but I'm 
 completely at a loss as to how to do it in Grid Engine.  I feel like I must 
 be missing something!  :-)
 
 Thank you.
 
 -Allison


Re: [gridengine users] Requesting multiple cores on one node

2014-01-15 Thread Joseph Farran

Allison,

I love Grid Engine but this is the one feature I truly miss from Torque:

-l nodes=x:ppn=[count]


Reuti,

We have a complex setup trying to accomplish this same thing, and it kind of 
works, but we have an issue with jobs not starting when jobs are running on a 
subordinate queue.

First, here is our setup:

$ qconf -sc | egrep '#|exclu'
#name       shortcut  type  relop  requestable  consumable  default  urgency
#---------------------------------------------------------------------------
exclusive   excl      BOOL  EXCL   YES          YES          FALSE    1000

Our MPI PE has:
$ qconf -sp mpi
pe_namempi
slots  
user_lists NONE
xuser_listsNONE
start_proc_argsNONE
stop_proc_args NONE
allocation_rule$fill_up
control_slaves TRUE
job_is_first_task  TRUE
urgency_slots  min
accounting_summary TRUE
qsort_args NONE

Our two queues:
$ qconf -sq free64 | grep sub
subordinate_list  NONE

$ qconf -sq pub64 | grep sub
subordinate_list  free64=1

When we submit our MPI jobs to pub64 with:

#!/bin/bash
#$ -q pub64
#$ -pe mpi 256
#$ -l exclusive=true

The MPI job will NOT suspend jobs on the free64 queue. The job waits until free64 
jobs are done, and then the job runs and grabs the entire nodes correctly using the 
exclusive consumable.

Is there a fix for this, so that jobs on free64 ARE suspended when using -l 
exclusive=true and PE mpi on our pub64 queue?

Using other PEs like openmp works just fine and jobs are suspended correctly.
So it's only with this combination.

Joseph





Re: [gridengine users] Requesting multiple cores on one node

2014-01-15 Thread teleute

Sorry - you're correct, I meant  -l nodes=1:ppn=[count] .  :-)

Hmmm...we've had some requests from clients specifically to support SGE, 
but this is a pretty key part of our functionality.  Currently we can 
submit, but without the way to specify cores, the clients won't get the 
timing results they expect at all.  Out of curiosity, does anyone have 
any good references for *why* this isn't the paradigm (it's certainly 
the atypical choice, from all the schedulers I've worked with), with 
more detail on how/why this system works in its place?  That might give 
me some insights on how to proceed.


The only thing I can think of right now is some sort of script that they 
can run as part of installing our program that automatically sets up the 
necessary pe's for them, but I'm still foggy enough on pe's that I'm not 
sure if that's possible without knowing the details of their hardware.  
Thoughts?


Thank you!

-Allison




