Sorry - you're correct, I meant -l nodes=1:ppn=[count].  :-)

Hmmm... we've had some requests from clients specifically to support SGE, but this is a pretty key part of our functionality. Currently we can submit, but without a way to specify cores the clients won't get the timing results they expect at all. Out of curiosity, does anyone have a good reference for *why* this isn't the paradigm (it's certainly the atypical choice among the schedulers I've worked with), with more detail on how and why this system works in its place? That might give me some insight on how to proceed.

The only thing I can think of right now is some sort of script they can run as part of installing our program that automatically sets up the necessary PEs for them, but I'm still foggy enough on PEs that I'm not sure that's possible without knowing the details of their hardware. Thoughts?
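To be a bit more concrete, I'm picturing a one-time setup step the cluster admin (or our installer, run with manager rights on the qmaster) would execute - something like the rough sketch below. The PE name smp, the slot count, and the queue name all.q are only placeholders, and the exact set of PE attributes varies a bit between SGE versions:

    #!/bin/sh
    # Create a single-node SMP parallel environment and attach it to a queue.
    # The heredoc is quoted so that $pe_slots is written literally, not expanded.
    cat > /tmp/smp_pe.conf <<'EOF'
    pe_name            smp
    slots              9999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $pe_slots
    control_slaves     FALSE
    job_is_first_task  TRUE
    urgency_slots      min
    EOF
    qconf -Ap /tmp/smp_pe.conf             # register the PE
    qconf -aattr queue pe_list smp all.q   # let the queue offer it

I'm not sure whether a third-party installer doing that would be considered acceptable on most sites, though.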

Thank you!

-Allison


On 15/01/2014 3:58 PM, Reuti wrote:
On 15.01.2014 at 23:28, Allison Walters wrote:

We have OpenMP jobs that need a user-defined (usually more than one, but less 
than all) number of cores on a single node for each job.  In addition to 
running these jobs, our program has an interface to the cluster so they can 
submit jobs through a custom GUI (and we build the qsub command in the 
background for the submission).  I'm trying to find a way for the job to 
request those multiple cores that does not depend on the cluster being 
configured a certain way, since we have no control over whether the client has 
a parallel environment created, how it's named, etc.
This is not the paradigm of SGE. The alternative is to create a consumable complex, 
attach it to each exechost and have every job request the correct amount, even 
serial ones (which fall back to a default of 1). But in this case memory requests (or 
other per-slot requests) won't be multiplied, as SGE still thinks it's a serial job - 
you have only replaced the custom PE with a custom complex.
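As a rough sketch of that route (the complex name "cores", the host name node01 and the per-host value are only examples, and the column layout may differ slightly between versions):

    # qconf -mc - add a consumable to the complex configuration:
    #name    shortcut  type  relop  requestable  consumable  default  urgency
    cores    cores     INT   <=     YES          YES         1        0

    # tell each exec host how many it offers (repeat per host):
    qconf -aattr exechost complex_values cores=16 node01

    # and every submission requests what it needs:
    qsub -l cores=4 job.sh

But as said, SGE will still treat such a job as serial, so nothing else scales with the request.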


Basically, I'm just looking for the equivalent of -l nodes=[count]
Wouldn't it be: -l nodes=1:ppn=[count] ?

For -l nodes=[count] the equivalent is SGE's allocation_rule $round_robin or $fill_up - 
depending on a setting somewhere in Torque (i.e. the same rule is applied to all types 
of job all the time). In either case the job could span more than one node.
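In other words, with a PE "smp" using allocation_rule $pe_slots set up (the name is only an example), the closest match would be:

    Torque:  qsub -l nodes=1:ppn=8 job.sh
    SGE:     qsub -pe smp 8 job.sh

and inside the job script the granted number of slots is available as $NSLOTS, which an OpenMP job would usually forward as:

    export OMP_NUM_THREADS=$NSLOTS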

-- Reuti


in PBS/Torque, or -n [count] in LSF, etc.  The program will use the correct 
number of cores we pass to it, but we need to pass that parameter to the 
cluster as well to ensure the job only gets sent to a node with the correct 
number of cores available.  This works fine on the other clusters we support, 
but I'm completely at a loss as to how to do it in Grid Engine.  I feel like 
I must be missing something!  :-)
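(To make it concrete, for LSF we end up building something like

    bsub -n 8 run_model.sh

where 8 is a stand-in core count and run_model.sh a made-up script name - and it's that same request that I can't see how to express in Grid Engine without assuming anything about how the cluster is configured.)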

Thank you.

-Allison
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users

