[slurm-dev] Re: NodeName and PartitionName format in slurm.conf

2016-01-20 Thread Andrus, Brian Contractor
That's what I have done for now. I'm just a little OCD about how the conf file 
looks and don't care for 8 lines worth of wraparound. 
Management is done at the pxeboot/kickstart level and with yum. I can dynamically 
install the bits necessary for the various hardware differences (e.g. GPUs, MIC 
cards, InfiniBand, etc.).

Brian Andrus

-Original Message-
From: Benjamin Redling [mailto:benjamin.ra...@uni-jena.de] 
Sent: Wednesday, January 20, 2016 2:00 AM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] Re: NodeName and PartitionName format in slurm.conf


On 19.01.2016 at 20:37, Andrus, Brian Contractor wrote:
> I am testing our slurm to replace our torque/moab setup here.
>
> The issue I have is to try and put all our node names in the NodeName 
> and PartitionName entries.
> In our cluster, we name our nodes compute-- That seems to 
> be problem enough with the abilities to use ranges in slurm, but it is 
> compounded with the fact that the folks put the nodes in keeping 1u of 
> space in between.
> So I have compute-1-[1,3,5,7,9,11...41]

Why not simply use a comma-separated list _generated_ from your inventory / DNS 
/ /etc/hosts / etc.?

When you have outliers (2U, 4U -- do they have more resources too!?) it would 
make sense to group/partition by resources anyway.
What are you using to manage inventory? Most configuration management and 
provisioning tools I know provide you with the necessary tools -- have a look 
at puppetlabs facter (or alternatives).

http://slurm.schedmd.com/slurm.conf.html

Multiple node names may be comma separated (e.g. "alpha,beta,gamma") and/or a 
simple node range expression may optionally be used to specify numeric ranges 
of nodes to avoid building a configuration file with large numbers of entries. 
The node range expression can contain one pair of square brackets with a 
sequence of comma separated numbers and/or ranges of numbers separated by a "-" 
(e.g. "linux[0-64,128]", or "lx[15,18,32-33]"). Note that the numeric ranges 
can include one or more leading zeros to indicate the numeric portion has a 
fixed number of digits (e.g. "linux[0000-1023]"). Up to two numeric ranges can 
be included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or more 
numeric expressions are included, one of them must be at the end of the name 
(e.g. "unit[0-31]rack" is invalid), but arbitrary names can always be used in a 
comma separated list.


Complicating that logic wouldn't make much sense to me.
Mapping host names to partitions shouldn't be too hard to script.
In the worst case you copy the full/per-rack/per-resources host list to 
partitions and manually cherry-pick afterwards.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: NodeName and PartitionName format in slurm.conf

2016-01-20 Thread Benjamin Redling


On 19.01.2016 at 20:37, Andrus, Brian Contractor wrote:

> I am testing our slurm to replace our torque/moab setup here.
>
> The issue I have is to try and put all our node names in the NodeName
> and PartitionName entries.
> In our cluster, we name our nodes compute--
> That seems to be problem enough with the abilities to use ranges in
> slurm, but it is compounded with the fact that the folks put the nodes
> in keeping 1u of space in between.
> So I have compute-1-[1,3,5,7,9,11...41]


Why not simply use a comma-separated list _generated_ from your 
inventory / DNS / /etc/hosts / etc.?
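
A minimal sketch of generating such a list, assuming the inventory is available in /etc/hosts format. The sample addresses, the "compute-" prefix filter and the helper names are illustrative assumptions, not something taken from the cluster discussed here:

```python
import re

# Hypothetical inventory excerpt in /etc/hosts format; addresses are made up.
HOSTS_TEXT = """\
10.1.1.1   compute-1-1
10.1.1.3   compute-1-3
10.1.1.5   compute-1-5
10.1.1.11  compute-1-11
10.1.0.1   login-1
"""


def compute_nodes(hosts_text, prefix="compute-"):
    """Return host names starting with `prefix`, in natural numeric order."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments, then split into address followed by names/aliases.
        fields = line.split("#", 1)[0].split()
        names.update(n for n in fields[1:] if n.startswith(prefix))
    # Sort on the embedded numbers so compute-1-11 follows compute-1-5.
    return sorted(names, key=lambda n: [int(x) for x in re.findall(r"\d+", n)])


def node_line(names):
    """Build a bare NodeName line; resource options would be appended by hand."""
    return "NodeName=" + ",".join(names)


print(node_line(compute_nodes(HOSTS_TEXT)))
```

The generated line carries only the names; anything else that belongs on the NodeName entry would still have to be added per group of nodes.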


When you have outliers (2U, 4U -- do they have more resources too!?) it 
would make sense to group/partition by resources anyway.
What are you using to manage inventory? Most configuration management 
and provisioning tools I know provide you with the necessary tools -- 
have a look at puppetlabs facter (or alternatives).


http://slurm.schedmd.com/slurm.conf.html

Multiple node names may be comma separated (e.g. "alpha,beta,gamma") 
and/or a simple node range expression may optionally be used to specify 
numeric ranges of nodes to avoid building a configuration file with 
large numbers of entries. The node range expression can contain one pair 
of square brackets with a sequence of comma separated numbers and/or 
ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or 
"lx[15,18,32-33]"). Note that the numeric ranges can include one or more 
leading zeros to indicate the numeric portion has a fixed number of 
digits (e.g. "linux[0000-1023]"). Up to two numeric ranges can be 
included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or 
more numeric expressions are included, one of them must be at the end of 
the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can 
always be used in a comma separated list.
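
To make the quoted rules concrete, here is a rough sketch that expands a single-bracket range expression and builds the odd-slot expression from the question. It is not Slurm's own parser; it assumes one bracket pair at the end of the name, and the function names are hypothetical:

```python
import re


def expand(expr):
    """Expand a one-bracket expression such as 'lx[15,18,32-33]' into names."""
    m = re.fullmatch(r"([^\[\]]+)\[([0-9,\-]+)\]", expr)
    if m is None:
        return [expr]  # plain name without a range
    prefix, body = m.groups()
    names = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a single number is a range of length one
        # Leading zeros on the lower bound mean a fixed number of digits.
        width = len(lo) if len(lo) > 1 and lo.startswith("0") else 0
        for i in range(int(lo), int(hi) + 1):
            names.append(prefix + str(i).zfill(width))
    return names


def odd_slots(prefix, last):
    """Build an expression listing every odd slot from 1 up to `last`."""
    return prefix + "[" + ",".join(str(i) for i in range(1, last + 1, 2)) + "]"


print(expand("lx[15,18,32-33]"))
print(odd_slots("compute-1-", 41))
```

On a machine with Slurm installed, `scontrol show hostnames` performs the real expansion and is the better check for an expression before it goes into slurm.conf.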



Complicating that logic wouldn't make much sense to me.
Mapping host names to partitions shouldn't be too hard to script.
In the worst case you copy the full/per-rack/per-resources host list to 
partitions and manually cherry-pick afterwards.
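
As an illustration of the per-rack variant, a short sketch that groups host names into one partition per rack. It assumes names of the form compute-RACK-SLOT, which is a guess based on the "compute-1-[...]" example; the partition names are made up:

```python
from collections import defaultdict


def partition_lines(names):
    """Return one PartitionName line per rack, keyed on the middle name field."""
    by_rack = defaultdict(list)
    for name in names:
        # Assumption: exactly three dash-separated fields, e.g. compute-1-3.
        _, rack, _slot = name.split("-")
        by_rack[rack].append(name)
    return [
        f"PartitionName=rack{rack} Nodes={','.join(nodes)}"
        for rack, nodes in sorted(by_rack.items())
    ]


for line in partition_lines(["compute-1-1", "compute-1-3", "compute-2-1"]):
    print(line)
```

Grouping by resources instead of by rack would only change the key used for `by_rack`; the cherry-picking afterwards stays manual.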


Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: NodeName and PartitionName format in slurm.conf

2016-01-20 Thread Benjamin Redling


On 20.01.2016 at 11:00, Benjamin Redling wrote:

> On 19.01.2016 at 20:37, Andrus, Brian Contractor wrote:
>
>> I am testing our slurm to replace our torque/moab setup here.
>>
>> The issue I have is to try and put all our node names in the NodeName
>> and PartitionName entries.
>> In our cluster, we name our nodes compute--
>> That seems to be problem enough with the abilities to use ranges in
>> slurm, but it is compounded with the fact that the folks put the nodes
>> in keeping 1u of space in between.
>> So I have compute-1-[1,3,5,7,9,11...41]
>
>
> Why not simply use a comma-separated list _generated_ from your
> inventory / DNS / /etc/hosts / etc.?

< --- 8< --->

P.S.
Totally forgot, you can configure a NodeName different from its 
NodeHostname:


Node names can have up to three name specifications: NodeName is the 
name used by all Slurm tools when referring to the node, NodeAddr is the 
name or IP address Slurm uses to communicate with the node, and 
NodeHostname is the name returned by the command /bin/hostname -s. Only 
NodeName is required (the others default to the same name), although 
supporting all three parameters provides complete control over naming 
and addressing the nodes. See the slurm.conf man page for details on all 
configuration parameters.
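
For illustration only, a sketch of what such a mapping could look like when generated: consecutive Slurm-side names that fit a plain range, each pointing at the real host name via NodeHostname. The "node" prefix and the function name are hypothetical:

```python
def alias_lines(hostnames, prefix="node"):
    """Map consecutive NodeName values onto the real host names."""
    # Pad the counter so all generated names have the same number of digits.
    width = len(str(len(hostnames)))
    return [
        f"NodeName={prefix}{str(i).zfill(width)} NodeHostname={host}"
        for i, host in enumerate(hostnames, start=1)
    ]


for line in alias_lines(["compute-1-1", "compute-1-3", "compute-1-5"]):
    print(line)
```

With names like these, a partition could refer to the nodes as a contiguous range, at the cost of the extra indirection between Slurm's name and the host's own name.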



But I wouldn't do that. IMHO in case of an erroneous node it is just one 
more level of indirection -- cumbersome to find the culprit.

Then again my host names don't depend on rack units.

Regards, Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321