[slurm-dev] Re: NodeName and PartitionName format in slurm.conf
That's what I have done for now. I'm just a little OCD about how the conf file looks and don't care for 8 lines' worth of wraparound.

Management is done at the pxeboot/kickstart level and with yum. I can dynamically install the bits necessary for the various hardware differences (e.g. GPUs, MIC cards, Infiniband, etc.).

Brian Andrus

-----Original Message-----
From: Benjamin Redling [mailto:benjamin.ra...@uni-jena.de]
Sent: Wednesday, January 20, 2016 2:00 AM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] Re: NodeName and PartitionName format in slurm.conf

On 19.01.2016 at 20:37, Andrus, Brian Contractor wrote:
> I am testing our slurm to replace our torque/moab setup here.
>
> The issue I have is to try and put all our node names in the NodeName
> and PartitionName entries.
> In our cluster, we name our nodes compute-- That seems to
> be problem enough with the abilities to use ranges in slurm, but it is
> compounded with the fact that the folks put the nodes in keeping 1u of
> space in between.
> So I have compute-1-[1,3,5,7,9,11...41]

Why not simply use a comma separated list _generated_ from your inventory / DNS / /etc/hosts / etc.?

When you have outliers (2U, 4U -- do they have more resources too!?) it would make sense to group/partition by resources anyway.

What are you using to manage inventory? Most configuration management and provisioning tools I know provide you with the necessary tools -- have a look at puppetlabs facter (or alternatives).

http://slurm.schedmd.com/slurm.conf.html

    Multiple node names may be comma separated (e.g. "alpha,beta,gamma") and/or a simple node range expression may optionally be used to specify numeric ranges of nodes to avoid building a configuration file with large numbers of entries. The node range expression can contain one pair of square brackets with a sequence of comma separated numbers and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or "lx[15,18,32-33]"). Note that the numeric ranges can include one or more leading zeros to indicate the numeric portion has a fixed number of digits (e.g. "linux[0000-1023]"). Up to two numeric ranges can be included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or more numeric expressions are included, one of them must be at the end of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can always be used in a comma separated list.

Complicating that logic wouldn't make much sense to me.

Mapping host names to partitions shouldn't be too hard to script. In the worst case you copy the full/per-rack/per-resources host list to partitions and manually cherry-pick afterwards.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
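The suggestion in the quoted message, generating the comma separated node list from /etc/hosts instead of typing it by hand, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the "compute-" prefix filter and the sample addresses are hypothetical, and a real slurm.conf NodeName line would normally carry further parameters.

```python
import re


def natural_key(name):
    """Sort key that orders embedded numbers numerically (1, 3, 11, not 1, 11, 3)."""
    # re.split with a capture group alternates text and digit tokens,
    # so str and int values always line up position by position.
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def node_list_from_hosts(hosts_text, prefix="compute-"):
    """Build a comma separated node list from /etc/hosts-style text.

    The prefix filter is an assumption made for this sketch.
    """
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]   # drop trailing comments
        fields = line.split()
        # First field is the address; the rest are host names and aliases.
        for name in fields[1:]:
            if name.startswith(prefix):
                names.add(name)
    return ",".join(sorted(names, key=natural_key))


if __name__ == "__main__":
    sample = (
        "10.0.1.11 compute-1-11\n"
        "10.0.1.1  compute-1-1   # rack 1\n"
        "10.0.1.3  compute-1-3\n"
        "127.0.0.1 localhost\n"
    )
    print("NodeName=" + node_list_from_hosts(sample))
```

The natural sort keeps compute-1-3 ahead of compute-1-11, which makes the generated line easier to check by eye.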
[slurm-dev] Re: NodeName and PartitionName format in slurm.conf
On 19.01.2016 at 20:37, Andrus, Brian Contractor wrote:
> I am testing our slurm to replace our torque/moab setup here.
>
> The issue I have is to try and put all our node names in the NodeName
> and PartitionName entries.
> In our cluster, we name our nodes compute-- That seems to
> be problem enough with the abilities to use ranges in slurm, but it is
> compounded with the fact that the folks put the nodes in keeping 1u of
> space in between.
> So I have compute-1-[1,3,5,7,9,11...41]

Why not simply use a comma separated list _generated_ from your inventory / DNS / /etc/hosts / etc.?

When you have outliers (2U, 4U -- do they have more resources too!?) it would make sense to group/partition by resources anyway.

What are you using to manage inventory? Most configuration management and provisioning tools I know provide you with the necessary tools -- have a look at puppetlabs facter (or alternatives).

http://slurm.schedmd.com/slurm.conf.html

    Multiple node names may be comma separated (e.g. "alpha,beta,gamma") and/or a simple node range expression may optionally be used to specify numeric ranges of nodes to avoid building a configuration file with large numbers of entries. The node range expression can contain one pair of square brackets with a sequence of comma separated numbers and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or "lx[15,18,32-33]"). Note that the numeric ranges can include one or more leading zeros to indicate the numeric portion has a fixed number of digits (e.g. "linux[0000-1023]"). Up to two numeric ranges can be included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or more numeric expressions are included, one of them must be at the end of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can always be used in a comma separated list.

Complicating that logic wouldn't make much sense to me.

Mapping host names to partitions shouldn't be too hard to script. In the worst case you copy the full/per-rack/per-resources host list to partitions and manually cherry-pick afterwards.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
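To make the range syntax quoted from the slurm.conf man page concrete, here is a small Python sketch that expands a single node range expression into the host names it stands for. It is not Slurm's own parser: the function names are hypothetical, it handles only the bracket forms quoted above, and it does not enforce the rules about how many bracket pairs are allowed or where they must sit.

```python
import itertools
import re

BRACKET = re.compile(r"\[([^\]]+)\]")


def expand_items(spec):
    """Expand a bracket body such as '15,18,32-33' into a list of strings."""
    items = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            # A leading zero on the lower bound fixes the number of digits.
            width = len(low) if low.startswith("0") else 0
            items.extend(str(n).zfill(width) for n in range(int(low), int(high) + 1))
        else:
            items.append(part)
    return items


def expand(expr):
    """Expand one node range expression, e.g. 'lx[15,18,32-33]'."""
    # Splitting on a pattern with one capture group yields literal text at
    # even indexes and bracket bodies at odd indexes.
    pieces = BRACKET.split(expr)
    choices = [expand_items(p) if i % 2 else [p] for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    print(expand("lx[15,18,32-33]"))
    print(expand("rack[0-1]_blade[0-1]"))
    print(expand("compute-1-[1,3,5]"))
```

Seen this way, a list of odd-numbered nodes is just a bracket body with single numbers and no ranges, so the expression gets long but stays within the quoted syntax.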
[slurm-dev] Re: NodeName and PartitionName format in slurm.conf
On 20.01.2016 at 11:00, Benjamin Redling wrote:
> On 19.01.2016 at 20:37, Andrus, Brian Contractor wrote:
>> I am testing our slurm to replace our torque/moab setup here.
>>
>> The issue I have is to try and put all our node names in the NodeName
>> and PartitionName entries.
>> In our cluster, we name our nodes compute-- That seems to
>> be problem enough with the abilities to use ranges in slurm, but it is
>> compounded with the fact that the folks put the nodes in keeping 1u of
>> space in between.
>> So I have compute-1-[1,3,5,7,9,11...41]
>
> Why not simply use a comma separated list _generated_ from your
> inventory / DNS / /etc/hosts / etc.?

< --- 8< --->

P.S. Totally forgot, you can configure a NodeName different from its NodeHostname:

    Node names can have up to three name specifications: NodeName is the name used by all Slurm tools when referring to the node, NodeAddr is the name or IP address Slurm uses to communicate with the node, and NodeHostname is the name returned by the command /bin/hostname -s. Only NodeName is required (the others default to the same name), although supporting all three parameters provides complete control over naming and addressing the nodes. See the slurm.conf man page for details on all configuration parameters.

But I wouldn't do that. IMHO, when a node misbehaves it is just one more level of indirection, which makes it cumbersome to find the culprit. Then again, my host names don't depend on rack units.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
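For readers who do want to try the NodeName/NodeHostname split described in the P.S., a rough sketch of generating such entries might look like this. The alias scheme ("node" plus a running number) and the function name are assumptions made purely for illustration; only the NodeName, NodeAddr and NodeHostname parameter names come from the quoted documentation, and the caveat about the extra level of indirection still applies.

```python
def node_lines(hostnames, alias_prefix="node"):
    """Render one slurm.conf-style line per host, giving each host a
    sequential alias as NodeName while keeping the real host name in
    NodeAddr and NodeHostname.

    The alias scheme is hypothetical and exists only for this sketch.
    """
    width = len(str(len(hostnames)))   # zero-pad so aliases sort cleanly
    lines = []
    for index, host in enumerate(hostnames, start=1):
        alias = f"{alias_prefix}{index:0{width}d}"
        lines.append(f"NodeName={alias} NodeAddr={host} NodeHostname={host}")
    return lines


if __name__ == "__main__":
    for line in node_lines(["compute-1-1", "compute-1-3", "compute-1-5"]):
        print(line)
```

With consecutive aliases, the Slurm-facing names could then be written as a plain range, at the cost of having to map an alias back to the physical host when something goes wrong.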