[slurm-dev] Re: Cores, CPUs, and threads: take 2
On 14/09/17 11:07, Lachlan Musicman wrote:

> Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)

OK, so this is saying that Slurm is seeing:

  8 CPUs
  1 board
  1 socket per board
  4 cores per socket
  2 threads per core

which is what lscpu also describes the node as.

Whereas the config that it thinks it should have is:

  8 CPUs
  1 board
  8 sockets per board
  1 core per socket
  1 thread per core

which to me looks like what you would expect with just CPUS=8 in the config and nothing else.

I guess a couple of questions:

1) Have you restarted slurmctld and slurmd everywhere?
2) Can you confirm that slurm.conf is the same everywhere?
3) What does slurmd -C report?

cheers!
Chris
--
Christopher Samuel
Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
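[Editor's note: check (2) above — that slurm.conf is identical everywhere — can be scripted by comparing checksums. A minimal sketch of the comparison logic; the two local temp files stand in for copies fetched from two nodes (e.g. via scp), which is an assumption for the demo:]

```shell
# Demo of the slurm.conf consistency check: compare checksums of two
# copies.  In real use, cfg_a/cfg_b would be fetched from two nodes.
cfg_a=$(mktemp); cfg_b=$(mktemp)
printf 'NodeName=node01 CPUs=8\n' > "$cfg_a"
printf 'NodeName=node01 CPUs=8\n' > "$cfg_b"   # pretend: scp node02:/etc/slurm/slurm.conf
sum_a=$(md5sum "$cfg_a" | cut -d' ' -f1)
sum_b=$(md5sum "$cfg_b" | cut -d' ' -f1)
if [ "$sum_a" = "$sum_b" ]; then
    echo "slurm.conf matches"
else
    echo "slurm.conf DIFFERS"
fi
rm -f "$cfg_a" "$cfg_b"
```

For check (3), `slurmd -C` prints the node's actual hardware layout in slurm.conf syntax, which can be pasted straight into the config.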
[slurm-dev] Re: On the need for slurm uid/gid consistency
I would suggest it is a more general requirement, not simply one enforced by the use of munge (which does imply a unified uid trust level across all nodes sharing the same preshared key): when jobs are started, they are started with a particular uid and other credentials (transmitted in the Slurm RPCs) for the intended user. If there are different uid/gid values across the system, this could prove problematic in many different ways.

Given that, I would almost suggest that from a package maintainer perspective you should avoid creating the slurm user, and leave it for the site to solve in whatever way makes most sense for them.

-Doug

On Sep 13, 2017 17:12, "Christopher Samuel" wrote:
>
> On 13/09/17 04:53, Phil K wrote:
>
> > I'm hoping someone can provide an explanation as to why slurm
> > requires uid/gid consistency across nodes, with emphasis on the need
> > for the 'SlurmUser' to be uid/gid-consistent.
>
> I think this is a consequence of the use of Munge, rather than being
> inherent in Slurm itself.
>
> https://dun.github.io/munge/
>
> # It allows a process to authenticate the UID and GID of another
> # local or remote process within a group of hosts having common
> # users and groups
>
> Gory details are in the munged(8) manual page:
>
> https://github.com/dun/munge/wiki/Man-8-munged
>
> But I think the core of the matter is:
>
> # When a credential is validated, munged first checks the
> # message authentication code to ensure the credential has
> # not been subsequently altered. Next, it checks the embedded
> # UID/GID restrictions to determine whether the requesting
> # client is allowed to decode it.
>
> So if the UIDs & GIDs of the user differ across systems then it
> appears it will not allow the receiver to validate the message.
>
> cheers,
> Chris
> --
> Christopher Samuel
> Senior Systems Administrator
> Melbourne Bioinformatics - The University of Melbourne
> Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
[slurm-dev] Re: Cores, CPUs, and threads: take 2
On 14 September 2017 at 11:10, Christopher Samuel wrote:
>
> On 14/09/17 11:07, Lachlan Musicman wrote:
>
> > Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> > SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)
>
> Hmm, are you virtualised by some chance?
>
> If so it might be that the VM layer is lying to the guest about the
> actual hardware layout.
>
> What does "lscpu" say?

No, not virtualised.

[root@papr-res-compute34 slurm]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Xeon(R) CPU E3-1275L v3 @ 2.70GHz
Stepping:              3
CPU MHz:               2699.791
CPU max MHz:           3900.
CPU min MHz:           800.
BogoMIPS:              5399.58
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts
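[Editor's note: given that lscpu output (1 socket, 4 cores per socket, 2 threads per core), a node definition that matches the hardware — rather than letting Slurm infer 8 single-core sockets from a bare CPU count — would look roughly like this; State is a placeholder and memory/weight parameters are omitted:]

```
NodeName=papr-res-compute34 CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 State=UNKNOWN
```

The output of `slurmd -C` on the node prints exactly this kind of line from the detected hardware, which is why Chris asks for it in the follow-up.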
[slurm-dev] Re: Cores, CPUs, and threads: take 2
On 14/09/17 11:07, Lachlan Musicman wrote:

> Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)

Hmm, are you virtualised by some chance?

If so it might be that the VM layer is lying to the guest about the actual hardware layout.

What does "lscpu" say?

cheers,
Chris
--
Christopher Samuel
Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
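[Editor's note: the virtualisation question can be answered from lscpu itself — on a VM, lscpu includes a "Hypervisor vendor:" line that is absent on bare metal. A small sketch of that check; the sample text is hardcoded here so the logic can be shown without a real VM:]

```shell
# Detect a hypervisor from lscpu-style output.  $sample is a hardcoded
# VM example; on a real node you would pipe `lscpu` in instead.
sample="Architecture:        x86_64
Hypervisor vendor:   KVM
Virtualization type: full"
if printf '%s\n' "$sample" | grep -q '^Hypervisor vendor:'; then
    virt=yes
else
    virt=no
fi
echo "virtualised: $virt"
```

On systemd-based distributions, `systemd-detect-virt` gives the same answer more directly (it prints the hypervisor name, or "none").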
[slurm-dev] Re: On the need for slurm uid/gid consistency
On 13/09/17 04:53, Phil K wrote:

> I'm hoping someone can provide an explanation as to why slurm
> requires uid/gid consistency across nodes, with emphasis on the need
> for the 'SlurmUser' to be uid/gid-consistent.

I think this is a consequence of the use of Munge, rather than being inherent in Slurm itself.

https://dun.github.io/munge/

# It allows a process to authenticate the UID and GID of another
# local or remote process within a group of hosts having common
# users and groups

Gory details are in the munged(8) manual page:

https://github.com/dun/munge/wiki/Man-8-munged

But I think the core of the matter is:

# When a credential is validated, munged first checks the
# message authentication code to ensure the credential has
# not been subsequently altered. Next, it checks the embedded
# UID/GID restrictions to determine whether the requesting
# client is allowed to decode it.

So if the UIDs & GIDs of the user differ across systems then it appears it will not allow the receiver to validate the message.

cheers,
Chris
--
Christopher Samuel
Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
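[Editor's note: the uid consistency that munge's embedded UID/GID restriction depends on can be spot-checked before it bites. A sketch of the comparison; the two passwd lines are hardcoded samples standing in for `getent passwd slurm` run on two different nodes (e.g. over ssh), which is an assumption for the demo:]

```shell
# Compare the SlurmUser uid as seen by two nodes.  node_a/node_b are
# sample `getent passwd slurm` outputs; field 3 of a passwd entry is
# the numeric uid.
node_a='slurm:x:981:981::/var/lib/slurm:/sbin/nologin'
node_b='slurm:x:990:990::/var/lib/slurm:/sbin/nologin'
uid_a=$(printf '%s' "$node_a" | cut -d: -f3)
uid_b=$(printf '%s' "$node_b" | cut -d: -f3)
if [ "$uid_a" = "$uid_b" ]; then
    echo "uid consistent: $uid_a"
else
    echo "uid MISMATCH: $uid_a vs $uid_b"
fi
```

With the sample values above the check reports a mismatch, which is the situation where munged on the receiving node would refuse to let the peer decode the credential.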
[slurm-dev] Re: On the need for slurm uid/gid consistency
I'm not trying to solve any problems, I'm just trying to understand the issue well enough that I can make informed decisions as a downstream package maintainer.

With the need for uid/gid consistency, we either need to pass on the issue, i.e. package for use by root, or we need to do difficult things like persuade the packaging committee to reserve a uid/gid solely for use by this package. I can tell you that the latter is a difficult sell. Furthermore, reserving a distro-wide uid/gid may be the wrong thing to do given that many, if not most, people will want to manage users with ldap-type tools anyway.

My thinking at this point is to go with a root deployment and make no attempt to introduce uids that may not be wanted. It would be useful to have a script which takes a username as $1 and performs all the needed actions to convert a node to that SlurmUser, so that people who install and start the daemons as root have a path to convert the installation to some other user.

On Wednesday, September 13, 2017 4:09 AM, Janne Blomqvist wrote:

On 2017-09-12 21:52, Phil K wrote:
> I'm hoping someone can provide an explanation as to why slurm requires
> uid/gid consistency across nodes, with emphasis on the need for the
> 'SlurmUser' to be uid/gid-consistent. I know that slurmctld and
> slurmdbd can run as user `slurm` and that this would be safer than
> running as root. slurmd must run as root in any case, to my knowledge.
> Is the need for uid consistency, esp. with the SlurmUser, a difficult
> barrier to overcome? Please clarify for me. Thanks. Phil

Yes, this is tedious. Either you need to create the slurm user with a consistent uid/gid when provisioning a node, or ldap/nis/whatever needs to be up and running before you start any slurm daemons.
It would be nicer if the RPMs could just create a slurm user when installing the packages for slurmctld/slurmdbd, letting the system allocate the uid/gid so that it doesn't conflict with any other local uid/gids, without having to ensure the slurm uid/gid is globally unique.

Anyway, I think the reason behind this is that slurmd needs to ensure that control messages coming from slurmctld really come from the slurmctld daemon, and not from some random unprivileged process. And as munge is needed anyway to ensure that end-user uid/gids are correct, it's also used to ensure that the control messages really come from the SlurmUser.

I guess in principle you could get rid of the requirement for a SlurmUser with a consistent uid/gid, say by using certificates like TLS or ssh. But then you'd have to provision the certificates as part of the deployment, so I'm not sure that buys you any more ease of use in the end. Or does anybody have a better idea?

--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi
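[Editor's note: the conversion script Phil asks for might start along these lines. This is a dry-run sketch only — it echoes the commands rather than executing them, so a site can review and adapt it; the config path, spool/log directories, and systemd unit names are assumptions that vary between installations:]

```shell
# Sketch: switch a node's installation to SlurmUser=$1 (dry run).
# Paths and unit names below are typical but site-specific assumptions.
user="${1:-slurm}"

run() { echo "WOULD RUN: $*"; }   # replace body with "$@" to execute for real

run id -u "$user"                 # user must already exist, same uid on every node
run sed -i "s/^SlurmUser=.*/SlurmUser=$user/" /etc/slurm/slurm.conf
run chown -R "$user:$user" /var/spool/slurmctld /var/log/slurm
run systemctl restart slurmctld   # slurmd itself still runs as root
```

Because it only prints its intended actions, the script can be run safely first and the output compared against the site's actual layout before flipping `run` into execute mode.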
[slurm-dev] Re: On the need for slurm uid/gid consistency
On 2017-09-12 21:52, Phil K wrote:

> I'm hoping someone can provide an explanation as to why slurm requires
> uid/gid consistency across nodes, with emphasis on the need for the
> 'SlurmUser' to be uid/gid-consistent. I know that slurmctld and
> slurmdbd can run as user `slurm` and that this would be safer than
> running as root. slurmd must run as root in any case, to my knowledge.
> Is the need for uid consistency, esp. with the SlurmUser, a difficult
> barrier to overcome? Please clarify for me. Thanks. Phil

Yes, this is tedious. Either you need to create the slurm user with a consistent uid/gid when provisioning a node, or ldap/nis/whatever needs to be up and running before you start any slurm daemons.

It would be nicer if the RPMs could just create a slurm user when installing the packages for slurmctld/slurmdbd, letting the system allocate the uid/gid so that it doesn't conflict with any other local uid/gids, without having to ensure the slurm uid/gid is globally unique.

Anyway, I think the reason behind this is that slurmd needs to ensure that control messages coming from slurmctld really come from the slurmctld daemon, and not from some random unprivileged process. And as munge is needed anyway to ensure that end-user uid/gids are correct, it's also used to ensure that the control messages really come from the SlurmUser.

I guess in principle you could get rid of the requirement for a SlurmUser with a consistent uid/gid, say by using certificates like TLS or ssh. But then you'd have to provision the certificates as part of the deployment, so I'm not sure that buys you any more ease of use in the end. Or does anybody have a better idea?

--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi