[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-13 Thread Christopher Samuel

On 14/09/17 11:07, Lachlan Musicman wrote:

> Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)

OK, so this is saying that Slurm is seeing:

8 CPUs
1 board
1 socket per board
4 cores per socket
2 threads per core

which is also how lscpu describes the node.

Whereas the config that it thinks it should have is:

8 CPUs
1 board
8 sockets per board
1 core per socket
1 thread per core

which to me looks like what you would expect with just CPUs=8 in the
config and nothing else.
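
(In slurm.conf terms that is roughly the difference between a bare

  NodeName=papr-res-compute34 CPUs=8

and a definition that matches the real layout:

  NodeName=papr-res-compute34 CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2

hostname borrowed from later in the thread; RealMemory omitted.)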

I guess a few questions:

1) Have you restarted slurmctld and slurmd everywhere?

2) Can you confirm that slurm.conf is the same everywhere?

3) What does slurmd -C report?

cheers!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: On the need for slurm uid/gid consistency

2017-09-13 Thread Douglas Jacobsen
I would suggest it is a more general requirement, not one simply enforced
by the use of munge (which does imply a unified uid trust level across all
nodes using the same preshared key). When jobs are started, they are
started with a particular uid and other credentials for the intended user,
transmitted in the Slurm RPCs.  If uid/gid values differ across the
system, this could prove problematic in many different ways.
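
(A trivial way to spot such mismatches, assuming you have ssh access to
the nodes; the hostnames and username below are placeholders:

  # compare the uid/gid a given user resolves to on each node
  for h in node01 node02 node03; do
      echo -n "$h: "; ssh "$h" id someuser
  done

Any node printing a different uid or gid for the same name is a problem
waiting to happen.)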

Given that, I would almost suggest that from a package maintainer
perspective, you should avoid creating the slurm user, and leave it for
the site to solve in whatever way makes most sense for them.
-Doug

On Sep 13, 2017 17:12, "Christopher Samuel" wrote:

>
> On 13/09/17 04:53, Phil K wrote:
>
> > I'm hoping someone can provide an explanation as to why slurm
> > requires uid/gid consistency across nodes, with emphasis on the need
> > for the 'SlurmUser' to be uid/gid-consistent.
>
> I think this is a consequence of the use of Munge, rather than being
> inherent in Slurm itself.
>
> https://dun.github.io/munge/
>
> # It allows a process to authenticate the UID and GID of another
> # local or remote process within a group of hosts having common
> # users and groups
>
> Gory details are in the munged(8) manual page:
>
> https://github.com/dun/munge/wiki/Man-8-munged
>
> But I think the core of the matter is:
>
> # When a credential is validated, munged first checks the
> # message authentication code to ensure the credential has
> # not been subsequently altered. Next, it checks the embedded
> # UID/GID restrictions to determine whether the requesting
> # client is allowed to decode it.
>
> So if the UIDs & GIDs of the user differ across systems then it
> appears it will not allow the receiver to validate the message.
>
> cheers,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  Melbourne Bioinformatics - The University of Melbourne
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>


[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-13 Thread Lachlan Musicman
On 14 September 2017 at 11:10, Christopher Samuel wrote:

>
> On 14/09/17 11:07, Lachlan Musicman wrote:
>
> > Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> > SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)
>
> Hmm, are you virtualised by some chance?
>
> If so it might be that the VM layer is lying to the guest about the
> actual hardware layout.
>
> What does "lscpu" say?
>

No, not virtualised.

[root@papr-res-compute34 slurm]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Xeon(R) CPU E3-1275L v3 @ 2.70GHz
Stepping:              3
CPU MHz:               2699.791
CPU max MHz:           3900.0000
CPU min MHz:           800.0000
BogoMIPS:              5399.58
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2
x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep
bmi2 erms invpcid xsaveopt dtherm ida arat pln pts


[slurm-dev] Re: Cores, CPUs, and threads: take 2

2017-09-13 Thread Christopher Samuel

On 14/09/17 11:07, Lachlan Musicman wrote:

> Node configuration differs from hardware: CPUs=8:8(hw) Boards=1:1(hw)
> SocketsPerBoard=8:1(hw) CoresPerSocket=1:4(hw) ThreadsPerCore=1:2(hw)

Hmm, are you virtualised by some chance?

If so it might be that the VM layer is lying to the guest about the
actual hardware layout.

What does "lscpu" say?

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: On the need for slurm uid/gid consistency

2017-09-13 Thread Christopher Samuel

On 13/09/17 04:53, Phil K wrote:

> I'm hoping someone can provide an explanation as to why slurm
> requires uid/gid consistency across nodes, with emphasis on the need
> for the 'SlurmUser' to be uid/gid-consistent.

I think this is a consequence of the use of Munge, rather than being
inherent in Slurm itself.

https://dun.github.io/munge/

# It allows a process to authenticate the UID and GID of another
# local or remote process within a group of hosts having common
# users and groups

Gory details are in the munged(8) manual page:

https://github.com/dun/munge/wiki/Man-8-munged

But I think the core of the matter is:

# When a credential is validated, munged first checks the
# message authentication code to ensure the credential has
# not been subsequently altered. Next, it checks the embedded
# UID/GID restrictions to determine whether the requesting
# client is allowed to decode it.

So if the UIDs & GIDs of the user differ across systems then it
appears it will not allow the receiver to validate the message.
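
(The standard cross-node test from the munge documentation shows this
nicely; "node02" is a placeholder hostname:

  # encode a credential locally, decode it on another node; unmunge
  # prints the embedded UID/GID, so a mismatched mapping shows up here
  munge -n | ssh node02 unmunge

If the remote host resolves that uid/gid differently, the decoded
credential will not map back to the expected user.)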

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545


[slurm-dev] Re: On the need for slurm uid/gid consistency

2017-09-13 Thread Philip Kovacs
I'm not trying to solve any problems, I'm just trying to understand the issue 
well enough so that I can make informed decisions as a downstream package 
maintainer.  With the need for uid/gid consistency, we either need to pass on 
the issue, i.e. package for use by root, or we need to do difficult things like 
persuade the packaging committee to reserve a uid/gid solely for the use of 
this package.  I can tell you that the latter is a difficult sell.  
Furthermore, reserving a distro-wide uid/gid may be the wrong thing to do given 
that many, if not most, people will want to manage users with ldap-type tools 
anyway.
My thinking at this point is to go with a root deployment and make no attempt 
to introduce uids that may not be wanted.  It would be useful to have a script 
which takes a username as $1 and performs all the needed actions to convert a 
node to that SlurmUser, so that people who install and start the daemons as 
root have a path to convert the installation to some other user. 
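
A minimal sketch of what such a script might look like; the paths,
service names, and useradd options below are assumptions that will vary
by site and distro:

  #!/bin/bash
  # Hypothetical: convert this node to run the control daemons as the
  # SlurmUser given in $1.  Paths and service names are assumptions.
  set -euo pipefail
  NEWUSER="${1:?usage: $0 username}"

  # create the user locally if it does not already exist
  getent passwd "$NEWUSER" >/dev/null || \
      useradd -r -s /sbin/nologin "$NEWUSER"

  # point slurm.conf at the new SlurmUser
  sed -i "s/^SlurmUser=.*/SlurmUser=$NEWUSER/" /etc/slurm/slurm.conf

  # hand over state, spool and log directories (locations vary by build)
  for d in /var/spool/slurmctld /var/log/slurm; do
      if [ -d "$d" ]; then chown -R "$NEWUSER:" "$d"; fi
  done

  # restart whichever daemons exist on this node
  systemctl try-restart slurmctld slurmdbd slurmd || true

Note that slurmd itself still runs as root; only slurmctld/slurmdbd are
switched over.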

On Wednesday, September 13, 2017 4:09 AM, Janne Blomqvist wrote:

On 2017-09-12 21:52, Phil K wrote:
> I'm hoping someone can provide an explanation as to why slurm requires
> uid/gid consistency across nodes, with emphasis on the need for the
> 'SlurmUser' to be uid/gid-consistent.  I know that slurmctld and
> slurmdbd can run as user `slurm` and that this would be safer than
> running as root.  slurmd must run as root in any case, to my knowledge.
> Is the need for uid consistency, esp with the SlurmUser a difficult
> barrier to overcome?  Please clarify for me.  Thanks.  Phil
>
>
Yes, this is tedious. Either you need to create the slurm user with a 
consistent uid/gid when provisioning a node, or else ldap/nis/whatever needs to 
be up and running before you start any slurm daemons. It would be nicer if the 
rpms could just create a slurm user when installing the packages for 
slurmctld/slurmdbd and let the system allocate the uid/gid, so that it doesn't 
conflict with any other local uids/gids without anyone having to ensure the 
slurm uid/gid is globally unique.

Anyway, I think the reason behind this is that slurmd needs to ensure that 
control messages coming from slurmctld really come from the slurmctld daemon, 
and not from some random unprivileged process. And as munge is needed anyway to 
ensure that end-user uids/gids are correct, it's also used to ensure that the 
control messages really come from the SlurmUser.

I guess in principle you could get rid of the requirement for a SlurmUser with 
a consistent uid/gid, say by using certificate-based authentication as TLS or 
ssh do. But then you'd have to provision the certificates as part of the 
deployment, so I'm not sure that buys you any more ease of use in the end. Or 
does anybody have a better idea?

-- 
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi



[slurm-dev] Re: On the need for slurm uid/gid consistency

2017-09-13 Thread Janne Blomqvist


On 2017-09-12 21:52, Phil K wrote:

> I'm hoping someone can provide an explanation as to why slurm requires
> uid/gid consistency across nodes, with emphasis on the need for the
> 'SlurmUser' to be uid/gid-consistent.  I know that slurmctld and
> slurmdbd can run as user `slurm` and that this would be safer than
> running as root.  slurmd must run as root in any case, to my knowledge.
> Is the need for uid consistency, esp with the SlurmUser a difficult
> barrier to overcome?  Please clarify for me.  Thanks.  Phil



Yes, this is tedious. Either you need to create the slurm user with a 
consistent uid/gid when provisioning a node, or else ldap/nis/whatever needs to 
be up and running before you start any slurm daemons. It would be nicer if the 
rpms could just create a slurm user when installing the packages for 
slurmctld/slurmdbd and let the system allocate the uid/gid, so that it doesn't 
conflict with any other local uids/gids without anyone having to ensure the 
slurm uid/gid is globally unique.
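
For what it's worth, the usual idiom for that in an RPM spec would be a
%pre scriptlet along these lines (home directory, shell, and comment
text are assumptions):

  %pre
  # create a system account with a dynamically allocated uid/gid,
  # but only if one is not already present on this host
  getent group slurm >/dev/null || groupadd -r slurm
  getent passwd slurm >/dev/null || \
      useradd -r -g slurm -d /var/lib/slurm -s /sbin/nologin \
              -c "Slurm workload manager" slurm
  exit 0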

Anyway, I think the reason behind this is that slurmd needs to ensure that 
control messages coming from slurmctld really come from the slurmctld daemon, 
and not from some random unprivileged process. And as munge is needed anyway to 
ensure that end-user uids/gids are correct, it's also used to ensure that the 
control messages really come from the SlurmUser.

I guess in principle you could get rid of the requirement for a SlurmUser with 
a consistent uid/gid, say by using certificate-based authentication as TLS or 
ssh do. But then you'd have to provision the certificates as part of the 
deployment, so I'm not sure that buys you any more ease of use in the end. Or 
does anybody have a better idea?

--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi