Hi Thekla! :)
To me this looks like a configuration issue with the LDAP name service client on the compute nodes. Which service are you using, nslcd or sssd?

I can see that you have changed the default behavior of salloc so that it gives you a prompt directly on the compute node (by default, salloc returns a shell on the login node where it was called). Please check that nothing is wrong in the new salloc command that you defined in slurm.conf (the SallocDefaultCommand option).

Can you log in as root on a compute node and try to resolve a uid with the id command? What does it return there: all groups, or are some secondary groups missing? If the secondary groups are missing, then the problem is not in Slurm but in the configuration of the ID-resolving service.

As far as I know, Slurm changes the environment after salloc (e.g. it exports the SLURM_ env vars), but it shouldn't change the behavior of commands like id.

Best Regards,
Chrysovalantis Paschoulas

On 05/25/2016 10:32 AM, Thekla Loizou wrote:
Dear all,

We have noticed a very strange problem every time we add an existing user to a secondary group. We manage our users in LDAP. When we add a user to a new group and then type the "id" and "groups" commands, we see that the user was indeed added to the new group. The same happens when running the command "getent group".

For example, for a user "thekla" whose primary group was "cstrc" and who has now also been added to the group "build", we get:

    [thekla@node01 ~]$ id
    uid=2017(thekla) gid=5000(cstrc) groups=5000(cstrc),10257(build)
    [thekla@node01 ~]$ groups
    cstrc build
    [thekla@node01 ~]$ getent group | grep build
    build:*:10257:thekla

The above output is the correct one, and it is what we get when we ssh to one of the compute nodes.

But when we submit a job on the nodes (so getting access through SLURM and not with direct ssh), we cannot see the new group the user was added to:

    [thekla@prometheus ~]$ salloc -N1
    salloc: Granted job allocation 8136
    [thekla@node01 ~]$ id
    uid=2017(thekla) gid=5000(cstrc) groups=5000(cstrc)
    [thekla@node01 ~]$ groups
    cstrc

Meanwhile, in the same session, the following command still shows the correct result:

    [thekla@node01 ~]$ getent group | grep build
    build:*:10257:thekla

This problem appears only when we get access through SLURM, i.e. when we run a job.

Has anyone faced this problem before? The only way we have found to solve this is to restart the SLURM service on the compute nodes every time we add a user to a new group.

Thanks,
Thekla
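
The comparison discussed in this thread can be scripted. The following is only a rough sketch, not something from the original messages: it relies on the fact that "id -G" without an argument prints the groups of the calling process, while "id -G <user>" looks the user up again through the name service. The helper name missing_groups is made up for illustration. Running it once over plain ssh and once inside a salloc session should show whether the job's processes carry fewer groups than LDAP currently reports.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print the GIDs that appear
# in the first space-separated list but not in the second.
missing_groups() {
    for gid in $1; do
        case " $2 " in
            *" $gid "*) ;;                # present in both lists
            *) printf '%s\n' "$gid" ;;    # only in the first list
        esac
    done
}

user=$(id -un)
nss_groups=$(id -G "$user")   # fresh lookup through the name service
proc_groups=$(id -G)          # groups the current process actually carries

missing=$(missing_groups "$nss_groups" "$proc_groups")
if [ -n "$missing" ]; then
    echo "GIDs known to the name service but missing from this process:" $missing
else
    echo "Process groups match the name service lookup for $user"
fi
```

If the script reports missing GIDs only inside the allocation, the group list was most likely fixed when the job's processes were started rather than looked up at the time "id" was run.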
