Hi Valanti! :)
We are using nslcd on the compute nodes.
We have indeed changed the default behavior/command of salloc, but I
don't think that is the issue, because the same thing happens when we
submit jobs via sbatch. So I believe this is not related to the new
command we are using.
When we log in to the compute nodes via ssh, as root or as a regular
user, the "id" command shows all groups;
but when we log in through a SLURM job (interactively with salloc or
non-interactively with sbatch) we see the problem I described.
We have also checked the environment of the user in both cases (ssh or
SLURM) and the only differences are the SLURM environment variables and
nothing else.
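A sketch of that comparison, in case it is useful to others: capture "id" plus the sorted environment in each context and diff the captures, filtering out the SLURM_* variables that are expected to differ. Here both captures come from the same shell so the sketch is self-contained; on the cluster you would run the capture once in the ssh session and once inside the salloc/sbatch session.

```shell
# Capture the group view and the environment of the current context
capture() { id; env | sort; }

capture > ssh_view.txt     # would be run in the ssh session on the node
capture > slurm_view.txt   # would be run inside the Slurm job on the same node

# Ignore the expected SLURM_* variables (and the shell's "_" variable);
# anything left in the diff, especially the "id" line, is a real difference
diff <(grep -vE '^(SLURM_|_=)' ssh_view.txt) \
     <(grep -vE '^(SLURM_|_=)' slurm_view.txt)
```

In our case the diff showed only the SLURM_* lines, which is why we believe the environment itself is not the cause.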
Thanks,
Thekla
On 25/05/2016 2:07 PM, Chrysovalantis Paschoulas wrote:
Hi Thekla! :)
For me it looks like a configuration issue of the client LDAP name
service on the compute nodes. Which service are you using, nslcd or
sssd? I can see that you have changed the default behavior/command of
salloc so that it gives you a prompt on the compute node directly
(by default salloc returns a shell on the login node where it was
called). Check that you are not doing something wrong in the new
salloc command you defined in slurm.conf (the SallocDefaultCommand
option).
Can you log in as root on one of the compute nodes and try to resolve
a uid with the id command? What does it give you there: all groups, or
are some secondary groups missing? If the secondary groups are missing
there too, then it's not a Slurm problem but a problem with the
configuration of the ID-resolving service.
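For reference, the check described above can be done with a few standard NSS commands. The user "thekla" and group "build" are the names from this thread; the sketch below uses the current user instead so it runs anywhere.

```shell
# Resolve a user's groups directly against the name service.
# In the thread the user is "thekla" and the new group is "build";
# here we use the current user so the sketch is self-contained.
user="$(id -un)"

id "$user"                        # uid, primary gid, and all secondary groups
id -Gn "$user"                    # group names only
getent group "$(id -gn "$user")"  # confirm the group entry is visible via NSS
```

If these commands show the full group list as root on the node, the name service itself is resolving correctly and the discrepancy is confined to processes launched under Slurm.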
As far as I know, Slurm changes the environment after salloc (e.g. it
exports SLURM_* env vars), but it shouldn't change the behavior of
commands like id.
Best Regards,
Chrysovalantis Paschoulas
On 05/25/2016 10:32 AM, Thekla Loizou wrote:
Dear all,
We have noticed a very strange problem every time we add an existing
user to a secondary group.
We manage our users in LDAP. When we add a user to a new group and
then run the "id" and "groups" commands, we see that the user was
indeed added to the new group. The same holds for the output of the
command "getent group".
For example, for a user "thekla" whose primary group is "cstrc" and
who was just added to the group "build", we get:
[thekla@node01 ~]$ id
uid=2017(thekla) gid=5000(cstrc) groups=5000(cstrc),10257(build)
[thekla@node01 ~]$ groups
cstrc build
[thekla@node01 ~]$ getent group | grep build
build:*:10257:thekla
The above output is the correct one and it is given to us when we ssh
to one of the compute nodes.
But when we submit a job on the nodes (i.e. get access through SLURM
rather than direct ssh), we cannot see the new group the user was
added to:
[thekla@prometheus ~]$ salloc -N1
salloc: Granted job allocation 8136
[thekla@node01 ~]$ id
uid=2017(thekla) gid=5000(cstrc) groups=5000(cstrc)
[thekla@node01 ~]$ groups
cstrc
At the same time, getent still shows the correct result:
[thekla@node01 ~]$ getent group | grep build
build:*:10257:thekla
This problem appears only when we get access through SLURM, i.e. when
we run a job.
Has anyone faced this problem before? The only workaround we have
found is to restart the SLURM service on the compute nodes every time
we add a user to a new group.
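In case it helps anyone reproduce the workaround: a sketch of the restart loop we run, assuming slurmd is managed by systemd and root ssh access to the nodes. The node names are hypothetical; DRY_RUN is set to echo so the sketch only prints the commands instead of executing them.

```shell
# Hypothetical compute-node names; adjust to your cluster.
# Clear DRY_RUN (DRY_RUN=) to actually execute the restarts.
DRY_RUN=echo
for node in node01 node02; do
    # Restart slurmd on the node so jobs launched afterwards
    # see the user's updated group membership
    $DRY_RUN ssh "$node" systemctl restart slurmd
done
```

We would obviously prefer a fix over this, since restarting slurmd cluster-wide after every group change does not scale.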
Thanks,
Thekla
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------