I have created a slurm user on all nodes:

$ groupadd -g 777 slurm
$ useradd -g 777 -u 777 slurm

...and specified that this user is the SlurmUser in slurm.conf.  I'm using
munge for authentication and that all works fine.

I'm still confused though about which uid is actually executing tasks that
are submitted.  Some jobs I'm submitting call out to java.  The
installation of java is not in the same place on every node.

For example, on some machines it is in /opt/java, and on others it is in
/usr/java/jdk1.6.0_19

I have two nodes, MASTER and NODE1, where MASTER is the controller.  If I
submit a job from MASTER:

[root@MASTER ~]# srun -n1 -w NODE1 which java
/usr/bin/which: no java in
(/usr/kerberos/sbin:/usr/kerberos/bin:/usr/java/jdk1.6.0_19/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)


[root@MASTER ~]# srun -n1 -w MASTER which java
/usr/java/jdk1.6.0_19/bin/java

Now, if I ssh to NODE1 and do the same:

[root@NODE1 ~]# srun -n1 -w MASTER which java
/usr/bin/which: no java in
(/usr/kerberos/sbin:/usr/kerberos/bin:/usr/java/jdk1.6.0_19/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)

[root@NODE1 ~]# srun -n1 -w NODE1 which java
/opt/java/jdk1.6.0_19/bin/java

This is confusing me.  I edited /etc/profile on NODE1 to set JAVA_HOME and
add the java binary to the PATH.  If I ssh to NODE1 as user slurm, it can
find java just fine.  But when a task is submitted to NODE1, it doesn't seem
to execute any kind of profile script; instead it inherits the environment
variables of the host that the job was submitted from.  Is this true?
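
A minimal sketch of how this could be checked.  The first part needs no
Slurm at all: it only shows that a child process sees the PATH its parent
exported, which is the behaviour the output above suggests.  The PATH value
is a made-up example.  The second part runs only if srun is installed and
assumes the node name NODE1 from this message; "bash -l" starts a login
shell, which reads /etc/profile before running the command.

```shell
#!/bin/sh
# Hypothetical PATH standing in for the environment of the submitting host.
submit_path="/usr/java/jdk1.6.0_19/bin:/usr/bin:/bin"

# A child process inherits the PATH it is handed; no profile script is run.
inherited=$(env PATH="$submit_path" /bin/sh -c 'printf "%s" "$PATH"')
echo "PATH seen by child: $inherited"

# With Slurm available, compare what the task sees with and without a
# login shell on the target node (NODE1 is the node name used above).
if command -v srun >/dev/null 2>&1; then
    # Environment the task actually receives.
    srun -n1 -w NODE1 printenv PATH
    # Login shell: /etc/profile on NODE1 is sourced first.
    srun -n1 -w NODE1 bash -l -c 'which java'
fi
```

If the two srun commands disagree, the task is running with the submitting
host's PATH rather than the one /etc/profile on NODE1 would set up.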
