We're testing slurm 2.3.x and accidentally discovered the following two
things about SlurmUser and file permissions.

We have set up SlurmUser=slurm in both slurm.conf and slurmdbd.conf:

# grep -i slurmuser /etc/slurm/slurm*.conf
/etc/slurm/slurm.conf:SlurmUser=slurm
/etc/slurm/slurmdbd.conf:SlurmUser=slurm

The user slurm is not able to read a root-only-readable file:

# ls -l /etc/slurm/slurmdbd.conf
-r-------- 1 root root 1178 Jan 31 15:28 /etc/slurm/slurmdbd.conf
# sudo -u slurm cat /etc/slurm/slurmdbd.conf
cat: /etc/slurm/slurmdbd.conf: Permission denied

I'll show the details for 2.2.6, but versions 2.2.6, 2.3.2 and 2.3.3
(all the versions we've tested) show the same behaviour.

** 1) According to man slurmdbd.conf, slurmdbd.conf must be readable by
SlurmUser (i.e. slurm), but it seems slurmdbd will happily read a
slurmdbd.conf that is only readable by root:

# ls -l /etc/slurm/slurmdbd.conf
-r-------- 1 root root 1178 Jan 31 15:28 /etc/slurm/slurmdbd.conf
# service slurmdbd start
starting slurmdbd:                                         [  OK  ]
# ps aux|grep slurmdbd
slurm    10205  0.0  0.0  61888  1960 ?        Sl   15:37   0:00 /usr/sbin/slurmdbd
root     10465  0.0  0.0   6016   568 pts/2    S+   15:40   0:00 grep slurmdbd

/var/log/slurm/slurmdbd.log says:
[2012-01-31T15:37:22] debug3: Trying to load plugin /usr/lib64/slurm/auth_munge.so
[2012-01-31T15:37:22] auth plugin for Munge (http://home.gna.org/munge/) loaded
[2012-01-31T15:37:22] debug3: Success.
[2012-01-31T15:37:22] debug3: Trying to load plugin /usr/lib64/slurm/accounting_storage_mysql.so
[2012-01-31T15:37:22] debug2: mysql_connect() called for db slurmacct
[2012-01-31T15:37:22] Accounting storage MYSQL plugin loaded
[2012-01-31T15:37:22] debug3: Success.
[2012-01-31T15:37:22] pidfile not locked, assuming no running daemon
[2012-01-31T15:37:22] debug2: ArchiveDir        = /state/partition1/slurm/archive
[2012-01-31T15:37:22] debug2: ArchiveScript     = (null)
[2012-01-31T15:37:22] debug2: AuthInfo          = (null)
[2012-01-31T15:37:22] debug2: AuthType          = auth/munge
[2012-01-31T15:37:22] debug2: DbdAddr           = blaster
[2012-01-31T15:37:22] debug2: DbdBackupHost     = (null)
[2012-01-31T15:37:22] debug2: DbdHost           = blaster
[2012-01-31T15:37:22] debug2: DbdPort           = 6819
[2012-01-31T15:37:22] debug2: DebugLevel        = 7
[2012-01-31T15:37:22] debug2: DefaultQOS        = (null)
[2012-01-31T15:37:22] debug2: LogFile           = /var/log/slurm/slurmdbd.log
[2012-01-31T15:37:22] debug2: MessageTimeout    = 10
[2012-01-31T15:37:22] debug2: PidFile           = /var/run/slurmdbd.pid
[2012-01-31T15:37:22] debug2: PluginDir         = /usr/lib64/slurm
[2012-01-31T15:37:22] debug2: PrivateData       = none
[2012-01-31T15:37:22] debug2: PurgeEventAfter   = 6 months*
[2012-01-31T15:37:22] debug2: PurgeJobAfter     = 6 months*
[2012-01-31T15:37:22] debug2: PurgeStepAfter    = 6 months*
[2012-01-31T15:37:22] debug2: PurgeSuspendAfter     = 6 months*
[2012-01-31T15:37:22] debug2: SlurmUser         = slurm(401)
[2012-01-31T15:37:22] debug2: StorageBackupHost = (null)
[2012-01-31T15:37:22] debug2: StorageHost       = localhost
[2012-01-31T15:37:22] debug2: StorageLoc        = slurmacct
[2012-01-31T15:37:22] debug2: StoragePass       = THESECRETPASSWORD
[2012-01-31T15:37:22] debug2: StoragePort       = 3306
[2012-01-31T15:37:22] debug2: StorageType       = accounting_storage/mysql
[2012-01-31T15:37:22] debug2: StorageUser       = slurm
[2012-01-31T15:37:22] debug2: TrackWCKey        = 0
[2012-01-31T15:37:22] debug2: acct_storage_p_get_connection: request new connection 0
[2012-01-31T15:37:22] debug3: 0(as_mysql_qos.c:1057) query
[...]
[2012-01-31T15:37:22] slurmdbd version 2.2.6 started
[2012-01-31T15:37:22] debug2: running rollup at Tue Jan 31 15:37:22 2012

[2012-01-31T15:37:22] debug2: No need to roll cluster titan this hour 1328018400 <= 1328018400
[2012-01-31T15:37:22] debug2: No need to roll cluster titan this day 1327964400 <= 1327964400
[2012-01-31T15:37:22] debug2: No need to roll cluster titan this month 1325372400 <= 1325372400
[2012-01-31T15:37:22] debug2: Everything rolled up



** 2) When slurmdbd or slurmctld starts, if the slurmdbd.log,
slurmctld.log or sched.log files don't exist, they are created with
owner root.  slurmdbd is happy with this, but slurmctld fails:

# ls -la /var/log/slurm 
total 5800
drwxrwxr-x  2 slurm root    4096 Jan 31 15:44 ./
drwxr-xr-x 17 root  root    4096 Jan 31 04:02 ../
# service slurmdbd start
starting slurmdbd:                                         [  OK  ]
# ls -la /var/log/slurm
total 5816
drwxrwxr-x  2 slurm root    4096 Jan 31 15:45 ./
drwxr-xr-x 17 root  root    4096 Jan 31 04:02 ../
-rw-------  1 root  root   14387 Jan 31 15:45 slurmdbd.log
# ps aux|grep slurmdbd
slurm    10787  0.2  0.0  60860  1856 ?        Sl   15:45   0:00 /usr/sbin/slurmdbd
root     10797  0.0  0.0   6016   560 pts/2    S+   15:46   0:00 grep slurmdbd
# service slurm start
starting slurmctld:                                        [  OK  ]
# ls -la /var/log/slurm/
total 5824
drwxrwxr-x  2 slurm root    4096 Jan 31 15:46 ./
drwxr-xr-x 17 root  root    4096 Jan 31 04:02 ../
-rw-------  1 root  root      48 Jan 31 15:46 sched.log
-rw-------  1 root  root    1205 Jan 31 15:46 slurmctld.log
-rw-------  1 root  root   15926 Jan 31 15:46 slurmdbd.log
# ps aux|grep slurm
slurm    10787  0.1  0.0  62248  2448 ?        Sl   15:45   0:00 /usr/sbin/slurmdbd
root     10833  0.0  0.0   6016   560 pts/2    S+   15:47   0:00 grep slurm
# tail /var/log/slurm/slurmctld.log
[2012-01-31T15:46:48] preempt/qos loaded
[2012-01-31T15:46:48] checkpoint/blcr init
[2012-01-31T15:46:48] Checkpoint plugin loaded: checkpoint/blcr
[2012-01-31T15:46:48] Job accounting gather LINUX plugin loaded
[2012-01-31T15:46:48] job_submit.lua: initialized
[2012-01-31T15:46:48] debug:  No backup controller to shutdown
[2012-01-31T15:46:48] switch NONE plugin loaded
[2012-01-31T15:46:48] topology NONE plugin loaded
[2012-01-31T15:46:48] debug:  No DownNodes
[2012-01-31T15:46:48] fatal: sched_log_alter could not open /var/log/slurm/sched.log: Permission denied


Could it be that the daemons switch uid to SlurmUser too late?
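
Until that is answered, one possible workaround for 2) is to create the
log files ahead of time with SlurmUser as owner, so that the daemons never
get the chance to create them as root. The following is only a sketch: the
function names list_foreign_logs and precreate_logs are made up for this
example, the three file names are the ones from the listing above, and it
assumes the daemons leave the ownership of an already existing log file
alone, which we have not verified.

```shell
#!/bin/sh
# Sketch of a possible workaround, not a fix for the daemons themselves.
# Both function names are hypothetical helpers made up for this example.

# Print every regular file under log_dir that is NOT owned by owner.
list_foreign_logs() {
    log_dir=$1
    owner=$2
    find "$log_dir" -type f ! -user "$owner" -print
}

# Create the three log files named in this report (if they are missing)
# and give them to owner with mode 0600. Existing content is kept.
precreate_logs() {
    log_dir=$1
    owner=$2
    for name in slurmdbd.log slurmctld.log sched.log; do
        file=$log_dir/$name
        [ -e "$file" ] || : > "$file"
        chown "$owner" "$file"
        chmod 600 "$file"
    done
}
```

Run as root before starting the daemons, this would be used as
"precreate_logs /var/log/slurm slurm", and afterwards
"list_foreign_logs /var/log/slurm slurm" should print nothing.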


-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Research Computing Services, University of Oslo
