We're testing slurm 2.3.x and accidentally discovered the following two things about SlurmUser and file permissions.
We have set up SlurmUser=slurm in both slurm.conf and slurmdbd.conf:

# grep -i slurmuser /etc/slurm/slurm*.conf
/etc/slurm/slurm.conf:SlurmUser=slurm
/etc/slurm/slurmdbd.conf:SlurmUser=slurm

The user slurm is not able to read a root-only-readable file:

# ls -l /etc/slurm/slurmdbd.conf
-r-------- 1 root root 1178 Jan 31 15:28 /etc/slurm/slurmdbd.conf
# sudo -u slurm cat /etc/slurm/slurmdbd.conf
cat: /etc/slurm/slurmdbd.conf: Permission denied

I'll show the details for 2.2.6, but versions 2.2.6, 2.3.2 and 2.3.3 (all that we've tested) show the same behaviour.

** 1) According to man slurmdbd.conf, slurmdbd.conf must be readable by SlurmUser (i.e. slurm), but it seems slurmdbd will happily read a slurmdbd.conf that is only readable by root:

# ls -l /etc/slurm/slurmdbd.conf
-r-------- 1 root root 1178 Jan 31 15:28 /etc/slurm/slurmdbd.conf
# service slurmdbd start
starting slurmdbd: [ OK ]
# ps aux|grep slurmdbd
slurm 10205 0.0 0.0 61888 1960 ? Sl 15:37 0:00 /usr/sbin/slurmdbd
root 10465 0.0 0.0 6016 568 pts/2 S+ 15:40 0:00 grep slurmdbd

/var/log/slurmdbd.log says:

[2012-01-31T15:37:22] debug3: Trying to load plugin /usr/lib64/slurm/auth_munge.so
[2012-01-31T15:37:22] auth plugin for Munge (http://home.gna.org/munge/) loaded
[2012-01-31T15:37:22] debug3: Success.
[2012-01-31T15:37:22] debug3: Trying to load plugin /usr/lib64/slurm/accounting_storage_mysql.so
[2012-01-31T15:37:22] debug2: mysql_connect() called for db slurmacct
[2012-01-31T15:37:22] Accounting storage MYSQL plugin loaded
[2012-01-31T15:37:22] debug3: Success.
[2012-01-31T15:37:22] pidfile not locked, assuming no running daemon
[2012-01-31T15:37:22] debug2: ArchiveDir = /state/partition1/slurm/archive
[2012-01-31T15:37:22] debug2: ArchiveScript = (null)
[2012-01-31T15:37:22] debug2: AuthInfo = (null)
[2012-01-31T15:37:22] debug2: AuthType = auth/munge
[2012-01-31T15:37:22] debug2: DbdAddr = blaster
[2012-01-31T15:37:22] debug2: DbdBackupHost = (null)
[2012-01-31T15:37:22] debug2: DbdHost = blaster
[2012-01-31T15:37:22] debug2: DbdPort = 6819
[2012-01-31T15:37:22] debug2: DebugLevel = 7
[2012-01-31T15:37:22] debug2: DefaultQOS = (null)
[2012-01-31T15:37:22] debug2: LogFile = /var/log/slurm/slurmdbd.log
[2012-01-31T15:37:22] debug2: MessageTimeout = 10
[2012-01-31T15:37:22] debug2: PidFile = /var/run/slurmdbd.pid
[2012-01-31T15:37:22] debug2: PluginDir = /usr/lib64/slurm
[2012-01-31T15:37:22] debug2: PrivateData = none
[2012-01-31T15:37:22] debug2: PurgeEventAfter = 6 months*
[2012-01-31T15:37:22] debug2: PurgeJobAfter = 6 months*
[2012-01-31T15:37:22] debug2: PurgeStepAfter = 6 months*
[2012-01-31T15:37:22] debug2: PurgeSuspendAfter = 6 months*
[2012-01-31T15:37:22] debug2: SlurmUser = slurm(401)
[2012-01-31T15:37:22] debug2: StorageBackupHost = (null)
[2012-01-31T15:37:22] debug2: StorageHost = localhost
[2012-01-31T15:37:22] debug2: StorageLoc = slurmacct
[2012-01-31T15:37:22] debug2: StoragePass = THESECRETPASSWORD
[2012-01-31T15:37:22] debug2: StoragePort = 3306
[2012-01-31T15:37:22] debug2: StorageType = accounting_storage/mysql
[2012-01-31T15:37:22] debug2: StorageUser = slurm
[2012-01-31T15:37:22] debug2: TrackWCKey = 0
[2012-01-31T15:37:22] debug2: acct_storage_p_get_connection: request new connection 0
[2012-01-31T15:37:22] debug3: 0(as_mysql_qos.c:1057) query [...]
[2012-01-31T15:37:22] slurmdbd version 2.2.6 started
[2012-01-31T15:37:22] debug2: running rollup at Tue Jan 31 15:37:22 2012
[2012-01-31T15:37:22] debug2: No need to roll cluster titan this hour 1328018400 <= 1328018400
[2012-01-31T15:37:22] debug2: No need to roll cluster titan this day 1327964400 <= 1327964400
[2012-01-31T15:37:22] debug2: No need to roll cluster titan this month 1325372400 <= 1325372400
[2012-01-31T15:37:22] debug2: Everything rolled up

** 2) When slurmdbd or slurmctld starts, if the slurmdbd.log, slurmctld.log or sched.log files don't exist, they are created with owner root. slurmdbd is happy with this, but slurmctld fails:

# ls -la /var/log/slurm
total 5800
drwxrwxr-x 2 slurm root 4096 Jan 31 15:44 ./
drwxr-xr-x 17 root root 4096 Jan 31 04:02 ../
# service slurmdbd start
starting slurmdbd: [ OK ]
# ls -la /var/log/slurm
total 5816
drwxrwxr-x 2 slurm root 4096 Jan 31 15:45 ./
drwxr-xr-x 17 root root 4096 Jan 31 04:02 ../
-rw------- 1 root root 14387 Jan 31 15:45 slurmdbd.log
# ps aux|grep slurmdbd
slurm 10787 0.2 0.0 60860 1856 ? Sl 15:45 0:00 /usr/sbin/slurmdbd
root 10797 0.0 0.0 6016 560 pts/2 S+ 15:46 0:00 grep slurmdbd
# service slurm start
starting slurmctld: [ OK ]
blaster 580(1)# ls -la /var/log/slurm/
total 5824
drwxrwxr-x 2 slurm root 4096 Jan 31 15:46 ./
drwxr-xr-x 17 root root 4096 Jan 31 04:02 ../
-rw------- 1 root root 48 Jan 31 15:46 sched.log
-rw------- 1 root root 1205 Jan 31 15:46 slurmctld.log
-rw------- 1 root root 15926 Jan 31 15:46 slurmdbd.log
# ps aux|grep slurm
slurm 10787 0.1 0.0 62248 2448 ? Sl 15:45 0:00 /usr/sbin/slurmdbd
root 10833 0.0 0.0 6016 560 pts/2 S+ 15:47 0:00 grep slurm
# tail /var/log/slurm/slurmctld.log
[2012-01-31T15:46:48] preempt/qos loaded
[2012-01-31T15:46:48] checkpoint/blcr init
[2012-01-31T15:46:48] Checkpoint plugin loaded: checkpoint/blcr
[2012-01-31T15:46:48] Job accounting gather LINUX plugin loaded
[2012-01-31T15:46:48] job_submit.lua: initialized
[2012-01-31T15:46:48] debug: No backup controller to shutdown
[2012-01-31T15:46:48] switch NONE plugin loaded
[2012-01-31T15:46:48] topology NONE plugin loaded
[2012-01-31T15:46:48] debug: No DownNodes
[2012-01-31T15:46:48] fatal: sched_log_alter could not open /var/log/slurm/sched.log: Permission denied

Could it be that the daemons switch uid to SlurmUser too late?

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Research Computing Services, University of Oslo