Hi,
I've run into the same issue with slurm-15.08.3, OS: RHEL 6.5 x64.
slurmctld is reading the SlurmUser setting and starts as user slurm,
however slurmd doesn't respect the SlurmdUser config.
If SlurmdUser is commented out, slurmd starts as user root (in
accordance with the documentation), as confirmed by looking at
ps -ef | grep slurmd.
If SlurmdUser is configured, slurmd will only start if it is launched
directly by that user, i.e.:
su - slurm
service slurm start
Otherwise the error message in slurmd.log is the same as noted below:
fatal: You are running slurmd as something other than user slurm(###).
If you want to run as this user add SlurmdUser=root to the slurm.conf file.
...
slurmctld works correctly - it always starts as whatever user is set via
SlurmUser.
slurmd never respects the SlurmdUser config unless started by that user
(tested using several users).
As I remember, this worked correctly on previous (~2.x.x) versions of slurm;
I had not had a chance to test it on any of the newer (14.x, 15.x)
versions until now.
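For anyone comparing their own setup, here is a minimal shell sketch of how one might check which account slurm.conf asks slurmd to run as. The helper name slurmd_user_from_conf is hypothetical, the config path is passed in as an argument because its location varies between installs, and the fallback to root reflects the documented default mentioned above.

```shell
# Hypothetical helper (not part of Slurm): print the SlurmdUser value
# configured in the slurm.conf given as $1, or "root" when the option is
# absent or commented out (the documented default).
slurmd_user_from_conf() {
    conf="$1"
    # Keyword match is case-insensitive; lines starting with '#' do not match.
    # If the option appears more than once, the last occurrence is reported.
    user=$(awk -F= '
        tolower($1) ~ /^[ \t]*slurmduser[ \t]*$/ {
            sub(/#.*/, "", $2)      # drop any trailing comment
            gsub(/[ \t]/, "", $2)   # drop surrounding whitespace
            u = $2
        }
        END { print u }' "$conf")
    echo "${user:-root}"
}

# Example (the path is an assumption; adjust to your install):
#   slurmd_user_from_conf /etc/slurm/slurm.conf
```

On a node, the result can be compared with the owner of the running daemon as shown by ps -ef | grep slurmd.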
On 11/11/2015 12:32 AM, James Oguya wrote:
Re: [slurm-dev] Help: SLURM will not start on either nodes after setup.
Based on the excerpt from your slurmd logs, slurmd is failing because
it is not running as the slurm user.
In your config file, set SlurmUser=slurm and comment out the
SlurmdUser=slurm line.
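As a rough sketch of that suggestion, the relevant slurm.conf lines would look something like the fragment below. Only these two options are shown; everything else in the file is assumed to stay as it is.

```
# slurmctld runs as this account
SlurmUser=slurm
# Commented out, so slurmd falls back to its default (root)
#SlurmdUser=slurm
```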
Otherwise, for further troubleshooting, please attach your
slurmctld (from the head node) and slurmdbd log files.
On Thu, Nov 5, 2015 at 12:08 AM, Dennis Mungai <[email protected]> wrote:
Hello there,
We recently deployed SLURM for a Bioinformatics cluster at
KEMRI-Wellcome Trust, Kilifi, Kenya. After following the setup
guide and using the online configurator (to build the configuration
file), here are the errors we ran into:
1. None of the slurmd daemons on either node will start up.
2. slurmdbd apparently starts up correctly and allowed us to
register the cluster.
Here’s the debug information available at the moment:
1. An excerpt from the logs:
less /var/log/slurm/slurmd.log | tail
[2015-11-04T22:33:01.629] fatal: You are running slurmd as
something other than user slurm(564). If you want to run as this
user add SlurmdUser=root to the slurm.conf file.
[2015-11-04T22:36:22.663] Node configuration differs from
hardware: CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=64:4(hw)
CoresPerSocket=1:8(hw) ThreadsPerCore=1:2(hw)
[2015-11-04T22:36:22.663] Message aggregation disabled
[2015-11-04T22:36:22.664] Resource spec: Reserved system memory
limit not configured for this node
[2015-11-04T23:00:17.659] Slurmd shutdown completing
[2015-11-04T23:05:38.092] Node configuration differs from
hardware: CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=64:4(hw)
CoresPerSocket=1:8(hw) ThreadsPerCore=1:2(hw)
[2015-11-04T23:05:38.098] Message aggregation disabled
[2015-11-04T23:05:38.111] error: _cpu_freq_cpu_avail: Could not
open
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
[2015-11-04T23:05:38.113] Resource spec: Reserved system memory
limit not configured for this node
[2015-11-04T23:05:38.127] fatal: You are running slurmd as
something other than user slurm(564). If you want to run as this
user add SlurmdUser=root to the slurm.conf file.
The same message appears on the other three nodes as well.
scontrol ping returns:
Slurmctld(primary/backup) at kenbo-cen05/(NULL) are UP/DOWN
sinfo returns:
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
debug*     up     5:00       1      down*  kenbo-cen05
highmem    up     infinite   4      down*  kenbo-cen[05-08]
batch      up     infinite   4      down*  kenbo-cen[05-08]
longrun    up     infinite   4      down*  kenbo-cen[05-08]
My configuration file and the init.d scripts for both slurm and
slurmdbd are attached below for your perusal.
Your assistance will be highly appreciated.
Regards,
Dennis Mungai.
--
/James Oguya/