Hi,

I've run into the same issue with slurm-15.08.3, OS: RHEL 6.5 x64.

slurmctld reads the SlurmUser setting and starts as user slurm; however, slurmd doesn't respect the SlurmdUser setting.

If SlurmdUser is commented out, slurmd starts as user root (in accordance with the documentation), as confirmed with ps -ef | grep slurmd.

If SlurmdUser is configured, slurmd will only start if it is started directly by that user, i.e.
su - slurm
service slurm start

Otherwise the error message in slurmd.log is the same as noted below:

fatal: You are running slurmd as something other than user slurm(###). If you want to run as this user add SlurmdUser=root to the slurm.conf file.
...
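
In case it helps anyone checking several nodes, here is a rough sketch for pulling the expected user out of those fatal lines. The function name slurmd_expected_users is made up for illustration, and the default log path is simply the one used further down in this thread; adjust both to your setup.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct "user(uid)" that slurmd
# complained about in its "You are running slurmd as something other
# than user ..." fatal messages.
# Assumes each log entry sits on a single line in the real log file.
slurmd_expected_users() {
    log="${1:-/var/log/slurm/slurmd.log}"
    sed -n 's/.*fatal: You are running slurmd as something other than user \([^ ]*([0-9#]*)\).*/\1/p' "$log" | sort -u
}
```

Calling slurmd_expected_users with no argument reads the default log path; pass another path as the first argument to check a different file.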


slurmctld works correctly - it always starts as whatever user is set via SlurmUser. slurmd never respects the SlurmdUser config unless started by that user (tested using several users).
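
For anyone wanting to reproduce the comparison, this is roughly how the configured user can be checked against the user slurmd actually runs as. It is only a sketch: the function names and the /etc/slurm/slurm.conf default path are my assumptions, and the parsing is deliberately simple (one Key=Value per line, key matched case-insensitively).

```shell
#!/bin/sh
# Hypothetical helper: print the SlurmdUser set in a slurm.conf-style file.
# Prints "root" when the key is absent or commented out, which matches the
# documented default. If the file cannot be read, awk prints an error to
# stderr and the function still falls back to "root".
configured_slurmd_user() {
    conf="${1:-/etc/slurm/slurm.conf}"
    user=$(awk -F= '
        /^[[:space:]]*#/ { next }            # skip commented-out lines
        {
            key = $1
            gsub(/[[:space:]]/, "", key)
            if (tolower(key) == "slurmduser") {
                val = $2
                sub(/#.*/, "", val)          # drop any trailing comment
                gsub(/[[:space:]]/, "", val)
                found = val                  # last occurrence wins here
            }
        }
        END { if (found != "") print found }
    ' "$conf")
    echo "${user:-root}"
}

# Hypothetical helper: print the user(s) owning running slurmd processes.
# Relies on the procps "ps" found on Linux (-C selects by command name).
running_slurmd_user() {
    ps -o user= -C slurmd | sort -u
}
```

On an affected node, comparing the output of configured_slurmd_user with running_slurmd_user after a "service slurm start" as root should show whether the setting was honoured.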

I remember this working correctly on previous (~2.x.x) versions of Slurm; I hadn't had a chance to test it on any of the newer (14.x, 15.x) versions until now.

On 11/11/2015 12:32 AM, James Oguya wrote:
Re: [slurm-dev] Help: SLURM will not start on either nodes after setup.
Based on the excerpt from your slurmd logs, slurmd is failing because it's not running as the slurm user. In your config file, set SlurmUser=slurm and comment out the SlurmdUser=slurm line.

Otherwise, for further troubleshooting, please attach your slurmctld (from the head node) and slurmdbd log files.

On Thu, Nov 5, 2015 at 12:08 AM, Dennis Mungai wrote:

    Hello there,

    We recently deployed SLURM for a Bioinformatics cluster at
    KEMRI-Wellcome Trust, Kilifi, Kenya, and after following the setup
    guide and the online configurator (to build the configuration
    file), here are the errors we ran into:

    1. None of the slurmd daemons on either node will start up.

    2. Apparently, slurmdbd starts up correctly and allowed us to
    register the cluster.

    Here’s the debug information available at the moment:

    1.1. An excerpt from the logs:

    less /var/log/slurm/slurmd.log | tail

    [2015-11-04T22:33:01.629] fatal: You are running slurmd as
    something other than user slurm(564).  If you want to run as this
    user add SlurmdUser=root to the slurm.conf file.

    [2015-11-04T22:36:22.663] Node configuration differs from
    hardware: CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=64:4(hw)
    CoresPerSocket=1:8(hw) ThreadsPerCore=1:2(hw)

    [2015-11-04T22:36:22.663] Message aggregation disabled

    [2015-11-04T22:36:22.664] Resource spec: Reserved system memory
    limit not configured for this node

    [2015-11-04T23:00:17.659] Slurmd shutdown completing

    [2015-11-04T23:05:38.092] Node configuration differs from
    hardware: CPUs=64:64(hw) Boards=1:1(hw) SocketsPerBoard=64:4(hw)
    CoresPerSocket=1:8(hw) ThreadsPerCore=1:2(hw)

    [2015-11-04T23:05:38.098] Message aggregation disabled

    [2015-11-04T23:05:38.111] error: _cpu_freq_cpu_avail: Could not
    open
    /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies

    [2015-11-04T23:05:38.113] Resource spec: Reserved system memory
    limit not configured for this node

    [2015-11-04T23:05:38.127] fatal: You are running slurmd as
    something other than user slurm(564).  If you want to run as this
    user add SlurmdUser=root to the slurm.conf file.

    The same message appears on the other three nodes as well.

    scontrol ping returns:

    Slurmctld(primary/backup) at kenbo-cen05/(NULL) are UP/DOWN

    sinfo returns:

    PARTITION AVAIL  TIMELIMIT  NODES STATE NODELIST
    debug*       up       5:00      1 down* kenbo-cen05
    highmem      up   infinite      4 down* kenbo-cen[05-08]
    batch        up   infinite      4 down* kenbo-cen[05-08]
    longrun      up   infinite      4 down* kenbo-cen[05-08]

    My configuration file and the init.d scripts for both slurm and
    slurmdbd are attached below for your perusal.

    Your assistance will be highly appreciated.

    Regards,

    Dennis Mungai.






--
/James Oguya/
