Husen R wrote on 7/18/16 1:18 AM:


On Mon, Jul 18, 2016 at 5:52 AM, P. Larry Nelson <lnel...@illinois.edu
<mailto:lnel...@illinois.edu>> wrote:


    Hello,

    While I am in search of real hardware on which to build/test Slurm,
    I am attempting to just play around with it on a test VM (Scientific
    Linux 6.8), which, of course, is using NATted networking and is a
    standalone system protected from the outside world.

    I downloaded the latest (16.05.2) tarball and ran the rpmbuild
    and then installed all the rpm's.  Ran the Easy Configurator
    and gave it the hostname of the VM for the ControlMachine
    and the loopback address of 127.0.0.1 for the ControlAddr.


Try setting ControlMachine and ControlAddr to the same value (e.g., the hostname).

Ok, the man page clarified that.  At first glance, I assumed that ControlMachine
was the short hostname and ControlAddr was an IP address.   Better labeling of
those two parameters would be less confusing, IMHO.

    Made a munge key and it started just fine.

    When I do a 'service slurm start', it responds "OK" for both slurmctld
    and slurmd, but slurmctld dies right away.

    If I do a 'slurmctld -Dvvv', I get:
    slurmctld: pidfile not locked, assuming no running daemon
    slurmctld: debug:  creating clustername file: /var/spool/clustername
    slurmctld: fatal: _create_clustername_file: failed to create file
    /var/spool/clustername

    The slurm.conf has this for ClusterName:
    ClusterName=SlurmCluster

    So, why is slurmctld trying to create the file /var/spool/clustername
    instead of /var/spool/SlurmCluster?


clustername is just a filename. You can see your ClusterName in that file.

Actually, it's /var/spool/slurmctld/clustername and yes, that file is just
a text file with the cluster name I provided in slurm.conf - only it reduced
it to all lowercase - i.e. from SlurmCluster to slurmcluster.  Whatever...

The error I was getting (failed to create file /var/spool/clustername) was
because SlurmUser (slurm) does not have permissions to write to /var/spool.
When I changed SlurmUser to root, then it worked and I finally got slurmctld
to at least start and keep running and it created /var/spool/slurmctld and
/var/spool/slurmd.

Then I stopped slurmd and slurmctld, went to /var/spool and did a
'chown -R slurm: slurm*' to make it all owned by slurm.
Then I changed SlurmUser in slurm.conf from root back to slurm and restarted
slurmd and slurmctld.  Seems to be happy now.

I am annotating my notes to indicate that one probably needs to mkdir
slurmd and slurmctld in /var/spool first and then do the chown recursively
to one's SlurmUser name before starting slurmd and slurmctld.
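
A minimal sketch of that note as a script, assuming the spool directories
are /var/spool/slurmctld and /var/spool/slurmd and that SlurmUser is slurm,
as described above. The function name prepare_spool_dirs is made up for
illustration; adjust the paths and user to match your own slurm.conf.

```shell
#!/bin/sh
# Sketch: pre-create the Slurm spool directories and hand them to SlurmUser.
# Assumed layout (from this thread): <base>/slurmctld and <base>/slurmd.

prepare_spool_dirs() {
    base_dir=$1      # parent directory, e.g. /var/spool
    slurm_user=$2    # value of SlurmUser in slurm.conf, e.g. slurm

    for d in slurmctld slurmd; do
        mkdir -p "$base_dir/$d" || return 1
    done

    # "user:" also sets the group to that user's login group,
    # the same form as the 'chown -R slurm: slurm*' used above.
    chown -R "$slurm_user:" "$base_dir/slurmctld" "$base_dir/slurmd"
}

# Typical use, as root, before starting slurmctld and slurmd:
#   prepare_spool_dirs /var/spool slurm
```

Doing this before the first start should avoid having to run with
SlurmUser=root temporarily just to get the directories created.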

- Larry

        [snip...]


--
P. Larry Nelson (217-244-9855) | IT Administrator
457 Loomis Lab                 | High Energy Physics Group
1110 W. Green St., Urbana, IL  | Physics Dept., Univ. of Ill.
MailTo: lnel...@illinois.edu   | http://hep.physics.illinois.edu/home/lnelson/
------------------------------------------------------------------------------
 "Information without accountability is just noise."  - P.L. Nelson
