Husen R wrote on 7/18/16 1:18 AM:
> On Mon, Jul 18, 2016 at 5:52 AM, P. Larry Nelson <lnel...@illinois.edu> wrote:
>
>> Hello,
>>
>> While I am in search of real hardware on which to build/test Slurm, I am attempting to just play around with it on a test VM (Scientific Linux 6.8), which, of course, is using NATted networking and is a standalone system protected from the outside world.
>>
>> I downloaded the latest (16.05.2) tarball and ran the rpmbuild and then installed all the rpm's. Ran the Easy Configurator and gave it the hostname of the VM for the ControlMachine and the loopback address of 127.0.0.1 for the ControlAddr.
>
> Try to fill ControlMachine and ControlAddr with the same value (ex. hostname)
Ok, the man page clarified that. At first glance, I assumed that ControlMachine was the short hostname and ControlAddr was an IP address. Better labeling of those two parameters would be less confusing, IMHO.
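Husen's suggestion amounts to two matching lines in slurm.conf. A minimal sketch, assuming the short hostname is the name the controller should be known by; `control_lines` is a hypothetical helper (not a Slurm tool) that only prints the lines and does not edit any file:

```shell
# Hypothetical helper: print matching ControlMachine/ControlAddr lines
# for the hostname given as $1. It does not touch slurm.conf.
control_lines() {
    printf 'ControlMachine=%s\nControlAddr=%s\n' "$1" "$1"
}

# Typical use on the controller itself:
#   control_lines "$(hostname -s)"
```

The output can be pasted over the two lines the Easy Configurator generated.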
>> Made a munge key and it started just fine. When I do a 'service slurm start', it responds "OK" for both slurmctld and slurmd, but slurmctld dies right away. If I do a 'slurmctld -Dvvv', I get:
>>
>> slurmctld: pidfile not locked, assuming no running daemon
>> slurmctld: debug: creating clustername file: /var/spool/clustername
>> slurmctld: fatal: _create_clustername_file: failed to create file /var/spool/clustername
>>
>> The slurm.conf has this for ClusterName:
>>
>> ClusterName=SlurmCluster
>>
>> So, why is slurmctld trying to create file /var/spool/clustername instead of /var/spool/SlurmCluster?
>
> clustername is just a filename. You can see your ClusterName in that file.
Actually, it's /var/spool/slurmctld/clustername and yes, that file is just a text file with the cluster name I provided in slurm.conf - only it reduced it to all lowercase - i.e. from SlurmCluster to slurmcluster. Whatever...

The error I was getting (failed to create file /var/spool/clustername) was because SlurmUser (slurm) does not have permission to write to /var/spool. When I changed SlurmUser to root, it worked: slurmctld finally started and kept running, and it created /var/spool/slurmctld and /var/spool/slurmd.

Then I stopped slurmd and slurmctld, went to /var/spool and did a 'chown -R slurm: slurm*' to make it all owned by slurm. Then I changed SlurmUser in slurm.conf back to slurm from root and restarted slurmd and slurmctld. Seems to be happy now.

I am annotating my notes to indicate that one probably needs to mkdir slurmd and slurmctld in /var/spool first and then do the chown recursively to one's SlurmUser name before starting slurmd and slurmctld.

- Larry

[snip...]

--
P. Larry Nelson (217-244-9855) | IT Administrator
457 Loomis Lab                 | High Energy Physics Group
1110 W. Green St., Urbana, IL  | Physics Dept., Univ. of Ill.
MailTo: lnel...@illinois.edu   | http://hep.physics.illinois.edu/home/lnelson/
------------------------------------------------------------------------------
"Information without accountability is just noise." - P.L. Nelson
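The pre-creation step from Larry's notes (mkdir first, then a recursive chown to SlurmUser) might look roughly like the sketch below. `prepare_spool_dirs` is a hypothetical helper, not part of Slurm, and the directory names are simply the ones mentioned in this thread; the paths actually configured in a given slurm.conf may differ.

```shell
# Hypothetical helper: create the two spool directories named in the thread
# under $1 and hand them to the user given as $2 (the SlurmUser).
prepare_spool_dirs() {
    spool_root=$1    # e.g. /var/spool
    slurm_user=$2    # e.g. slurm

    # -p makes this a no-op for directories that already exist.
    mkdir -p "$spool_root/slurmctld" "$spool_root/slurmd" || return 1

    # The trailing colon tells chown to also set the user's login group,
    # as in the 'chown -R slurm: slurm*' command from the message.
    chown -R "$slurm_user:" "$spool_root/slurmctld" "$spool_root/slurmd"
}

# Typical use, as root, before starting slurmctld and slurmd for the first time:
#   prepare_spool_dirs /var/spool slurm
```

Done ahead of time, this avoids the detour of temporarily setting SlurmUser to root just to get the directories created.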