Hi, as was already explained, the user/group database must be available on all nodes: each compute node has to be able to resolve the job's UID and GID locally. For your other question, "why there is munge?": munge solves the problem of a slurm daemon receiving a message that claims to come from UID=12345. How can the daemon verify that this is true and that the message was not forged by another user? If you're interested in the details, good terms to type into your favorite web search engine are "SCM_CREDENTIALS" and "message authentication code" (MAC).
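To make the distinction concrete, here is a minimal Python sketch of the MAC idea. It is not how munge is actually implemented (real munge credentials carry more than this, and the wire format is different); the function names and the "uid:gid|tag" layout are made up for illustration. It shows two separate questions: whether the claimed UID is authentic (answered by the shared key), and whether that UID is known on the local node (answered by the local user database).

```python
import hashlib
import hmac
import pwd


def encode_credential(key: bytes, uid: int, gid: int) -> bytes:
    """Hypothetical helper: attach a MAC to a claimed uid/gid pair."""
    payload = f"{uid}:{gid}".encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"|" + tag


def decode_credential(key: bytes, credential: bytes) -> tuple:
    """Hypothetical helper: return (uid, gid) only if the MAC verifies."""
    payload, _, tag = credential.rpartition(b"|")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    # Constant-time comparison, so the check does not leak how many
    # leading characters of a forged tag happened to be correct.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("credential failed authentication")
    uid, gid = payload.decode().split(":")
    return int(uid), int(gid)


def uid_known_locally(uid: int) -> bool:
    """True if this node's user database can resolve the uid."""
    try:
        pwd.getpwuid(uid)
    except KeyError:
        return False
    return True


if __name__ == "__main__":
    shared_key = b"same-key-on-every-node"  # stand-in for the shared munge key
    cred = encode_credential(shared_key, 1007, 1007)

    uid, gid = decode_credential(shared_key, cred)
    print("authenticated claim: uid", uid, "gid", gid)

    # A user who does not hold the key cannot change the claimed uid.
    forged = cred.replace(b"1007:1007", b"0:0")
    try:
        decode_credential(shared_key, forged)
    except ValueError as exc:
        print("forged credential rejected:", exc)

    # Authentication says nothing about whether the uid exists here.
    print("uid known on this node:", uid_known_locally(uid))
```

The last line is the situation in your log: the credential can verify perfectly well while the lookup of uid 1007 still fails on the node, because the MAC only proves who sent the message, not that the node has an account for them.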
-- 
Janne Blomqvist

________________________________________
From: Anna Kostikova [[email protected]]
Sent: Wednesday, November 05, 2014 21:31
To: slurm-dev
Subject: [slurm-dev] linux users and slurm

Dear list,

Am I correct in assuming that if I use munge for slurm, then only slurmd and munged need to be running on all slurm nodes, and I can keep all unix users on another server, with, for instance, slurmctld running?

For instance, I create a user with useradd, and when this user runs a job in slurm, then, with the help of munge, the slurm node will recognise his uid and gid, even though this unix user has not been created on that node.

If yes, then why might I be getting this error:

[2014-11-05T10:24:30.729] launch task 602.0 request from 1007.1007@IP (port 22410)
[2014-11-05T10:24:30.751] error: _send_slurmstepd_init getpwuid_r: No error
[2014-11-05T10:24:30.751] error: Unable to init slurmstepd
[2014-11-05T10:24:30.751] uid 1007 not found on system
[2014-11-05T10:24:30.752] _step_setup: no job returned
[2014-11-05T10:24:30.752] Unable to send "fail" to slurmd
[2014-11-05T10:24:30.752] done with job

Munge keys are exactly the same on the slurm server and all nodes.

From the description here (http://linux.die.net/man/7/munge) it looks like all slurm users are kept in one place. But if not, then why there is munge? Or must a user with a specific uid and gid exist on a node?

Thanks a lot for your help,
Anna
