Are both slurmdbd and slurmctld running as the same UID? If not, they need to be. I believe you can see the resulting errors in the slurmdbd log at debug2 or debug3.
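A rough way to compare the two is to look at the user column of `ps -ef`. The following is only a minimal sketch: the helper names `daemon_users` and `same_user` are made up for illustration, the sample text is copied from the process listing quoted below, and it assumes the usual `ps -ef` column layout (user first, command eighth). It checks only who owns the running processes, not the `SlurmUser` setting in the config files.

```python
# Hypothetical helpers for illustration; not part of Slurm.
DAEMONS = ("slurmdbd", "slurmctld")

def daemon_users(ps_output):
    """Map each Slurm daemon name to the set of users running it, given `ps -ef` text."""
    users = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a full `ps -ef` row
        user, command = fields[0], fields[7]
        name = command.rsplit("/", 1)[-1]  # strip the install path
        if name in DAEMONS:
            users.setdefault(name, set()).add(user)
    return users

def same_user(users):
    """True only if both daemons were found and they share exactly one user."""
    if any(name not in users for name in DAEMONS):
        return False
    return len(set().union(*users.values())) == 1

# Sample rows taken from the quoted message.
SAMPLE = """\
root 6463 1 0 17:01 ? 00:00:00 /share/apps/slurm-15.08.8/sbin/slurmdbd
root 6743 1 0 17:05 ? 00:00:00 /share/apps/slurm-15.08.8//sbin/slurmctld
"""

found = daemon_users(SAMPLE)
print(found, same_user(found))
```

On a live system the same functions could be fed the text captured from `ps -ef` instead of the sample string.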
----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center <http://www.nersc.gov>
[email protected]

------------- __o
---------- _ '\<,_
----------(_)/ (_)__________________________

On Wed, Mar 30, 2016 at 5:32 PM, Terri Knight <[email protected]> wrote:

>
> I posted earlier (Dec 28, 2015) about this issue and was told to check
> that the slurmdbd and slurmctld daemons were running as the same user; they
> weren't at that time. I thought making that change would resolve the
> problem, but it did not.
>
> These daemons are now both running as root:
> root 6463 1 0 17:01 ? 00:00:00 /share/apps/slurm-15.08.8/sbin/slurmdbd
> root 6743 1 0 17:05 ? 00:00:00 /share/apps/slurm-15.08.8//sbin/slurmctld
>
> on the compute node:
> root 7874 1 0 17:03 ? 00:00:00 /share/apps/slurm-15.08.8//sbin/slurmd
>
> Upon further testing, I only need to restart the slurmctld daemon to get the
> new user added such that he can run a job. So it is not as big a deal to me
> now, but it is different than in older versions of Slurm.
>
> I'm adding a new user to an existing account, and before I restart
> slurmctld I see this in the slurmctld log when I try to "srun date" as that
> user:
>
> [2016-03-30T17:04:50.107] error: User 9101 not found
> [2016-03-30T17:04:50.107] _job_create: invalid account or partition for user 9101, account '(null)', and partition 'debug'
> [2016-03-30T17:04:50.142] _slurm_rpc_allocate_resources: Invalid account or account/partition combination specified
> [2016-03-30T17:05:11.381] Terminate signal (SIGINT or SIGTERM) received
>
> Oddly, the account is "(null)".
>
> Here is the command to add the user:
> sacctmgr add user johndoe defaultaccount=boris partition=low,med,high,debug cluster=jane
>
> slurm-15.08.8 on Ubuntu 14.04.4
>
> Like I said, I can live with it since it's only 1 restart.
>
> Thanks,
> Terri
>
