[slurm-dev] Re: Fixing corrupted slurm accounting?

2017-10-28 Thread Douglas Jacobsen
Once you've got the end times fixed, you'll need to manually update the timestamps in the _last_ran table to a point in time before the start of the earliest job you fixed. Then, on the next hour mark, it will start re-rolling up the past data to reflect the new reality you've set in the database.
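
A minimal sketch of that timestamp reset, assuming the default slurmdbd MySQL schema where the rollup bookkeeping lives in a per-cluster "<cluster>_last_ran_table" with hourly_rollup, daily_rollup, and monthly_rollup columns; verify the table and column names against your own schema, and back up the database first:

    -- hypothetical cluster name "mycluster"; pick a time before the earliest fixed job
    UPDATE mycluster_last_ran_table
       SET hourly_rollup  = UNIX_TIMESTAMP('2017-10-01 00:00:00'),
           daily_rollup   = UNIX_TIMESTAMP('2017-10-01 00:00:00'),
           monthly_rollup = UNIX_TIMESTAMP('2017-10-01 00:00:00');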

[slurm-dev] Re: Running jobs are stopped and requeued when adding new nodes

2017-10-22 Thread Douglas Jacobsen
You cannot change the nodelist without draining the system of running jobs (terminating all slurmstepd processes) and restarting all slurmd daemons and slurmctld. This is because Slurm uses a bitmask to represent the nodelist, and Slurm uses a hierarchical overlay communication network. If all daemons don't have…

[slurm-dev] Re: slurmctld causes slurmdbd to seg fault

2017-10-17 Thread Douglas Jacobsen
You probably have a core file in the directory slurmdbd logs to; a backtrace from gdb would be most telling. On Oct 17, 2017 08:17, "Loris Bennett" wrote: > Hi, > We have been having some issues with NFS mounts via Infiniband getting dropped by nodes. We ended…
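
One way to pull that backtrace, sketched with illustrative paths (point gdb at your actual slurmdbd binary and core file):

    gdb /usr/sbin/slurmdbd /var/log/slurm/core.12345   # load the core against the matching binary
    (gdb) bt                    # backtrace of the crashing thread
    (gdb) thread apply all bt   # stacks for every thread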

[slurm-dev] Re: Finding job command after fails

2017-10-15 Thread Douglas Jacobsen
We use a job completion plugin to store that data. Ours is custom, but it is loosely based on the elastic completion plugin, which may be a good starting point. On Oct 15, 2017 12:48, "Ryan Richholt" wrote: > Is there any way to get the job command with sacct? …
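
For reference, wiring up the stock elasticsearch completion plugin looks roughly like this in slurm.conf (the URL is a placeholder; check the jobcomp documentation for your Slurm version):

    JobCompType=jobcomp/elasticsearch
    JobCompLoc=http://elasticsearch.example.com:9200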

[slurm-dev] Re: slurmstepd error

2017-09-15 Thread Douglas Jacobsen
What is the working directory for slurmd? I suspect slurmstepd would fork there; perhaps there is some issue with it?
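
One quick way to check the daemon's working directory on a node (generic Linux, assuming procfs is available):

    # where slurmd is running from, and thus where forked slurmstepd processes land
    readlink /proc/$(pidof slurmd)/cwd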

[slurm-dev] Re: On the need for slurm uid/gid consistency

2017-09-13 Thread Douglas Jacobsen
I would suggest it is a more general requirement, not simply enforced by the use of munge, which does imply a unified uid trust level across all nodes using the same preshared key; but also, when jobs are started, they are started with a particular uid and other credentials (transmitted in the Slurm…

[slurm-dev] Re: How are paired dependencies handled?

2017-08-11 Thread Douglas Jacobsen
I think you want the kill_invalid_depend SchedulerParameters option, which has slurmctld automatically clean up jobs that can never run owing to unsatisfiable dependencies. On Aug 11, 2017 3:58 PM, "Alex Reynolds" wrote: > Say I submit a job via `sbatch`. Slurm gives it a job ID of…
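
The corresponding slurm.conf entry (comma-append it to any SchedulerParameters you already set):

    # cancel jobs whose dependencies can never be satisfied
    SchedulerParameters=kill_invalid_depend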

[slurm-dev] Re: Proposed new dependency "--during"?

2017-07-20 Thread Douglas Jacobsen
This sounds a bit like the job-packs work that is in development right now. It's more focused on heterogeneous computing, but at the core it's done as multiple jobs that run simultaneously and are then merged (I think). More generally, you could imagine a "service" queue on one set of…

[slurm-dev] Re: ssh tunneling

2017-06-20 Thread Douglas Jacobsen
salloc srun --pty -n1 -N1 --mem-per-cpu=0 --cpu_bind=none --mpi=none $SHELL will probably do what you want, i.e., get an allocation and start a shell on the remote node.

[slurm-dev] Re: Scheduling weirdness

2017-06-16 Thread Douglas Jacobsen
I typically recommend that bf_window be roughly 2x the max wall time; this allows for planning beyond the edge of the window. You may need to increase bf_resolution (going up to 300s should be fine for almost all cases), and potentially increase bf_interval to ensure there is enough time for…
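
As a concrete illustration of that advice, a cluster with a 48-hour maximum wall time might use something like the following; the values are examples, not universal recommendations (bf_window is in minutes, bf_resolution and bf_interval in seconds):

    # 2x a 48h max walltime = 5760 minutes
    SchedulerParameters=bf_window=5760,bf_resolution=300,bf_interval=120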

[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Douglas Jacobsen
Sorry, I meant GrpTRESMins, but that usage is decayed, as Chris mentioned, based on the decay-rate half-life. In your scenario, however, it seems like not decaying usage would make sense. Are you wanting to consider recent usage when making priority decisions? …
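
If you decide usage should not decay at all, the relevant slurm.conf knob is sketched below; note the man page indicates that a zero half-life must be paired with a usage reset period, so the interval shown is a placeholder site choice:

    PriorityDecayHalfLife=0            # 0 disables decay of historical usage
    PriorityUsageResetPeriod=MONTHLY   # required when decay is disabled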

[slurm-dev] Re: Accounting: preventing scheduling after TRES limit reached (permanently)

2017-06-05 Thread Douglas Jacobsen
I think you could still set GrpTRESRunMins on an account or association to set hard quotas. On Jun 5, 2017 5:21 AM, "Jacob Chappell" wrote: > Hi Chris, > Thank you very much for the details and clarification. It's unfortunate that you can't have both fairshare and…
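
A hedged example of setting such a limit; the account name and value are placeholders:

    # cap the account's running jobs at 1,000,000 CPU-minutes (remaining time x CPUs)
    sacctmgr modify account where name=myaccount set GrpTRESRunMins=cpu=1000000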

[slurm-dev] Re: How to cleanup mysql db old records?

2017-05-25 Thread Douglas Jacobsen
Regarding the "more allocated time than is possible" messages, I'd suggest checking for runaway jobs: sacctmgr show runawayjobs. You might want to look at the records a bit before agreeing to let it fix them automatically. If that doesn't find anything, there might be some nodes incorrectly marked down…

[slurm-dev] Re: Compute nodes drained or draining

2017-05-17 Thread Douglas Jacobsen
A batch job completion failure typically indicates an issue on the slurmd or slurmstepd side of things that slurmctld is unsure how to deal with. Try checking your slurmd logs (at debug level) on the impacted nodes. Given the asterisk in the sinfo output, I'm also guessing that slurmd exited. There…
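
Typical first steps when nodes sit drained or draining (generic commands; substitute your own node names):

    sinfo -R                     # drain/down reasons recorded by slurmctld
    scontrol show node node042   # full node state, including State= and Reason=
    # then raise SlurmdDebug in slurm.conf and watch the affected node's slurmd log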

[slurm-dev] Re: Adjusting MaxJobCount and SlurmctldPort settings

2017-05-16 Thread Douglas Jacobsen
Hello, changing the slurmctld port should probably wait until all jobs have stopped running. Running jobs won't fail outright in this case, but there is a good chance they will fail to complete properly, and the compute nodes running them might get stuck in the completing state (since the slurmstepd…

[slurm-dev] Re: User accounting

2017-05-04 Thread Douglas Jacobsen
You can also use sreport to get summaries (though it is limited): sreport user top users= --gres=cpu,mem. It can include other filters such as cluster, start, and end, and can group by account and so on. The limitation is that the TopUsers report only ever shows the top 10 users; it would be nice to get the top N users.
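
For example, a bounded, grouped variant of that report might look like the following; the option spellings are from memory of that era's sreport man page, so verify against your version:

    sreport user top users=alice,bob cluster=mycluster start=2017-04-01 end=2017-05-01 group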

[slurm-dev] Re: inject arbitrary env variables in Slurm job

2017-01-26 Thread Douglas Jacobsen
Another way is to use a job_submit plugin, a lua-based one in particular; then you have a great deal of control, and it is performed at job submission time. You can modify the job_request.env array to manipulate environment variables.
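
A minimal job_submit.lua skeleton along those lines; hedged, since the exact name of the environment field exposed to lua has varied across Slurm versions (the post calls it job_request.env), so check the job_submit/lua plugin shipped with yours. It also requires JobSubmitPlugins=lua in slurm.conf:

    -- /etc/slurm/job_submit.lua: runs inside slurmctld at submission time
    function slurm_job_submit(job_desc, part_list, submit_uid)
       -- append a variable to the job's environment (field name illustrative)
       if job_desc.environment ~= nil then
          job_desc.environment[#job_desc.environment + 1] = "MY_SITE_FLAG=1"
       end
       return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
       return slurm.SUCCESS
    end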

[slurm-dev] Re: srun job launch time issue

2017-01-11 Thread Douglas Jacobsen
Are these sruns already inside an allocation or not? If not, you might consider setting PrologFlags=alloc in slurm.conf, which should perform much of the remote job setup when the head node is configured (presuming that might be your issue, or you have a configuration that might make that an…
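
The setting referred to, in slurm.conf:

    # run the prolog on all allocated nodes when the allocation is granted,
    # rather than deferring it until the first step launches on each node
    PrologFlags=alloc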

[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Douglas Jacobsen
There are other good reasons to use jobacct_gather/cgroup. In particular, if memory enforcement is used, jobacct_gather/linux will cause a job to be terminated if the summed memory exceeds the limit, which is OK as long as large-memory processes aren't forking and artificially inflating the…
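
A sketch of the slurm.conf pieces for cgroup-based gathering and enforcement (cgroup.conf tuning omitted):

    JobAcctGatherType=jobacct_gather/cgroup
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup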

[slurm-dev] Re: Two partitions with same compute nodes

2016-11-29 Thread Douglas Jacobsen
One possible solution might be to implement a job_submit plugin (ideally using the lua interface). You could check the gres request field, and if it includes a GPU request, either force the user to the cuda partition or deny the job if it isn't submitted to the cuda partition; e.g.:
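
A hedged sketch of such a check; the gres field name matches what job_submit/lua exposed around that time, but verify it against your version (the "cuda" partition name is from the thread):

    function slurm_job_submit(job_desc, part_list, submit_uid)
       if job_desc.gres ~= nil and string.find(job_desc.gres, "gpu") then
          if job_desc.partition ~= "cuda" then
             -- to steer instead of reject: job_desc.partition = "cuda"
             slurm.log_user("GPU jobs must be submitted to the cuda partition")
             return slurm.ERROR
          end
       end
       return slurm.SUCCESS
    end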

[slurm-dev] Re: problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-18 Thread Douglas Jacobsen
Hello, is "/home/localsoft/slurm/spool" local to the node, or is it on the network? I think each node needs to have separate data (like job_cred) stored there, and if every slurmd is competing for that file namespace I could imagine srun having problems. I typically use…
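
For contrast, a typical node-local setting (the path is an example):

    # must be node-local, writable storage; holds job credentials, cached scripts, etc.
    SlurmdSpoolDir=/var/spool/slurmd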

[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-29 Thread Douglas Jacobsen
Hello, what you describe sounds like the backfill scheduler is not getting all the way through the queue. A simple adjustment (with some downsides) is to set bf_interval in the SchedulerParameters field of slurm.conf to something larger than the default of 30s (I use 120). Another important…

[slurm-dev] Re: Reserved column on UserUtilizationByAccount sreports

2016-10-18 Thread Douglas Jacobsen
Reserved time in sreport is time nodes are held idle (by the backfill scheduler) in order to start a job. If you aren't using backfill, or if all job submissions request about the same quantity of hardware resources, it may always be zero. If there were some users submitting large jobs and some…

[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-10-11 Thread Douglas Jacobsen
FYI, sending SIGHUP to slurmctld is sufficient for rotating the slurmctld.log file; there is no need to actually restart it all the way. It is good to know the cause behind the deleted jobs. Doug On Oct 11, 2016 7:36 AM, "Ryan Novosielski" wrote: > Thanks for clearing that up. …
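
A typical logrotate stanza built on that behavior, sketched with placeholder paths and schedule:

    /var/log/slurm/slurmctld.log {
        weekly
        rotate 8
        compress
        postrotate
            # SIGHUP makes slurmctld reopen its log file; no restart needed
            pkill -HUP -x slurmctld
        endscript
    }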

[slurm-dev] Re: QOS, Limits, CPUs and threads - something is wrong?

2016-10-03 Thread Douglas Jacobsen
Hi Lachlan, you mentioned your slurm.conf has AccountingStorageEnforce=qos. The "qos" restriction only enforces that a user is authorized to use a particular qos (via the qos string of the association in the Slurm database). To enforce limits, you also need to include "limits". If you want to prevent…
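
In slurm.conf that typically means something like the following (several enforcement flags imply others; see the AccountingStorageEnforce documentation):

    # enforce association membership, qos authorization, and the limits set on them
    AccountingStorageEnforce=associations,limits,qos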

[slurm-dev] Re: Backfill scheduler should look at all jobs

2016-08-23 Thread Douglas Jacobsen
Hello, I'd recommend taking a look at bf_min_prio_resv (a 16.05 feature). The basic idea: above the priority threshold it does the normal backfill algorithm, looking at each job in order, checking whether it can run, and if not, planning for it. Below the threshold, it still goes in order, but…
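
Enabling it is a single SchedulerParameters option; the threshold value here is a site-specific placeholder:

    # jobs with priority below 100000 are still backfilled opportunistically,
    # but no future resources are reserved for them
    SchedulerParameters=bf_min_prio_resv=100000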

[slurm-dev] Re: SLURM job's email notification does not work

2016-08-18 Thread Douglas Jacobsen
Email is only sent by slurmctld, so you'll need to change slurm.conf there and at least do an `scontrol reconfigure`; then perhaps it will start working. -Doug

[slurm-dev] Re: SPANK prolog not run via sbatch (bug?)

2016-07-08 Thread Douglas Jacobsen
Hello, do you have "PrologFlags=alloc" in slurm.conf? If not, you'll need it; otherwise the privileged prologs won't run until the first step is executed on a node. -Doug

[slurm-dev] RE: An issue with Grid Engine to Slurm migration

2016-05-05 Thread Douglas Jacobsen
Hello, as for allowing users to specify defaults for their sbatch/salloc executions, I think the best analog in Slurm to the GridEngine .sge_request file is for users to set environment variables in their dotfiles (or by other means) prior to running sbatch; e.g., SBATCH_ACCOUNT would set an implicit "-A".
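
For example, a user's ~/.bashrc might carry lines like these; the variables follow sbatch's input environment variable convention (salloc honors SALLOC_* equivalents), and command-line options still override them:

    export SBATCH_ACCOUNT=mygroup     # implicit -A for every sbatch
    export SBATCH_PARTITION=regular   # implicit -p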

[slurm-dev] Re: Need to restart slurmctld when adding user to accounting

2016-03-30 Thread Douglas Jacobsen
Are both slurmdbd and slurmctld running as the same UID? (If not, they need to be; I believe you can see the errors in the slurmdbd log at debug2 or debug3.)
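
A quick way to check (generic commands; adjust the config path for your site):

    # both should report the same account, matching SlurmUser in slurm.conf
    ps -o user= -C slurmctld
    ps -o user= -C slurmdbd
    grep -i '^SlurmUser' /etc/slurm/slurm.conf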

[slurm-dev] Re: MaxTRESMins limit on a job kills a running job -- is it meant to?

2016-01-07 Thread Douglas Jacobsen
I think you probably want to add "safe" to AccountingStorageEnforce in slurm.conf; that should prevent it from starting jobs that would exceed association limits.
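
In slurm.conf:

    # "safe" only starts a job if it can run to its time limit within the
    # association/qos limits, rather than killing it mid-run
    AccountingStorageEnforce=associations,limits,safe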

[slurm-dev] Re: need to restart slurm daemons for accounting changes

2015-12-28 Thread Douglas Jacobsen
I'm betting that slurmctld is running as a different uid than slurmdbd. Once both are running as the same uid, slurmctld will start taking updates from slurmdbd (via sacctmgr). -Doug On Mon, Dec 28, 2015 at 2:51 PM, Terri Knight wrote: > Since upgrading to slurm 15.08.1…

[slurm-dev] Re: Fwd: SLURM : how to have a round-robin across nodes based on load average?

2015-11-18 Thread Douglas Jacobsen
Check out the LLN partition configuration option (least-loaded node). On Nov 18, 2015 6:40 PM, "cips bmkg" wrote: > Hi, > If you generate a lot of single-core sequential tasks, the regular SLURM allocation would pile them up onto the first node, following with the second…
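
In slurm.conf that is a per-partition flag, e.g. (partition name and node list are placeholders):

    # place each new allocation on the least-loaded eligible node
    PartitionName=serial Nodes=node[01-16] LLN=YES State=UP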

[slurm-dev] Re: NERSC shifter

2015-11-12 Thread Douglas Jacobsen
Hello, an early release of the software should be available starting next week! We're trying to get the final pieces in place (sans documentation) before SC. I'll send a notification to this list once it is available. Sorry for the delays! -Doug

[slurm-dev] Re: Partition QoS

2015-11-10 Thread Douglas Jacobsen
Hi Paul, I did this by creating the qos, e.g. "sacctmgr create qos part_whatever", then in slurm.conf setting qos=part_whatever in the "whatever" partition definition, followed by "scontrol reconfigure". Finally, set the limits on the qos: sacctmgr modify qos set MaxJobsPerUser=5 where name=part_whatever ...

[slurm-dev] Re: scripted use of sacctmgr

2015-10-14 Thread Douglas Jacobsen
I just went through this exercise to integrate the SLURM database with our site database. I found that preparing a file (or string) like:

    add user abc account=aaa
    add user def account=bbb
    add user def account=aaa
    ...
    exit

and then piping that to stdin of "sacctmgr -i" gave an easy way to…
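
The same pattern from a shell, assuming the commands above were saved to a hypothetical updates.txt:

    # -i / --immediate applies the changes without the interactive commit prompt
    sacctmgr -i < updates.txt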

[slurm-dev] RE: Distribute M jobs on N nodes without duplication

2015-10-02 Thread Douglas Jacobsen
Hi, I'm not sure I understand the problem, but you can specify -N (--nodes) and tasks and so on for each srun; that way you can control how many nodes and tasks are distributed per srun: srun -N 1 --gres=gpu:1 ... srun -N 1 --gres=gpu:1 ... from your original example should work. -Doug

[slurm-dev] epilogue / health check races

2015-04-02 Thread Douglas Jacobsen
Hi all, I saw a post earlier today (or yesterday) about jobs in a dependency chain starting while the prior job's epilogue is still running. I have a related, but more general, case of this. I've been using a test configuration of Slurm on a Cray XC30 in hybrid mode. I've seen that the…

[slurm-dev] spank plugin development: is it possible to save derived state in a job

2015-03-29 Thread Douglas Jacobsen
Hello, I'm working on a SPANK plugin which needs to accept input from the user (i.e., from a command-line option), perform some non-trivial work during the allocator context that may take some time, and then upon success allow the job to continue. Later steps, for example in the job plugin, task…