FYI, sending SIGHUP to slurmctld is sufficient for rotating the
slurmctld.log file. There is no need to restart it fully. It is
good to know the cause behind the deleted jobs.
Doug
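A minimal sketch of the signal-based approach described above, in Python. It assumes slurmctld writes its PID to a file; the path below is a hypothetical placeholder (the real location is whatever SlurmctldPidFile is set to in slurm.conf), and the helper names are illustrative only.

```python
import os
import signal
from pathlib import Path

# Hypothetical path: use the value of SlurmctldPidFile from your slurm.conf.
PID_FILE = Path("/var/run/slurmctld.pid")


def read_pid(pid_file: Path) -> int:
    """Return the PID stored in pid_file (first whitespace-separated token)."""
    tokens = pid_file.read_text().split()
    if not tokens:
        raise ValueError(f"no PID found in {pid_file}")
    pid = int(tokens[0])
    if pid <= 0:
        raise ValueError(f"invalid PID in {pid_file}: {pid}")
    return pid


def send_sighup(pid: int) -> None:
    """Send SIGHUP to the given process; the process is not restarted."""
    os.kill(pid, signal.SIGHUP)
```

Calling send_sighup(read_pid(PID_FILE)) after the old log file has been moved aside (for example from a logrotate postrotate hook) would be one way to apply this; it needs permission to signal the slurmctld process.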
On Oct 11, 2016 7:36 AM, "Ryan Novosielski" wrote:
>
> Thanks for clearing that up.
Hi there,
I built SLURM 15.08.4 without the libraries required to build sview. That was
fine, but someone later asked us for sview, so we added the dependencies and
rebuilt. Now, upgrading to 15.08.12, we’re seeing that the slurm-15.08.12 RPM,
which will need to go on all compute nodes, will
Thanks for clearing that up. I was pretty sure there was no problem at all in
using logrotate, and I know that restarting slurmctld does not ordinarily lose
jobs.
--
|| \\UTGERS, |---*O*---
||_// the State | Ryan Novosielski -
Hello all,
Sorry for the long delay since my first post.
Thanks for all the answers; they helped me run some tests, and before
long I realized that I use a personal script to launch the daemons, and I
was still using my "debug" start line, which contains the startclean
argument ...
So it's
Hello,
I see that all the Slurm log files are created with restricted file permissions.
Is there any way to change this by default to allow group members to read the files?
-rw--- 1 sassrv sas 372 Sep 27 14:36 slurmdbd.27Sep2016.log
-rw--- 1 sassrv sas 281841 Sep 27 14:36 slurmctld.27Sep2016.log
Ok, I think what I want is to set the state of the partitions to down:
http://slurm.schedmd.com/scontrol.html#OPT_SPECIFICATIONS-FOR-CREATE,-UPDATE,-AND-DELETE-COMMANDS,-PARTITIONS
ie,
- no newly queued jobs will be started on that partition
- slurm will continue to accept jobs for that
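
As a rough sketch of the approach above, the following Python builds and runs the scontrol command that sets a partition's state. The partition name passed in and the helper names are hypothetical; the `scontrol update PartitionName=... State=...` form follows the scontrol documentation linked above.

```python
import subprocess
from typing import List

# Partition states listed in the scontrol documentation.
ALLOWED_STATES = {"UP", "DOWN", "DRAIN", "INACTIVE"}


def partition_state_cmd(partition: str, state: str) -> List[str]:
    """Build the scontrol argument list that sets a partition's state."""
    state = state.upper()
    if state not in ALLOWED_STATES:
        raise ValueError(f"unsupported partition state: {state}")
    return ["scontrol", "update", f"PartitionName={partition}", f"State={state}"]


def set_partition_state(partition: str, state: str) -> None:
    """Run scontrol; this requires Slurm administrator privileges."""
    subprocess.run(partition_state_cmd(partition, state), check=True)
```

For example, set_partition_state("debug", "DOWN") before the maintenance window and set_partition_state("debug", "UP") afterwards, where "debug" stands in for a real partition name.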
Hello,
For reasons, our IT team needs some downtime on our authentication server
(FreeIPA/sssd).
We would like to minimize the disruption, but also not lose any work.
The current plan is for the nodes to be set to DRAIN on Friday afternoon
and on Monday morning we will suspend any running jobs,
Hi,
Following an issue we had with sreport where a user wasn't being reported
(thread below), we discovered that changes made to the accounting
database are not communicated to the Slurm daemon, so they do not take
effect until slurmctld is restarted.
The docs
Hi Everyone,
This is somewhat of a re-post of an old issue (
https://groups.google.com/forum/#!topic/slurm-devel/59xPbuhb_78).
It caught my attention recently so I re-investigated. The reason we
experience the problem is a curious interaction between older versions of
the hydra MPI launcher and
I suspect that you, like me, ended up with an incorrect "ControlHost" in
"sacctmgr list clusters". This is the address that will be notified that a
change has been made in the accounting database.
I still haven't gotten a suggestion on how to fix it without losing my
accounting data, though.
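
To check the value mentioned above, a small Python sketch like the following could list each cluster's ControlHost. It only inspects the setting and does not attempt a fix. The helper names are hypothetical, and the pipe-separated output layout is an assumption based on sacctmgr's --parsable2 and --noheader options.

```python
import subprocess
from typing import Dict


def parse_control_hosts(output: str) -> Dict[str, str]:
    """Parse 'cluster|controlhost' lines into a dict keyed by cluster name."""
    hosts: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        cluster, _, host = line.partition("|")
        hosts[cluster] = host  # empty string if no ControlHost is recorded
    return hosts


def query_control_hosts() -> Dict[str, str]:
    """Run sacctmgr and return the ControlHost recorded for each cluster."""
    result = subprocess.run(
        ["sacctmgr", "--noheader", "--parsable2", "list", "clusters",
         "format=Cluster,ControlHost"],
        check=True, capture_output=True, text=True,
    )
    return parse_control_hosts(result.stdout)
```

Comparing the returned address with the host where slurmctld actually runs would show whether the database is notifying the wrong machine.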