[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-10-11 Thread Douglas Jacobsen
FYI, sending SIGHUP to slurmctld is sufficient for rotating the slurmctld.log file. No need to actually restart it all the way. It is good to know the cause behind the deleted jobs. Doug On Oct 11, 2016 7:36 AM, "Ryan Novosielski" wrote: > > Thanks for clearing that up.
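
As a rough illustration of the suggestion above, the following Python sketch sends SIGHUP to a running daemon instead of restarting it, for example from a log-rotation hook. The helper names `read_pid` and `request_log_reopen` are hypothetical, and the pidfile location is an assumption: it is whatever `SlurmctldPidFile` is set to in slurm.conf (commonly /var/run/slurmctld.pid).

```python
import os
import signal


def read_pid(pidfile_path):
    """Return the PID stored in a pidfile (one integer, optional newline)."""
    with open(pidfile_path) as f:
        return int(f.read().strip())


def request_log_reopen(pid):
    """Send SIGHUP to the given process.

    Per the message above, slurmctld treats SIGHUP as sufficient for
    log rotation, so no full restart is needed. Sending the signal
    requires appropriate privileges (typically root or the Slurm user).
    """
    os.kill(pid, signal.SIGHUP)
```

A rotation script would call `request_log_reopen(read_pid("/var/run/slurmctld.pid"))` after moving the old log aside; the path shown is only the assumed default and should be checked against the local slurm.conf.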

[slurm-dev] SLURM 15.08.12; disable sview?

2016-10-11 Thread Ryan Novosielski
Hi there, I built SLURM 15.08.4 without the required libraries to build sview. That was fine, but someone later asked us for sview, so we added the dependencies and rebuilt. Now, upgrading to 15.08.12, we’re seeing that the slurm-15.08.12 RPM, which will need to go on all compute nodes, will

[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-10-11 Thread Ryan Novosielski
Thanks for clearing that up. I was pretty sure there was no problem at all in using logrotate, and I know that restarting slurmctld does not ordinarily lose jobs.

[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-10-11 Thread Philippe
Hello all, sorry for the long delay since my first post. Thanks for all the answers; they helped me run some tests, and before long I realized that I use a personal script to launch the daemons, and I was still using my "debug" start line, which contains the startclean argument ... So it's

[slurm-dev] slurmdbd.log gets created with 600 file perm - Any way to change this to 755 by default?

2016-10-11 Thread Balaji Deivam
Hello, I see that all the Slurm log files get created with restricted file permissions. Is there any way to change this by default to allow the group to read the files? -rw------- 1 sassrv sas 372 Sep 27 14:36 slurmdbd.27Sep2016.log -rw------- 1 sassrv sas 281841 Sep 27 14:36 slurmctld.27Sep2016.log

[slurm-dev] Re: Draining, Maint or ?

2016-10-11 Thread Lachlan Musicman
Ok, I think what I want is to set the state of the partitions to down: http://slurm.schedmd.com/scontrol.html#OPT_SPECIFICATIONS-FOR-CREATE,-UPDATE,-AND-DELETE-COMMANDS,-PARTITIONS ie, - no newly queued jobs will be started on that partition - slurm will continue to accept jobs for that
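
For illustration only, the partition-state change described above could be scripted along these lines. This is a minimal sketch, not a tested procedure: the helper name `partition_state_command` and the partition name `batch` are hypothetical, while the `scontrol update PartitionName=... State=...` form and the state names come from the scontrol documentation linked in the message.

```python
# Partition states accepted by "scontrol update" per the scontrol man page.
ALLOWED_STATES = {"UP", "DOWN", "DRAIN", "INACTIVE"}


def partition_state_command(partition, state="DOWN"):
    """Build the scontrol argument list that sets a partition's state.

    The list is only constructed here, not executed, so it can be
    reviewed or logged before being passed to subprocess.run().
    """
    state = state.upper()
    if state not in ALLOWED_STATES:
        raise ValueError(f"unsupported partition state: {state}")
    return ["scontrol", "update", f"PartitionName={partition}", f"State={state}"]
```

On a host with Slurm admin rights, `subprocess.run(partition_state_command("batch"), check=True)` would apply the change, and the same helper with `state="UP"` would reopen the partition afterwards.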

[slurm-dev] Draining, Maint or ?

2016-10-11 Thread Lachlan Musicman
Hola, For reasons, our IT team needs some downtime on our authentication server (FreeIPA/sssd). We would like to minimize the disruption, but also not lose any work. The current plan is for the nodes to be set to DRAIN on Friday afternoon and on Monday morning we will suspend any running jobs,

[slurm-dev] Accounting needs slurm daemon restart to apply changes

2016-10-11 Thread Eneko Anasagasti
Hi, Following an issue we had with sreport, where a user wasn't being reported (thread below), we discovered that changes made to the accounting database aren't communicated to the Slurm daemon, so they do not take effect until slurmctld is restarted. The docs

[slurm-dev] srun eio_handle_mainloop/eio_signal_shutdown race and error

2016-10-11 Thread Aaron Knister
Hi Everyone, This is somewhat of a re-post of an old issue ( https://groups.google.com/forum/#!topic/slurm-devel/59xPbuhb_78). It caught my attention recently so I re-investigated. The reason we experience the problem is a curious interaction between older versions of the hydra MPI launcher and

[slurm-dev] Re: Accounting needs slurm daemon restart to apply changes

2016-10-11 Thread Ryan Novosielski
I suspect that you, like I, ended up with an incorrect "ControlHost" in "sacctmgr list clusters". This is the address that will be notified that a change has been made in the accounting database. I still haven't gotten a suggestion on how to fix it without losing my accounting data, though.