Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-13 Thread Ole Holm Nielsen
The latest pestat version now adds a red color highlight if the GRES GPU value is (null). We use this to highlight jobs on GPU nodes which didn't request any GPU resources, thereby possibly wasting resources. Could you test whether this is useful and give me feedback? Thanks, Ole On 12/13/
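
As an illustration of the highlighting rule described in the message above, here is a minimal Python sketch. It is not pestat's own code (pestat is a shell script); the function name highlight_gres and the sample value "gpu:2" are hypothetical, and the only assumption about the terminal is that it honors standard ANSI color escapes.

```python
# Hypothetical sketch of the "(null)" highlight rule; not taken from pestat.
RED = "\033[31m"   # ANSI escape: switch foreground to red
RESET = "\033[0m"  # ANSI escape: reset all attributes


def highlight_gres(gres_used: str, use_color: bool = True) -> str:
    """Wrap the GRES field in red when it is the literal string "(null)"."""
    if use_color and gres_used == "(null)":
        return f"{RED}{gres_used}{RESET}"
    return gres_used


if __name__ == "__main__":
    # "gpu:2" is an illustrative sample value, not real cluster output.
    for value in ("(null)", "gpu:2"):
        print(highlight_gres(value))
```

Passing use_color=False returns the field unchanged, which would suit output that is piped to a file rather than shown in a terminal.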

Re: [slurm-users] Add new compute node without interruption

2021-12-13 Thread Ole Holm Nielsen
On 13-12-2021 18:55, Microbiome Studio wrote: We would like to know if it is planned to add this feature: adding a new compute node without interruption. Currently we have to stop computation, declare the new nodes, and resume the computation. Such a feature would be really helpful with the growth of

Re: [slurm-users] Add new compute node without interruption

2021-12-13 Thread Brian Andrus
Indeed, this is accurate. We regularly add nodes on the fly (cloud-based cluster). All that is needed is to get them all set in slurm.conf, restart slurmctld, and run 'scontrol reconfigure'. Brian Andrus On 12/13/2021 11:01 AM, Paul Brunk wrote: Hi: Normally, adding a new node requires alt

Re: [slurm-users] Add new compute node without interruption

2021-12-13 Thread Paul Brunk
Hi: Normally, adding a new node requires altering slurm.conf and restarting slurmctld, as well as slurmd on each node. Restarting these daemons should not harm jobs and can be done while existing jobs are running. Wishing that I’d just listened this time, Paul Brunk, system administrator, Workstatio
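
The procedure described in the message above can be sketched as an ordered plan. The Python below only builds and prints the steps; it executes nothing. It assumes the updated slurm.conf has already been distributed to every host and that the daemons run as systemd units named slurmctld and slurmd; the host names and the helper node_add_plan are hypothetical.

```python
# Dry-run sketch: lists the restart steps in order, runs no commands.
# Assumption: slurm.conf (with the new node definitions) is already in place
# on the controller and on every compute node.


def node_add_plan(new_nodes, existing_nodes, controller="controller"):
    """Return (host, command) pairs in the order they would be run."""
    # Restart the controller daemon first so it reads the new node list.
    steps = [(controller, "systemctl restart slurmctld")]
    # Then restart slurmd on every compute node, existing and new.
    for host in list(existing_nodes) + list(new_nodes):
        steps.append((host, "systemctl restart slurmd"))
    return steps


if __name__ == "__main__":
    # Hypothetical host names for illustration only.
    for host, command in node_add_plan(["node03"], ["node01", "node02"]):
        print(f"{host}: {command}")
```

Keeping the plan as data makes it easy to review the order before handing it to whatever remote-execution tool a site already uses.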

[slurm-users] Add new compute node without interruption

2021-12-13 Thread Microbiome Studio
Dear all, First, thanks to the Slurm developers for your amazing work. We would like to know if it is planned to add this feature: adding a new compute node without interruption. Currently we have to stop computation, declare the new nodes, and resume the computation. Such a feature would be really helpful wit

Re: [slurm-users] slurmdbd full backup so the primary can be purged

2021-12-13 Thread Paul Edmon
I haven't tested with super ancient versions of Slurm but I know we have uploaded past versions before so we could scrape the data for XDMod.  So as far as I'm aware there is no version limitation, but your mileage may vary with very old versions of Slurm.  To make sure I would probably ping Sc

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-13 Thread Loris Bennett
Hi Ole, The new version looks good to me. Cheers, Loris Ole Holm Nielsen writes: > Hi Loris, > > I fixed errors in the hostnamelength calculation and formatting. > Could you grab the latest pestat and test it? > > Thanks, > Ole > > On 12/13/21 13:56, Loris Bennett wrote: >> Hi Ole, >> >> Ole

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-13 Thread Ole Holm Nielsen
Hi Loris, I fixed errors in the hostnamelength calculation and formatting. Could you grab the latest pestat and test it? Thanks, Ole On 12/13/21 13:56, Loris Bennett wrote: Hi Ole, Ole Holm Nielsen writes: Hi Slurm users, I have updated the "pestat" tool for printing Slurm nodes status wi

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-13 Thread Ole Holm Nielsen
Hi Loris, Thanks for the note. I need to figure out the correct variable width printf() options. I'm working on an update... Best regards, Ole On 12/13/21 13:56, Loris Bennett wrote: Hi Ole, Ole Holm Nielsen writes: Hi Slurm users, I have updated the "pestat" tool for printing Slurm n

[slurm-users] srun fails with "srun: error: Security violation, slurm message from uid" if delay in job starting

2021-12-13 Thread Mark Dixon
Hi all, Just wondering if anyone else has seen this. Running Slurm 21.08.2, we're seeing srun work normally if it is able to run immediately. However, if there is a delay in job start, for example after a wait for another job to end, srun fails. e.g. [test@foo ~]$ srun -p test --pty bash

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-13 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > Hi Slurm users, > > I have updated the "pestat" tool for printing Slurm nodes status with 1 line > per > node including job info. The download page is > https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat > (also listed in https://slurm.schedmd.c

Re: [slurm-users] slurmdbd full backup so the primary can be purged

2021-12-13 Thread Loris Bennett
Hi Paul, Am I right in assuming that there are going to be some limitations to loading archived data w.r.t. version of slurmdbd used to create the archive and that used to read it? Cheers, Loris Paul Edmon writes: > Files generated by the slurmdbd archive are read back into the live database

Re: [slurm-users] Failed to forward X11 with a remote scheduler

2021-12-13 Thread Stuart MacLachlan
Hi Jeremy, I have just gone through setting this up with exactly the same issues. Not sure what your settings are or what you may have tried so far, but hopefully this will help… 1. Make sure Slurm > V17.11 (it looks like that is when X forwarding was brought in, instead of using the SPANK plugin) 2. M