Re: [slurm-users] slurm 22.05 "hash_k12" related upgrade issue

2022-10-24 Thread Paul Edmon
It only happens with 22.05 releases prior to the latest one (22.05.5). So version 21 isn't affected, and you should be able to upgrade from 21 to 22.05.5 without seeing the hash_k12 issue. If you upgrade to any earlier 22.05 release, though, you will hit this issue. -Paul
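Paul's rule can be captured in a small check. This is a minimal sketch, assuming plain `major.minor.patch` version strings such as `22.05.4`; the function name `hash_k12_affected` is hypothetical and not part of Slurm. It only encodes the statement above (22.05 releases before 22.05.5 are affected, other series are not) and says nothing about other upgrade problems.

```python
import re

# Per the thread: the hash_k12 upgrade issue affects 22.05 releases
# before 22.05.5. Version strings are assumed to look like "22.05.4".
FIXED_PATCH = 5


def hash_k12_affected(version: str) -> bool:
    """Return True if upgrading to `version` would hit the hash_k12 issue."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if not m:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(g) for g in m.groups())
    # Only the 22.05 series is affected, and only before 22.05.5.
    return (major, minor) == (22, 5) and patch < FIXED_PATCH


if __name__ == "__main__":
    for target in ("21.08.8", "22.05.2", "22.05.5"):
        print(target, hash_k12_affected(target))
```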

[slurm-users] slurm 22.05 "hash_k12" related upgrade issue

2022-10-24 Thread Marko Markoc
Hi all, regarding https://lists.schedmd.com/pipermail/slurm-users/2022-September/009222.html, a question for those of you who may have done this upgrade recently: does this also happen during a major-version upgrade (21 -> 22 in my case)? All of the discussion I found online about it only

[slurm-users] Test Suite problems related to requesting tasks

2022-10-24 Thread Groner, Rob
I'm really pleased to find the test suite included with Slurm, and after some initial difficulty I am now able to run the unit tests and the expect tests. The expect tests seem to fail generally whenever the test involves tasks: anything requesting more than 1 task per node fails.

Re: [slurm-users] Ideal NFS exported StateSaveLocation size.

2022-10-24 Thread Brian Andrus
FWIW, I have used NFS/Gluster/Lustre for StateSaveLocation at various times on various clusters. I have never had an issue with any of them, and I run clusters of up to 1000+ nodes. I have even used the same share to symlink all the nodes' slurm.conf with no issue. Of course, YMMV,

Re: [slurm-users] Ideal NFS exported StateSaveLocation size.

2022-10-24 Thread Paul Edmon
HA for slurmctld is not multi-datacenter HA but rather a traditional HA setup, where you have two server heads attached to one storage brick (connected by SAS cables or another fast interconnect). Multi-datacenter HA has trouble keeping things in sync due to latency and IOPS (as noted below). So

[slurm-users] Slurm Power Saving & salloc

2022-10-24 Thread Gizo Nanava
Hello, it seems that in a cluster configured for power saving, salloc does not wait until the nodes assigned to the job recover from the powered-down state and return to normal operation. Although the job is in the CONFIGURING state and the nodes are still in IDLE+NOT_RESPONDING+POWERING_UP,
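The thread snippet does not give a fix. One hypothetical client-side workaround is to poll the node state before starting work and wait until the power-up flags quoted above have cleared. The sketch below assumes the caller supplies a `get_state` function (for example, one that parses `scontrol show node` output); `still_resuming` and `wait_until_ready` are illustrative names, not Slurm APIs, and treating `NOT_RESPONDING` as "not ready" is an assumption.

```python
import time
from typing import Callable

# Flags taken from the state string quoted in the thread. Treating
# NOT_RESPONDING as "not ready yet" is an assumption of this sketch.
RESUMING_FLAGS = {"POWERING_UP", "NOT_RESPONDING"}


def still_resuming(state: str) -> bool:
    """Return True if a '+'-joined node state still carries a resuming flag."""
    flags = {part.strip().upper() for part in state.split("+") if part.strip()}
    return bool(flags & RESUMING_FLAGS)


def wait_until_ready(get_state: Callable[[], str],
                     timeout: float = 600.0,
                     interval: float = 5.0) -> bool:
    """Poll get_state() until the node no longer looks like it is resuming.

    Returns True once ready, or False if `timeout` seconds pass first.
    `get_state` is supplied by the caller (hypothetical, e.g. a wrapper
    around scontrol); this function never contacts Slurm itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not still_resuming(get_state()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated state sequence standing in for real node queries.
    states = iter(["IDLE+NOT_RESPONDING+POWERING_UP", "IDLE+POWERING_UP", "IDLE"])
    print(wait_until_ready(lambda: next(states), timeout=30, interval=0.1))
```

Passing the state lookup in as a function keeps the polling logic testable without a cluster.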

Re: [slurm-users] Ideal NFS exported StateSaveLocation size.

2022-10-24 Thread Ole Holm Nielsen
On 10/24/22 09:57, Diego Zuccato wrote: On 24/10/2022 09:32, Ole Holm Nielsen wrote: > It is definitely a BAD idea to store Slurm StateSaveLocation on a slow NFS directory! SchedMD recommends using local NVMe or SSD disks because there will be many IOPS to this file system! IIUC

Re: [slurm-users] Ideal NFS exported StateSaveLocation size.

2022-10-24 Thread Ward Poelmans
On 24/10/2022 09:32, Ole Holm Nielsen wrote: On 10/24/22 06:12, Richard Chang wrote: I have a two-node slurmctld setup, and both nodes will mount an NFS-exported directory as the state save location. It is definitely a BAD idea to store Slurm StateSaveLocation on a slow NFS directory! SchedMD

Re: [slurm-users] Ideal NFS exported StateSaveLocation size.

2022-10-24 Thread Diego Zuccato
On 24/10/2022 09:32, Ole Holm Nielsen wrote: > It is definitely a BAD idea to store Slurm StateSaveLocation on a slow NFS directory! SchedMD recommends using local NVMe or SSD disks because there will be many IOPS to this file system! IIUC it does have to be shared between

Re: [slurm-users] Ideal NFS exported StateSaveLocation size.

2022-10-24 Thread Ole Holm Nielsen
On 10/24/22 06:12, Richard Chang wrote: Is there a rule of thumb for the size of the NFS-exported directory to be used as StateSaveLocation? I have a two-node slurmctld setup, and both nodes will mount an NFS-exported directory as the state save location. It is definitely a BAD idea to
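No rule of thumb appears in the snippet, so one hedged way to get a baseline is to measure how much space an existing state directory actually uses. The sketch below walks a directory tree and sums regular file sizes; the path is whatever you pass on the command line (nothing here is Slurm-specific), and files that vanish mid-scan are skipped because state files may be rewritten while the scan runs.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.islink(fp):
                continue
            try:
                total += os.path.getsize(fp)
            except FileNotFoundError:
                # The file was removed or replaced during the walk.
                continue
    return total


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: {dir_size_bytes(target) / 2**20:.1f} MiB")
```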