Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread Christopher Samuel
On 5/17/22 12:00 pm, Paul Edmon wrote: Database upgrades can also take a while if your database is large. Definitely recommend backing up prior to upgrade as well as running slurmdbd -Dv and not the systemd daemon as if the upgrade takes a long time it will kill it preemptively due to

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread Paul Edmon
Database upgrades can also take a while if your database is large.  Definitely recommend backing up prior to upgrade as well as running slurmdbd -Dv and not the systemd daemon as if the upgrade takes a long time it will kill it preemptively due to unresponsiveness which will create all

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread Ole Holm Nielsen
Hi, You can upgrade from 19.05 to 20.11 in one step (2 major releases), skipping 20.02. When that is completed, it is recommended to upgrade again from 20.11 to 21.08.8 in order to get the current major version. The 22.05 will be out very soon, but you may want to wait a couple of minor

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread Brian Andrus
So the need to go step-by-step is due to changes in the database schema. The upgrade process is not able to upgrade if there is too big of a difference. That is a little gotcha: so when upgrading, you need to start slurmdbd and let it run for a bit as it does the database update (you can

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread Paul Edmon
I think it should be, but you should be able to run a test and find out. -Paul Edmon- On 5/17/22 12:13 PM, byron wrote: Sorry, I should have been clearer. I understand that with regard to slurmd / slurmctld you can skip a major release without impacting running jobs etc. My question was

[slurm-users] Jobs stuck with BeginTime and prolog exit status 99:0

2022-05-17 Thread Chandler
Could you help me figure out why our jobs are stuck PD because of BeginTime? e.g.:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
24458 defq cromwell smrtanal PD 0:00 1 (BeginTime)
# scontrol show job 24458
JobId=24458

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread byron
Sorry, I should have been clearer. I understand that with regard to slurmd / slurmctld you can skip a major release without impacting running jobs etc. My question was about upgrading slurmdbd and whether it was necessary to upgrade through the intermediate major releases (which I know

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread Paul Edmon
The slurm docs say you can do two major releases at a time (https://slurm.schedmd.com/quickstart_admin.html): "Almost every new major release of Slurm (e.g. 20.02.x to 20.11.x) involves changes to the state files with new data structures, new options, etc. Slurm permits upgrades to a new
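A minimal sketch of the "two major releases at a time" rule quoted above, for planning an upgrade path. The release list only contains the major versions named in this thread and is assumed to be the complete sequence between 19.05 and 22.05; the function names are hypothetical and not part of Slurm. Check the release notes for your versions before relying on it.

```python
# Major releases named in the thread, oldest first.
# Assumption: this is the full sequence between 19.05 and 22.05.
MAJOR_RELEASES = ["19.05", "20.02", "20.11", "21.08", "22.05"]


def major(version: str) -> str:
    """Reduce a version such as '21.08.8' to its major release, '21.08'."""
    return ".".join(version.split(".")[:2])


def can_upgrade_directly(current: str, target: str, max_gap: int = 2) -> bool:
    """True if target is the same major release or at most max_gap releases ahead."""
    gap = MAJOR_RELEASES.index(major(target)) - MAJOR_RELEASES.index(major(current))
    return 0 <= gap <= max_gap


def upgrade_path(current: str, target: str, max_gap: int = 2) -> list[str]:
    """List the major releases to install, in order, taking the largest permitted step."""
    start = MAJOR_RELEASES.index(major(current))
    end = MAJOR_RELEASES.index(major(target))
    if end < start:
        raise ValueError("downgrades are not handled by this sketch")
    path = []
    pos = start
    while pos < end:
        pos = min(pos + max_gap, end)  # never jump more than max_gap releases
        path.append(MAJOR_RELEASES[pos])
    return path


if __name__ == "__main__":
    print(can_upgrade_directly("19.05", "20.11"))
    print(can_upgrade_directly("19.05", "21.08.8"))
    print(upgrade_path("19.05", "21.08.8"))
```

Under these assumptions, 19.05 to 20.11 is a single permitted step, while reaching 21.08 from 19.05 takes an intermediate stop at 20.11, which matches the two-stage route suggested elsewhere in the thread.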

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread byron
Thanks Brian for the speedy response. Am I not correct in thinking that if I just go from 19.05 to 20.11 then there is the advantage that I can upgrade slurmd and slurmctld in one go and it won't affect the running jobs since upgrading to a new major release from the past two major releases

Re: [slurm-users] Slurm notifications, a more comprehensive solution - goslmailer

2022-05-17 Thread Hermann Schwärzler
Hi Petar, thanks for letting us know! We will definitely look into this and will get back to you on GitHub when technical questions/problems arise. Just one quick question: we are neither using Telegram nor MS-Teams here, but Matrix. In case we would like to deliver messages through that:

Re: [slurm-users] container on slurm cluster

2022-05-17 Thread Timo Rothenpieler
On 17.05.2022 15:58, Brian Andrus wrote: You are starting to understand a major issue with most containers. I suggest you check out Singularity, which was built from the ground up to address most issues. And it can run other container types (eg: docker). Brian Andrus Side-Note to this,

Re: [slurm-users] upgrading slurm to 20.11

2022-05-17 Thread Brian Andrus
You need to step upgrade through major versions (not minor). So 19.05=>20.x I would highly recommend going to 21.08 while you are at it. I just did the same migration (although they started at 18.x) with no issues. Running jobs were not impacted and users didn't even notice. Brian Andrus

[slurm-users] upgrading slurm to 20.11

2022-05-17 Thread byron
Hi, I'm looking at upgrading our install of slurm from 19.05 to 20.11 in response to the recently announced security vulnerabilities. I've been through the documentation / forums and have managed to find the answers to most of my questions but am still unclear about the following - In upgrading

Re: [slurm-users] container on slurm cluster

2022-05-17 Thread Hermann Schwärzler
Hi GHui, fyi: I am not a podman-expert so my questions might be stupid. :-) From what you told us so far you are running the podman-command as non-root but you are root inside the container, right? What is the output of "podman info | grep root" in your case? How are you submitting a job

Re: [slurm-users] container on slurm cluster

2022-05-17 Thread Brian Andrus
You are starting to understand a major issue with most containers. I suggest you check out Singularity, which was built from the ground up to address most issues. And it can run other container types (eg: docker). Brian Andrus On 5/16/2022 10:49 PM, GHui wrote: I use podman 4.0.2. And slurm

Re: [slurm-users] Performance tracking of array tasks

2022-05-17 Thread William Dear
> What is the use-case for having users need to self-limit? Our users self limit jobs with extremely high disk IO requirements. Some batch jobs read/write over 15TB a day and I haven't identified an effective method of capping IOPS per user. We still have issues with the occasional user