[slurm-users] New member , introduction

2023-09-29 Thread John Joseph
Dear All, Thanks for the mailing list. I just joined the list and would like to introduce myself. My name is Joseph John and I work as a system administrator. I have been working on Linux, but I am a novice to HPC and Slurm. Trying to learn. Thanks, Joseph John

Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Groner, Rob
Well again, I don't want to tweak things just to get the test to happen quicker. I DO have to keep in mind the scheduler and backfill settings, though. For instance, I think the default scheduler and backfill intervals are 60 and 30 seconds...or vice versa. So, before I check the Scheduler

Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Ryan Novosielski
You can get some information on that from sdiag, and there are tweaks you can make to backfill scheduling that affect how quickly it will get to a job. That doesn’t really answer your real question, but might help you when you are looking into this. On Sep 29, 2023, at

Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Groner, Rob
I'm not looking for a one-time answer. We run these tests anytime we change anything related to slurm version, configuration, etc. We certainly run the test after the system comes back up after an outage, and an hour would be a long time to wait for that. That's certainly the

Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
On Sep 29, 2023, at 2:51 PM, Davide DelVento <davide.quan...@gmail.com> wrote: I don't really have an answer for you other than a "hallway comment", that it sounds like a good thing which I would test with a simulator, if I had one. I've been intrigued by (but really not looked much

Re: [slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Davide DelVento
I don't really have an answer for you other than a "hallway comment", that it sounds like a good thing which I would test with a simulator, if I had one. I've been intrigued by (but really not looked much into) https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf On Fri, Sep 29, 2023 at 10:05 

[slurm-users] FAQ Errata: Can the make command utilize the resources allocated to a Slurm job? answer is out of data

2023-09-29 Thread Cook, Malcolm
FWIW: The answer to the question "Can the make command utilize the resources allocated to a Slurm job?" [1] is out of date. The patch mentioned is no longer in the distribution, and personal inspection finds that it no longer applies to newer versions of GNU Make. [1]

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Ole Holm Nielsen
On 29-09-2023 17:33, Ryan Novosielski wrote: I’ll just say, we haven’t done an online/jobs running upgrade recently (in part because we know our database upgrade will take a long time, and we have some processes that rely on -M), but we have done it and it does work fine. So the paranoia isn’t

[slurm-users] Verifying preemption WON'T happen

2023-09-29 Thread Groner, Rob
On our system, for some partitions, we guarantee that a job can run at least an hour before being preempted by a higher priority job. We use the QOS preempt exempt time for this, and it appears to be working. But of course, I want to TEST that it works. So on a test system, I start a lower

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Groner, Rob
My team lead brought that up also, that we could go ahead and change the symlink that EVERYTHING uses, and nothing would happen...until the service is restarted. That's good that it's not a timing-related change. Of course, we do run the risk that a node will variously reboot on its own, and

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Ryan Novosielski
I’ll just say, we haven’t done an online/jobs running upgrade recently (in part because we know our database upgrade will take a long time, and we have some processes that rely on -M), but we have done it and it does work fine. So the paranoia isn’t necessary unless you know that, like us, the

[slurm-users] docker containers and slurm

2023-09-29 Thread Jake Jellinek
Hi list, I have built a small cluster and have attached a few clients to it. My clients can submit jobs, so I am confident that the service is set up sufficiently. What I would like to do is to deploy the slurm client into a docker container. From within the docker container, I have set up munge and

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Paul Edmon
This is one of the reasons we stick with using RPMs rather than the symlink process. It's just cleaner and avoids the issue of having the install on shared storage that may get overwhelmed with traffic or suffer outages. Also the package manager automatically removes the previous versions and

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Groner, Rob
I did already see the upgrade section of Jason's talk, but it wasn't much about the mechanics of the actual upgrade process, more of a big picture it seemed. It dealt a lot with different parts of slurm at different versions, which is something we don't have. One little wrinkle here is that

Re: [slurm-users] enabling job script archival

2023-09-29 Thread Davide DelVento
Fantastic, this is really helpful, thanks! On Thu, Sep 28, 2023 at 12:05 PM Paul Edmon wrote: > Yes it was later than that. If you are 23.02 you are good. We've been running with storing job_scripts on for years at this point and that part of the database only uses up 8.4G. Our entire

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Ryan Novosielski
I started off writing there’s really no particular process for these/just do your changes and start the new software (be mindful of any PATH that might contain data that’s under your software tree, if you have that set up), and that you might need to watch the timeouts, but I figured I’d have a

Re: [slurm-users] Steps to upgrade slurm for a patchlevel change?

2023-09-29 Thread Ole Holm Nielsen
On 9/28/23 17:58, Groner, Rob wrote: There are 14 steps to upgrading slurm listed on their website, including shutting down and backing up the database. So far we've only updated slurm during a downtime, and it's been a major version change, so we've taken all the steps indicated. We now want