[slurm-users] Re: need to set From: address for slurm

2024-06-07 Thread Paul Edmon via slurm-users
There is no way to do it in slurm. You have to do it in the mail program you are using to send mail. In our case we use postfix and we set smtp_generic_maps to accomplish this. -Paul Edmon- On 6/7/2024 3:33 PM, Vanhorn, Mike via slurm-users wrote: All, When the slurm daemon is sending out
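
A minimal sketch of the Postfix approach described in this reply, not the poster's actual configuration. The table path and both addresses are hypothetical placeholders, and the `hash:` table type is an assumption (some Postfix builds use `lmdb:` instead).

```
# /etc/postfix/main.cf
# Rewrite sender addresses on mail leaving this host via SMTP.
smtp_generic_maps = hash:/etc/postfix/generic

# /etc/postfix/generic  (hypothetical addresses)
# local address on the Slurm host      address recipients should see
slurm@node01.cluster.example           hpc-noreply@example.org
```

After editing the table, it would typically be compiled with `postmap /etc/postfix/generic` and picked up with `postfix reload`.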

[slurm-users] Re: dynamical configuration || meta configuration mgmt

2024-05-29 Thread Paul Edmon via slurm-users
Many parameters in slurm can be changed via scontrol and sacctmgr commands without updating the conf itself. The thing is that scontrol commands are not durable across restarts. sacctmgr, though, updates the slurmdb and thus will be sticky. That's at least what I would do is that if you are

[slurm-users] HPC Principal System Engineer at the Broad

2024-04-25 Thread Paul Edmon via slurm-users
A friend asked me to pass this along. Figured some folks on this list might be interested. https://broadinstitute.avature.net/en_US/careers/JobDetail/HPC-Principal-System-Engineer/17773 -Paul Edmon- -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to

[slurm-users] Re: Jobs of a user are stuck in Completing stage for a long time and cannot cancel them

2024-04-10 Thread Paul Edmon via slurm-users
Usually to clear jobs like this you have to reboot the node they are on. That will then force the scheduler to clear them. -Paul Edmon- On 4/10/2024 2:56 AM, archisman.pathak--- via slurm-users wrote: We are running a slurm cluster with version `slurm 22.05.8`. One of our users has reported

[slurm-users] Re: Avoiding fragmentation

2024-04-09 Thread Paul Edmon via slurm-users
I wrote a little blog post on this topic a few years back: https://www.rc.fas.harvard.edu/blog/cluster-fragmentation/ It's a vexing problem, but as noted by the other responders, it is something that depends on your cluster policy and job performance needs. Well-written MPI code should be

[slurm-users] Re: FairShare priority questions

2024-03-27 Thread Paul Edmon via slurm-users
For this use case you probably want to go with Classic Fairshare (https://slurm.schedmd.com/classic_fair_share.html) rather than FairTree. Classic Fairshare behaves in a way similar to what you describe. You can set up different bins for fairshare and then the user can pull from them. So that

[slurm-users] Slurm Utilities

2024-03-13 Thread Paul Edmon via slurm-users
Just wanted to share some slurm utilities that we've written at Harvard FASRC that may be useful to the community. seff-account: https://github.com/fasrc/seff-account Creates job statistics summaries for users and accounts similar to what seff and seff-array do. showq:

[slurm-users] Re: salloc+srun vs just srun

2024-02-28 Thread Paul Edmon via slurm-users
u MUST use srun -- Paul Raines (http://help.nmr.mgh.harvard.edu) On Wed, 28 Feb 2024 10:25am, Paul Edmon via slurm-users wrote: salloc is the currently recommended way for interactive sessions. srun is now intended for launching steps or MPI applications. S

[slurm-users] Re: salloc+srun vs just srun

2024-02-28 Thread Paul Edmon via slurm-users
salloc is the currently recommended way for interactive sessions. srun is now intended for launching steps or MPI applications. So properly you would salloc and then srun inside the salloc. As you've noticed, with srun you tend to lose control of your shell as it takes over so you have background

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-26 Thread Paul Edmon via slurm-users
I concur with what folks have written so far; it really depends on your use case. For instance, if you are looking at a cluster with GPUs and intend to do some serious computing there, you are going to need RDMA of some sort. But it all depends on what you end up needing for your workflows.

[slurm-users] Re: Recover Batch Script Error

2024-02-16 Thread Paul Edmon via slurm-users
Are you using the job_script storage option? If so then you should be able to get at it by doing: sacct -B -j JOBID https://slurm.schedmd.com/sacct.html#OPT_batch-script -Paul Edmon- On 2/16/2024 2:41 PM, Jason Simms via slurm-users wrote: Hello all, I've used the "scontrol write
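
A small sketch of how the suggestion in this reply might be wrapped for repeated use. It assumes batch script storage is enabled (`AccountingStoreFlags=job_script` in slurm.conf) and that `sacct` is on the PATH; the function name `show_batch_script` is hypothetical.

```shell
#!/bin/sh
# Print the batch script that Slurm accounting stored for one job.
# Only works if the cluster stores job scripts in the accounting database.
show_batch_script() {
    jobid="$1"
    # Refuse to run without a job ID.
    if [ -z "$jobid" ]; then
        echo "usage: show_batch_script JOBID" >&2
        return 2
    fi
    # Fail clearly on hosts without the Slurm client tools.
    if ! command -v sacct >/dev/null 2>&1; then
        echo "sacct not found in PATH" >&2
        return 127
    fi
    # -B (--batch-script) prints the stored script; -j selects the job.
    sacct -B -j "$jobid"
}
```

For example, `show_batch_script 123456 > recovered.sh` would save the stored script to a file, where 123456 stands in for a real job ID.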

[slurm-users] Re: Naive SLURM question: equivalent to LSF pre-exec

2024-02-14 Thread Paul Edmon via slurm-users
You probably want the Prolog option: https://slurm.schedmd.com/slurm.conf.html#OPT_Prolog along with: https://slurm.schedmd.com/slurm.conf.html#OPT_ForceRequeueOnFail -Paul Edmon- On 2/14/2024 8:38 AM, Cutts, Tim via slurm-users wrote: Hi, I apologise if I’ve failed to find this in the
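
As a rough illustration of the two options linked in this reply, a slurm.conf fragment might look like the following. The script path is a hypothetical placeholder, and the fragment is a sketch rather than a recommended configuration.

```
# slurm.conf (fragment)
# Script run on each allocated node before the job starts (hypothetical path).
Prolog=/etc/slurm/prolog.sh
# Requeue a batch job whose launch fails because the Prolog failed,
# even if it was not submitted with --requeue.
PrologFlags=ForceRequeueOnFail
```

A Prolog script signals failure by exiting with a non-zero status, so any pre-execution check would end with `exit 1` when its condition is not met.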