Re: [slurm-users] Running multi jobs on one CPU in parallel

2021-09-14 Thread Williams, Gareth (IM, Black Mountain)
The simplest approach might be to run multiple processes within each batch job. Gareth
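
A minimal sketch of that approach, assuming a hypothetical worker script attack.sh and illustrative resource numbers:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=4

    # Launch many workers inside one allocation; each runs as a
    # background process rather than as a separate Slurm job.
    for i in $(seq 1 100); do
        ./attack.sh "$i" &
    done
    wait   # return only when every background worker has finished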

Re: [slurm-users] Running multi jobs on one CPU in parallel

2021-09-14 Thread Emre Brookes
Hi Karl, I haven't tested the MAX_TASKS_PER_NODE limits. According to slurm.conf: *MaxTasksPerNode* - Maximum number of tasks Slurm will allow a job step to spawn on a single node. The default *MaxTasksPerNode* is 512 and may not exceed 65533. So I'd try setting that and "scontrol
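
For illustration, the slurm.conf line might look like this (the value 10000 is an assumption, not a recommendation):

    # slurm.conf -- default is 512, may not exceed 65533
    MaxTasksPerNode=10000

After editing slurm.conf, "scontrol reconfigure" asks the daemons to reread it.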

Re: [slurm-users] Running multi jobs on one CPU in parallel

2021-09-14 Thread Karl Lovink
Hi Emre, MAX_TASKS_PER_NODE is set to 512. Does this mean I cannot run more than 512 jobs in parallel on one node? Or can I change MAX_TASKS_PER_NODE to a higher value and recompile Slurm? Regards, Karl On 14/09/2021 21:47, Emre Brookes wrote: > *-O*, *--overcommit* >    Overcommit

Re: [slurm-users] Running multi jobs on one CPU in parallel

2021-09-14 Thread Emre Brookes
*-O*, *--overcommit* Overcommit resources. When applied to job allocation, only one CPU is allocated to the job per node and options used to specify the number of tasks per node, socket, core, etc. are ignored. When applied to job step allocations (the *srun* command when executed
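
A sketch of what this looks like in practice (attack.sh is a placeholder):

    # One CPU is allocated on the node, but 256 tasks are spawned on it
    srun --nodes=1 --ntasks=256 --overcommit ./attack.sh

Note that MaxTasksPerNode still caps how many tasks a single step may spawn per node.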

[slurm-users] Running multi jobs on one CPU in parallel

2021-09-14 Thread Karl Lovink
Hello, I am in the process of setting up our SLURM environment. We want to use SLURM during our DDoS exercises for dispatching DDoS attack scripts. We need a lot of parallel running jobs on a total of 3 nodes. I can't get it to run more than 128 jobs simultaneously. There are 128 CPUs in the
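
One possible route, sketched here with invented partition and node names, is to let the partition oversubscribe CPUs so that more jobs than cores can run at once:

    # slurm.conf: allow up to 4 jobs to share each CPU in this partition
    PartitionName=ddos Nodes=node[1-3] OverSubscribe=FORCE:4 State=UP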

[slurm-users] Using Nice to Break Ties

2021-09-14 Thread Paul Edmon
We use the classic fairshare algorithm here with users having their shares set to parent and pulling from the group pool rather than having each user have their own fairshare (you can see our doc here: https://docs.rc.fas.harvard.edu/kb/fairshare/). This has worked very well for us for many
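
For reference, a sketch of how a share can be pointed at the parent account with sacctmgr (user and account names are invented):

    # The user then draws from the group's pool instead of an individual share
    sacctmgr modify user where name=jdoe account=lab_group set fairshare=parent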

Re: [slurm-users] Slurm Job Error Output is Missing

2021-09-14 Thread Maria Semple
Hello again! I had a realisation last night that I was probably truncating the previous stderr output by not supplying the -a argument to tee. After some testing this morning, I can happily say that the following script works as expected:

test.sh:

    #!/bin/bash
    echo "out"
    echo "err" >&2
    echo "err
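
A minimal sketch of the append-instead-of-truncate pattern the post describes (log file names are illustrative):

    #!/bin/bash
    # tee -a appends, so earlier stderr lines are no longer clobbered
    exec > >(tee -a out.log) 2> >(tee -a err.log >&2)
    echo "out"
    echo "err" >&2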

Re: [slurm-users] [External] How can I do to prevent a specific job from being prempted?

2021-09-14 Thread Russell Jones
The other option is creating a "special" partition that only this user (or users) can submit to, where jobs running in that partition have a higher priority than all the others (if you are using partition priority like we are). On Tue, Sep 14, 2021 at 3:26 AM Loris Bennett wrote: > Dear Peter, > > 顏文
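
Sketched with invented names, such a partition could be declared like this (assuming PreemptType=preempt/partition_prio):

    # slurm.conf: a restricted partition that outranks the default one
    PartitionName=urgent Nodes=node[1-3] AllowAccounts=vip PriorityTier=100 State=UP
    PartitionName=normal Nodes=node[1-3] Default=YES PriorityTier=1 State=UP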

Re: [slurm-users] FreeMem is not equal to (RealMem - AllocMem)

2021-09-14 Thread Bjørn-Helge Mevik
Pavel Vashchenkov writes: > There is a line "RealMemory=257433 AllocMem=155648 FreeMem=37773 > Sockets=2 Boards=1" > > My question is: Why is there so little FreeMem (37 GB instead of the expected > 100 GB (RealMem - AllocMem))? If I recall correctly, RealMem is what you have configured in
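
For the numbers in the post: RealMemory - AllocMem = 257433 - 155648 = 101785 MB (~100 GB), yet FreeMem is 37773 MB. Assuming FreeMem mirrors the OS's free-memory figure rather than Slurm's own bookkeeping, the gap is typically page cache, which the kernel counts as used (node01 below is a stand-in):

    # Compare Slurm's view with the OS view on the node
    scontrol show node node01 | grep -E 'RealMemory|AllocMem|FreeMem'
    free -m   # the "buff/cache" column usually accounts for the difference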

Re: [slurm-users] FreeMem is not equal to (RealMem - AllocMem)

2021-09-14 Thread Diego Zuccato
On 14/09/2021 06:52, Pavel Vashchenkov wrote: My question is: Why is there so little FreeMem (37 GB instead of the expected 100 GB (RealMem - AllocMem))? PS On other nodes the situation is similar: RealMemory=257433 AllocMem=180224 FreeMem=7913 On a free node (it is not allocated for computation

Re: [slurm-users] [External] How can I do to prevent a specific job from being prempted?

2021-09-14 Thread Loris Bennett
Dear Peter, 顏文 writes: > Dear Mr. Zillner > > I would like the specific running job not to be rescheduled, and also not to be > terminated or cancelled in any way. If the job is cancelled, I need to > start it over again. Normally this kind of job requires weeks to > finish. So the time

Re: [slurm-users] [External] How can I do to prevent a specific job from being prempted?

2021-09-14 Thread 顏文
Dear Mr. Zillner I would like the specific running job not to be rescheduled, and also not to be terminated or cancelled in any way. If the job is cancelled, I need to start it over again. Normally this kind of job requires weeks to finish. So the time cost to restart is quite

Re: [slurm-users] [External] How can I do to prevent a specific job from being prempted?

2021-09-14 Thread Florian Zillner
See the --no-requeue option for sbatch: --no-requeue Specifies that the batch job should never be requeued under any circumstances. Setting this option will prevent system administrators from being able to restart the job (for example, after a scheduled downtime), recover from a node failure, or
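
As a sketch, the directive form inside a batch script (the script body and time limit are hypothetical):

    #!/bin/bash
    #SBATCH --no-requeue          # never requeue, even after a node failure
    #SBATCH --time=28-00:00:00    # illustrative multi-week limit
    srun ./long_running_job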