[slurm-users] Re: GPU GRES verification and some really broad questions.

2024-05-10 Thread Loris Bennett via slurm-users
ing and writing, as the OSs have been installed > on NVME drives. Depending on the IO patterns created by a piece of software using the distributed file system might be fine or a local disk might be needed. Note that you might experience problems with /tmp filling up, so it may be better to have a s

[slurm-users] Re: [EXTERN] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
Hi Dietmar, Dietmar Rieder via slurm-users writes: > Hi Loris, > > On 4/30/24 3:43 PM, Loris Bennett via slurm-users wrote: >> Hi Dietmar, >> Dietmar Rieder via slurm-users >> writes: >> >>> Hi Loris, >>> >>> On 4/30/24 2

[slurm-users] Re: [EXTERN] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
Hi Dietmar, Dietmar Rieder via slurm-users writes: > Hi Loris, > > On 4/30/24 2:53 PM, Loris Bennett via slurm-users wrote: >> Hi Dietmar, >> Dietmar Rieder via slurm-users >> writes: >> >>> Hi, >>> >>> is it possible to have slur

[slurm-users] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
'srun ... --pty bash', as far as I understand, the preferred method is to use 'salloc' as above, and to use 'srun' for starting MPI processes. Cheers, Loris > Thanks so much and sorry for the naive question >Dietmar -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), F

[slurm-users] Re: Avoiding fragmentation

2024-04-08 Thread Loris Bennett via slurm-users
t for us because we have a large number of single core jobs and almost all the users, whether doing MPI or not, significantly overestimate the memory requirements of their jobs. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin -- slurm-users mailin

[slurm-users] Re: Suggestions for Partition/QoS configuration

2024-04-04 Thread Loris Bennett via slurm-users
The downside is that very occasionally nodes may idle because a user has reached his or her cap. However, we usually have enough uncapped users submitting jobs, so that in fact this happens only rarely, such as sometimes at Christmas or New Year. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

[slurm-users] job_submit.lua - uid in Docker cluster

2024-02-14 Thread Loris Bennett via slurm-users
the value 0.0 and is thus not an integer. The only user within the Docker cluster is 'root'. Has anyone come across this issue? Is it to do with the Docker environment or the difference in the OS versions (Lua 5.1.4 vs. 5.3.4, lua-posix 32 vs. 33.3.1)? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr

[slurm-users] Re: Starting a job after a file is created in previous job (dependency looking for soluton)

2024-02-06 Thread Loris Bennett via slurm-users
an specify how many jobs should run simultaneously with the '%' notation: --array=1-200%2 Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin
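The '%' throttle mentioned in this message could be used in a batch script roughly as follows. This is a minimal sketch, not taken from the thread: the input file naming (input_N.dat) is a hypothetical convention, and the fallback to task 1 is only there so that the script can be tried outside Slurm.

```shell
#!/bin/bash
#SBATCH --array=1-200%2     # 200 array tasks, at most 2 running at once
#SBATCH --ntasks=1

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 so the
# script can also be run by hand (assumption made for illustration).
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: one input file per array task.
input_file="input_${task_id}.dat"

echo "Task ${task_id} would process ${input_file}"
```

Submitted with 'sbatch', each array task sees its own SLURM_ARRAY_TASK_ID, so the same script can select a different input per task.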

[slurm-users] Re: SLURM configuration for LDAP users

2024-02-05 Thread Loris Bennett via slurm-users
ng > LaunchParameters=enable_nss_slurm in the slurm.conf file and put slurm > keyword in passwd/group > entry in the /etc/nsswitch.conf file. Did these, but didn't help either. > > I am bereft of ideas at present. If anyone has real world experience and can > advise,

[slurm-users] Two jobs each with a different partition running on same node?

2024-01-29 Thread Loris Bennett
be in a single partition. Was this indeed the case and is it still the case with version Slurm 23.02.7? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin

Re: [slurm-users] slurm-config on NFS-volume

2024-01-24 Thread Loris Bennett
nfigless" Slurm: https://slurm.schedmd.com/configless_slurm.html Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin

Re: [slurm-users] Multifactor fair-share with single account

2024-01-04 Thread Loris Bennett
Hi Kamil, Kamil Wilczek writes: > W dniu 4.01.2024 o 07:56, Loris Bennett pisze: >> Hi Kamil, >> Kamil Wilczek writes: >> >>> Dear All, >>> >>> I have a question regarding the fair-share factor of the multifactor >>> priority a

Re: [slurm-users] Multifactor fair-share with single account

2024-01-03 Thread Loris Bennett
nd thus treated equally by the fair-share mechanism. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin

Re: [slurm-users] Selecting only a subset of GPU's from all available GPU's

2023-12-17 Thread Loris Bennett
e much nicer if multiple GPUs types passed to '--gres' were ORed. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] partition qos without managing users

2023-11-23 Thread Loris Bennett
of partition QoS so that the number >>> of >>> jobs or cpus is limited for a single user. >>> So far my testing always depends on creating users within the >>> accounting database however I'd like to avoid managing each user and >>> having to create or sync _all_ LDAP users also within Slurm. >>> Or - are there solutions to sync LDAP or AzureAD users to the Slurm >>> accounting database? >>> Thanks for any input. >>> Best - Eg. >>> >> > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] REST-based CLI tools out there somewhere?

2023-11-10 Thread Loris Bennett
achments. The DRW Companies make no representations that this e-mail > or any attachments are free of computer viruses or other defects. -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Sinfo options not working in SLURM 23.11

2023-10-30 Thread Loris Bennett
> > > > FPGA* up infinite 1 idle >FPGA01 > > Any pointers will help. Why do you think that the output above is wrong? Cheers, Loris > Regards, > > DJ > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Site factor plugin example?

2023-10-24 Thread Loris Bennett
Loris Bennett writes: > Christopher Samuel writes: > >> On 10/13/23 10:10, Angel de Vicente wrote: >> >>> But, in any case, I would still be interested in a site factor >>> plugin example, because I might revisit this in the future. >> >> I don't

Re: [slurm-users] Site factor plugin example?

2023-10-17 Thread Loris Bennett
reating a memory-wasted factor. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Site factor plugin example?

2023-10-16 Thread Loris Bennett
Hello Angel, Angel de Vicente writes: > Hello Loris, > > "Loris Bennett" writes: > >> Did you ever find an example or write your own plugin which you could >> provide as a example? > > I'm afraid not (though I didn't persevere, because for the momen

Re: [slurm-users] Site factor plugin example?

2023-10-13 Thread Loris Bennett
or plugin to start with. > > Do you know of any examples that can set me in the right direction? Did you ever find an example or write your own plugin which you could provide as an example? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Fairshare: Penalising unused memory rather than used memory?

2023-10-11 Thread Loris Bennett
eferred behaviour) and >> provides a clear motivation to change. Could be done with QOS unless >> you already use that in a conflicting way. >> Gareth >> Get Outlook for Android <https://aka.ms/ghei36> >> -

[slurm-users] Fairshare: Penalising unused memory rather than used memory?

2023-10-11 Thread Loris Bennett
be interested in knowing whether one can take into account the *requested but unused memory* when calculating usage. Is this possible? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-13 Thread Loris Bennett
rictions, such as a shorter maximum run-time. What are the pros and cons of the reservation approach compared with the above partition-based approach? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] squeue: unrecognized option '--array-unique'

2023-09-12 Thread Loris Bennett
Loris Bennett writes: > Hi, > > Since upgrading to 23.02.5, I am seeing the following error > > $ squeue --array-unique > squeue: unrecognized option '--array-unique' > Try "squeue --help" for more information > > The help for 'squeue' implies it is

[slurm-users] squeue: unrecognized option '--array-unique'

2023-09-12 Thread Loris Bennett
--array-unique display one unique pending job array Is this a regression or is something else going on? Regards Loris Bennett -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Submitting jobs from machines outside the cluster

2023-08-28 Thread Loris Bennett
upyter-lab. The users can then point the browsers on their local machines to a local port and be connected to the session on the compute node. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Is there any public scientific-workflow example that can be run through Slurm?

2023-08-20 Thread Loris Bennett
is slightly problematic from our point of view, as it does not yet support job arrays. However, there is development activity going on to address this: https://github.com/nextflow-io/nextflow/issues/1477 Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] stopping job array after N failed jobs in row

2023-08-01 Thread Loris Bennett
tch-file. If they fail, usually it's bad and >> there is no >> sense to crunch the remaining thousands of job array jobs. >> >> OT: what is the correct terminology for one item in job array... sub-job? >> job-array-job? :) >> >> cheers >> >> josef >-- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Distribute a single node resources across multiple partitons

2023-07-06 Thread Loris Bennett
remaining resources on the node are only available via partition A. A second job can only start on N in partition B if no jobs on N are running in partition A. Regards Loris Bennett -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Slurm commands for launching tasks: salloc and sbatch

2023-07-05 Thread Loris Bennett
inutes and you will have to leave. With '--deadline' you can decide case by case. Cheers, Loris > Sent from my iPhone > >> On Jul 5, 2023, at 1:43 AM, Loris Bennett wrote: >> >> Mike Mikailov writes: >> >>> About the last point. In the case of s

Re: [slurm-users] Slurm commands for launching tasks: salloc and sbatch

2023-07-04 Thread Loris Bennett
[snip (36 lines)] > > Queuing system No Yes > > I am not sure what you mean with the last point, since 'salloc' is also > handled by the queueing system. If the resources requested are > currently not available, 'salloc' will wait until they are. [snip (42 lines)] -- Dr.

[slurm-users] Favouring job arrays over individual jobs?

2023-06-29 Thread Loris Bennett
with identical resource requirements :-( Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Backfill Scheduling

2023-06-27 Thread Loris Bennett
Hi Reed, Reed Dier writes: > On Jun 27, 2023, at 1:10 AM, Loris Bennett > wrote: > > Hi Reed, > > Reed Dier writes: > > Is this an issue with the relative FIFO nature of the priority scheduling > currently with all of the other factors disabled, > or

Re: [slurm-users] Backfill Scheduling

2023-06-27 Thread Loris Bennett
ID and the way the input files are organised. We are currently not sure about the best way to do this in a suitably generic way. -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] spreading jobs out across the cluster

2023-06-14 Thread Loris Bennett
t is your use-case for wanting to spread the jobs out? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] hi-priority partition and preemption

2023-05-24 Thread Loris Bennett
ties, but we don't use preemption. > Since I have jobs that must run at a specific time and must have priority over > all others, is this the correct way to do it? For this I would probably use a recurring reservation. Cheers, Loris > Thanks > > FR -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

[slurm-users] Restrictions for new/inefficient users?

2023-05-24 Thread Loris Bennett
, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Loris Bennett
tools. You're talking about 1 of jobs on one >> hand yet you want to fetch the status every 30 seconds? What is the >> point of that other than overloading the scheduler? >> >> We're telling your users not to query the slurm too often and usually >> give 5 minutes as a good interval. You have to let slurm do its job. >> There is no point in querying in a loop every 30 seconds when we're >> talking about large numbers of jobs. >> >> >> Ward -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] snakemake and slurm in general - correction

2023-02-24 Thread Loris Bennett
Loris Bennett writes: > Hi David, > > (Thanks for changing the subject to something more appropriate). > > David Laehnemann writes: > >> Yes, but only to an extent. The linked conversation ends with this: >> >>>> Do you have any best practice about

Re: [slurm-users] snakemake and slurm in general

2023-02-23 Thread Loris Bennett
every sympathy for people working on Open Source projects and am very happy to offer assistance and have commented on lack of support for job arrays in Nextflow here: https://github.com/nextflow-io/nextflow/issues/1477 This is in fact where I learned about the potential negative impact of

Re: [slurm-users] snakemake and slurm in general

2023-02-23 Thread Loris Bennett
s are configurable, see this Wiki page: >> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#maxjobcount-limit >> >> /Ole >> -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Loris Bennett
arrays, and so generates large numbers of jobs with identical resource requirements, which can prevent backfill from working properly. Skimming the documentation for Snakemake, I also could not find any reference to Slurm job arrays, so this could also be an issue. Just my slightly grumpy 2¢.

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Loris Bennett
> > 2) Slurm developers, whether `scontrol` is expected to be quicker from > its implementation and whether using `scontrol` would also be the > option that puts less strain on the scheduler in general? > > Many thanks and best regards, > David -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

[slurm-users] job_res_rm_job: plugin still initializing

2023-02-07 Thread Loris Bennett
the SchedMD employee writes I don't think this should ever happen. Has anyone else seen this issue? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Debian dist-upgrade?

2023-01-24 Thread Loris Bennett
rtual) machine running some appropriate RedHat-like OS, create the RPMs for the three versions of Slurm you need, install the first one, import your database and then do the updates. Finally you can dump the database again and import it on your Debian 11 system. That would still be a bit of a faff and so still may not be worth it. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Using oversubscribe to hammer a node

2023-01-19 Thread Loris Bennett
ng a quarter of a core on average. It is not clear that you will actually increase throughput this way. I would probably first turn on hyperthreading to deal with jobs which have intermittent CPU-usage. Still, since Slurm offers the possibility of oversubscription, I assume there must be a use-case.

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Ryan Novosielski writes: >> On Dec 8, 2022, at 21:30, Kilian Cavalotti >> wrote: >> >> Hi Loris, >> >> On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett >> wrote: >>> However, I do have a chronic problem with users requesting too much >>&

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Ryan Novosielski writes: > On Dec 8, 2022, at 03:57, Loris Bennett wrote: > > Loris Bennett writes: > > Moshe Mergy writes: > > Hi Sandor > > I personnaly block "--mem=0" requests in file job_submit.lua (slurm 20.02): > > if (job_desc.min_mem_

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
n if --mem=0 is specified (I guess). Cheers, Loris > ------- > From: slurm-users on behalf of Loris > Bennett > Sent: Thursd

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Loris Bennett writes: > Moshe Mergy writes: > >> Hi Sandor >> >> I personnaly block "--mem=0" requests in file job_submit.lua (slurm 20.02): >> >> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then >> s

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
b. How can I block a --mem=0 request? > > We are running: > > * OS: RHEL 7 > * cgroups version 1 > * slurm: 19.05 > > Thank you, > > Sandor Felho > > Sr Consultant, Data Science & Analytics > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] GPU-node not waking up after power-save

2022-10-13 Thread Loris Bennett
non-GPUs, which do wake up properly. Thanks for confirming that there is no fundamental issue. Cheers, Loris > Best > > Ümit > > > > From: slurm-users on behalf of Loris > Bennett > Date: Thursday, 13. October 2022 at 08:14 > To: Slurm Users Mailing List >

[slurm-users] GPU-node not waking up after power-save

2022-10-13 Thread Loris Bennett
situation, I was wondering whether this a problem others have (had). So does power-saving work in general for GPU nodes and, if so, are there any extra steps one needs to take in order to set things up properly? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
"Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)" writes: > On Sep 29, 2022, at 10:34 AM, Steffen Grunewald > wrote: > > Hi Noam, > > I'm wondering why one would want to know that - given that there are > approaches to multi-node operation beyond MPI (Charm++ comes to mind)? > > The

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > Hi Loris, > > On 9/29/22 09:26, Loris Bennett wrote: >> Has anyone already come up with a good way to identify non-MPI jobs which >> request multiple cores but don't restrict themselves to a single node, >> leaving cores idle on a

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
t does not help you much, but perhaps something to think about > > On Thu, Sep 29, 2022 at 1:29 AM Loris Bennett > wrote: >> >> Hi, >> >> Has anyone already come up with a good way to identify non-MPI jobs which >> request multiple cores but don't res

[slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
one core is actually being used. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] What is the complete logic to calculate node number in job_submit.lua

2022-09-26 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > Hi Loris, > > On 9/26/22 12:51, Loris Bennett wrote: >>> When designing restriction in job_submit.lua, I found there is no member in >>> job_desc struct can directly be used to determine the node number finally >>> alloca

Re: [slurm-users] What is the complete logic to calculate node number in job_submit.lua

2022-09-26 Thread Loris Bennett
writes: > Hi all: > > > > When designing restriction in job_submit.lua, I found there is no member in > job_desc struct can directly be used to determine the node number finally > allocated to a job. The job_desc.min_nodes seem to > be a close answer, but it will be 0xFFFE when user not

Re: [slurm-users] Providing users with info on wait time vs. run time

2022-09-16 Thread Loris Bennett
ousands of jobs. Once we get them to use job array, such problems generally disappear. Cheers, Loris > Regards, > Hermann > > On 9/16/22 9:09 AM, Loris Bennett wrote: >> Hi Hermann, >> Sebastian Potthoff writes: >> >>> Hi Hermann, >>> >>> I

Re: [slurm-users] Providing users with info on wait time vs. run time

2022-09-16 Thread Loris Bennett
e normal Epilog since we wanted to > avoid running slurm as root and I don’t have to worry > about ownership of the output file. Yes, good point. We should look into that. Cheers, Loris > Sebastian > > Am 16.09.2022 um 09:09 schrieb Loris Bennett : > > Hi Hermann, > &

Re: [slurm-users] Providing users with info on wait time vs. run time

2022-09-16 Thread Loris Bennett
Hi Hermann, Sebastian Potthoff writes: > Hi Hermann, > > I happened to read along this conversation and was just solving this issue > today. I added this part to the epilog script to make it work: > > # Add job report to stdout > StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID |

[slurm-users] Providing users with info on wait time vs. run time

2022-09-15 Thread Loris Bennett
the times over, say, a month and provide the absolute totals and maybe a run-to-wait ratio. Has anyone already done anything like this? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] slurmctld hanging

2022-07-29 Thread Loris Bennett
the priorities of this user's jobs are always higher than everyone else's? Cheers, Loris > On Fri, Jul 29, 2022 at 7:00 AM Loris Bennett > wrote: > > Hi Byron, > > byron writes: > > > Hi Loris - about a second > > What is the use-case for that? Are the

Re: [slurm-users] slurmctld hanging

2022-07-28 Thread Loris Bennett
out and that is the error you are seeing. Regards Loris > On Thu, Jul 28, 2022 at 2:47 PM Loris Bennett > wrote: > > Hi Byron, > > byron writes: > > > Hi > > > > We recently upgraded slurm from 19.05.7 to 20.11.9 and now we occasionally > (3 times

Re: [slurm-users] slurmctld hanging

2022-07-28 Thread Loris Bennett
lurmctld log. > > Can anyone suggest how to even start troubleshooting this? Without anything > in the logs I dont know where to start. > > Thanks Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] sreport time units explanation

2022-06-22 Thread Loris Bennett
pisze: >> Yes, it is possible, we have 63 GPUs. But I have a problem with >> the interpretation of this value. Specifically, I would like >> to know how it is calculated. I couldn't find it in the docs >> (or I'm just bad at searching :)). >> -- Dr. Loris Bennett (H

Re: [slurm-users] sreport time units explanation

2022-06-22 Thread Loris Bennett
may GPU cards do you have? We have 24 and our top user for the same time period is 4382(27.41%). This seems reasonable to me. As there are 513 hours in the period, your user would have had to have used around 15 cards fairly continuously. Is that not possible? Cheers, Loris > How shoul
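The arithmetic behind the "around 15 cards" estimate can be sketched as below. Only the 513-hour period and the figure of 15 cards come from the message; the helper name is made up for illustration, and the sketch assumes the sreport value is expressed in GPU-hours.

```shell
# Average number of GPUs kept busy = GPU-hours used / hours in the period.
# avg_gpus_busy is a hypothetical helper, not a Slurm command.
avg_gpus_busy() {
    echo $(( $1 / $2 ))   # integer division, rounds down
}

# 15 cards busy for the whole 513-hour period correspond to
# 15 * 513 GPU-hours.
full_use=$((15 * 513))
echo "GPU-hours for 15 cards over 513 hours: ${full_use}"
echo "Average cards busy: $(avg_gpus_busy "$full_use" 513)"
```

Dividing a user's reported GPU-hours by the length of the period in the same way gives a quick plausibility check against the number of cards actually installed.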

Re: [slurm-users] multifactor priority calculation

2022-06-14 Thread Loris Bennett
>>>> >>>> PriorityType=priority/multifactor >>>> PriorityWeightJobSize=10 >>>> AccountingStorageTRES=cpu,mem,gres/gpu >>>> PriorityWeightTRES=cpu=1000,mem=2000,gres/gpu=3000 >>>> >>>> No age factor or somethi

Re: [slurm-users] Performance tracking of array tasks

2022-05-16 Thread Loris Bennett
o do the rest. Cheers, Loris > Thanks, > > William Dear > > ------ > From: slurm-users on behalf of Loris >

Re: [slurm-users] Performance tracking of array tasks

2022-05-16 Thread Loris Bennett
round a bunch of jobs. Each element of a job array still has its own job ID, so you can extract job data the same way you do for a non-array job. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] non-historical scheduling

2022-04-12 Thread Loris Bennett
en the effect of the CPU-usage over the day. Cheers, Loris > > > -Original Message- > From: slurm-users On Behalf Of Loris > Bennett > Sent: Tuesday, April 12, 2022 12:06 PM > To: Slurm User Community List > Subject: Re: [slurm-users] non-historical scheduling > &

Re: [slurm-users] non-historical scheduling

2022-04-12 Thread Loris Bennett
u have received this email in error, kindly delete > it from your computer > system and notify us at the telephone number or email address appearing > above. The writer asserts in respect of this message and attachments all > rights for confidentiality, privilege or privacy to the fulle

Re: [slurm-users] sreport outputs invalid values due to corrupted data

2022-03-09 Thread Loris Bennett
s to the jobs table. Is there a way to fix the data ? Run sacctmgr show runawayjobs If any are found you should be offered the option of fixing them. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Add partition to existing user association

2022-01-24 Thread Loris Bennett
a >> >> However, I cannot set a partition: >> >> sacctmgr modify user thekla account=ops set partition=gpu >>  Unknown option: partition=gpu >>  Use keyword 'where' to modify condition >> >> This is not possible? >> >> The only solution I found to that is to delete the association and create it >> again with the partition: >> >> sacctmgr del user thekla account=ops >> >> sacctmgr add user thekla account=ops partition=gpu >> >> Thank you, >> >> Thekla >> >> > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Does setting 'job_desc.mail_user' in job_submit.lua work?

2022-01-10 Thread Loris Bennett
d already > an > enhancement (Bug 11591) but nothing happened so far... > > Regards, > > Alexander > > > Am 10.01.2022 um 11:14 schrieb Loris Bennett: >> Hi, >> >> Does setting 'mail_user' in job_submit.lua actually work in Slurm >> 21.08.5? >> &g

Re: [slurm-users] Does setting 'job_desc.mail_user' in job_submit.lua work?

2022-01-10 Thread Loris Bennett
). and _get_job_req_field() contains 'mail_user'. Cheers, Loris Marcus Boden writes: > Hi Loris, > > I can confirm the problem: I am not able to modify the job_desc.mail_user. > Other > values can be modified, though. > > We are also on 21.08.5 > > Best, > Marcus > &

[slurm-users] Does setting 'job_desc.mail_user' in job_submit.lua work?

2022-01-10 Thread Loris Bennett
of the plugin work, but they only read other elements of job_desc and do not modify anything. Am I doing something wrong? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-17 Thread Loris Bennett
Hi Diego, Diego Zuccato writes: > Hi Loris. > > Il 14/12/2021 14:16, Loris Bennett ha scritto: > >> spectrum, today, via our Zabbix monitoring, I spotted some jobs with an >> unusually high GPU-efficiencies which turned out to be doing >> cryptomining :-/ > W

[slurm-users] Nastygramming (was: Prevent users from updating their jobs)

2021-12-17 Thread Loris Bennett
some kind of framework to automate the actual sending of the nastygrams? 2. What metrics do you use for deciding whether a nastygram regarding resource usage needs to be sent? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-14 Thread Loris Bennett
ch. At the opposite end of the usage spectrum, today, via our Zabbix monitoring, I spotted some jobs with unusually high GPU efficiencies which turned out to be doing cryptomining :-/ Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-13 Thread Loris Bennett
Hi Ole, The new version looks good to me. Cheers, Loris Ole Holm Nielsen writes: > Hi Loris, > > I fixed errors in the hostnamelength calculation and formatting. > Could you grab the latest pestat and test it? > > Thanks, > Ole > > On 12/13/21 13:56, Loris Bennett

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-13 Thread Loris Bennett
ailhail gpu=1 g003 gpu mix 1 320.06*9520087622 gpu:gtx1080ti:2(S:0-1) 8692111 joesnow gpu=2 g004 gpu mix 6 321.74*9520065647 gpu:gtx1080ti:2(S:0-1) 8692124(8536946_562) gailhail gpu=1 8692122(8536946_561) gailhail gpu=1 It looks as if

Re: [slurm-users] slurmdbd full backup so the primary can be purged

2021-12-13 Thread Loris Bennett
like this. > > Are archive files readable by sacct and sreport, or easily manually > parseable? > > I am going to turn these on in my test cluster, but hearing about other > peoples experiences with this would probably be helpful. > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Job array start time and SchedNodes

2021-12-09 Thread Loris Bennett
cheduled on node cn04. > > When they start running they run on separate nodes: > >       131799_1   cpu  test.sh   thekla  R   0:02 1 cn01 >   131799_2   cpu  test.sh   thekla  R   0:02 1 cn02 >   131799_3   cpu  test.sh   thekla  R   0:02 1

Re: [slurm-users] smap on Centos7

2021-12-08 Thread Loris Bennett
be part of the slurm-gui Red Hat package ? > > Running: Slurm 20.02.4 in aws-pcluster 2.10.3(EOL Support 12/31/21) > OS: Centos 7.9 > > As a side note, I got sview installed with a bit of a work around. > https://github.com/zeekus/bash/blob/master/centos7_sview_install_on_aws_

Re: [slurm-users] Job array start time and SchedNodes

2021-12-07 Thread Loris Bennett
e node can all be started simultaneously on same node. Cheers, Loris > Regards, > > Thekla > > > On 7/12/21 12:16 μ.μ., Loris Bennett wrote: >> Hi Thekla, >> >> Thekla Loizou writes: >> >>> Dear all, >>> >>> I have notic

Re: [slurm-users] Job array start time and SchedNodes

2021-12-07 Thread Loris Bennett
l their requirements. The fact that all the jobs have cn06 as NODELIST however seems to suggest that you have either specified cn06 as the node the jobs should run on, or cn06 is the only node which fulfils the job requirements. I'm not sure what you mean about '"saving" the other nodes'. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Changing DefaultAccount for user

2021-11-23 Thread Loris Bennett
Bill Wichser writes: > I usually add "withassoc" for a show user > > sacctmgr show user loris withassoc Thanks! That was what I was looking for! Cheers, Loris > Bill > > On 11/23/21 9:07 AM, Loris Bennett wrote: >> sacctmgr show user loris accounts >

Re: [slurm-users] Changing DefaultAccount for user

2021-11-23 Thread Loris Bennett
$user,cluster=$cluster > > I've used the `sacctmgr -i update ...` step to change the default > account for users subsequently after accounts have been created. > > Hope that helps, sorry if not. > > Sean > > > On Tue, Nov 23, 2021 at 01:50:01PM +0100, Loris Bennett wrot

[slurm-users] Changing DefaultAccount for user

2021-11-23 Thread Loris Bennett
e modified subsequently) Deleting the user and recreating with the desired defaultaccount is the only way I have managed to change the default account. Is this really the only way to achieve this? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.

Re: [slurm-users] sreport question when specifying partitions=

2021-11-10 Thread Loris Bennett
ight? If I specify a partition which does not exist, I get the behaviour you report. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] How to get an estimate of job completion for planned maintenance?

2021-11-09 Thread Loris Bennett
s further in the future than our maximum run-time. There is then no need to drain anything. Short running jobs can still run right up to the reservation. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
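A reservation of the kind described in this message might be created along the following lines. This is a sketch under assumptions: the reservation name, start time and duration are placeholders, the choice of flags is not confirmed by the thread, and the helper only prints the scontrol command instead of running it.

```shell
# Print (rather than run) a scontrol command that creates a maintenance
# reservation on all nodes. maint_reservation_cmd is a hypothetical helper;
# the reservation name "maintenance" is a placeholder.
maint_reservation_cmd() {
    start="$1"      # e.g. 2021-12-01T08:00:00
    duration="$2"   # e.g. 08:00:00 (hours:minutes:seconds)
    echo "scontrol create reservation reservationname=maintenance" \
         "starttime=${start} duration=${duration}" \
         "users=root flags=maint,ignore_jobs nodes=ALL"
}

maint_reservation_cmd 2021-12-01T08:00:00 08:00:00
```

If the start time lies further ahead than the longest permitted run-time, jobs that cannot finish before the reservation simply stay pending, while shorter jobs keep running up to the start of the maintenance window.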

Re: [slurm-users] Parallel sbatch

2021-11-05 Thread Loris Bennett
nnebär detta att SLU behandlar dina > personuppgifter. För att läsa mer om hur detta går till, klicka här > <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/> > E-mailing SLU will result in SLU processing your personal data. For more > information on how this is done, click here > <https://www.slu.se/en/about-slu/contact-slu/personal-data/> -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Secondary Unix group id of users not being issued in interactive srun command

2021-09-21 Thread Loris Bennett
n to fail, even though the groups still have the information via memberUID, but my experience was that it does indeed fail. Cheers, Loris -- Dr. Loris Bennett (Hr./Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] [External] How can I do to prevent a specific job from being prempted?

2021-09-14 Thread Loris Bennett
Dear Peter, 顏文 writes: > Dear Mr. Zillner > > I would like the specific running job not being rescheduled , but also can > not be terminated or cancelled in any way. If the job is cancelled, I need to > start it over again. Normally this kind of jobs require weeks to > finish. So the time

Re: [slurm-users] Calculate the GPU usages

2021-09-01 Thread Loris Bennett
cGRES%60,ncpus Cheers, Loris > On Wed, 1 Sep, 2021, 6:03 PM Loris Bennett, > wrote: > > Dear Jeherul, > > Jeherul Islam writes: > > > Dear Loris, > > > > Thanks for your reply. Here is the output for the same period but the > result is not match

Re: [slurm-users] Calculate the GPU usages

2021-09-01 Thread Loris Bennett
r Account Login Proper Name TRES Name Used > - --- - --- -- > chemistryj.mira j.mira gres/gpu 149434 > > On Wed, Sep 1, 2021 at 5:27 PM Loris Bennett > wrote: > > Dear Jeherul, > > Jeherul Islam writes: > > > Dear All, > &g

Re: [slurm-users] Calculate the GPU usages

2021-09-01 Thread Loris Bennett
021-08-01 > 4957060 > > Please share the correct way. > > With Thanks and regards so, without having checked your sacct/awk logic I would not expect the results to be the same. Cheers, Loris -- Dr. Loris Bennett (Hr./Mr.) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
