Re: [slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-10 Thread mercan
Hi Janna; It sounds like an ARP cache table problem to me. If your Slurm head node can reach ~1000 or more network devices (all connected network cards, switches, etc., even if they are reachable through different ports of the server), you need to increase some network settings on the head node and

Re: [slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-10 Thread Chris Samuel
On Friday, 10 July 2020 3:34:44 PM PDT Janna Ore Nugent wrote: > I’ve got an intermittent situation with gpu nodes that sinfo says are > available and idle, but squeue reports as “ReqNodeNotAvail”. We’ve cycled > the nodes to restart services but it hasn’t helped. Any suggestions for >

[slurm-users] squeue reports ReqNodeNotAvail but node is available

2020-07-10 Thread Janna Ore Nugent
Hi All, I’ve got an intermittent situation with gpu nodes that sinfo says are available and idle, but squeue reports as “ReqNodeNotAvail”. We’ve cycled the nodes to restart services but it hasn’t helped. Any suggestions for resolving this or digging into it more deeply? Thanks, Janna Janna

Re: [slurm-users] How to queue jobs based on non-existent features

2020-07-10 Thread Alex Chekholko
Hey Raj, To me this all sounds, at a high level, like a job for some kind of lightweight middleware on top of SLURM. E.g. makefiles or something like that. Where each pipeline would be managed outside of slurm and would maybe submit a job to install some software, then submit a job to run something

Re: [slurm-users] How to queue jobs based on non-existent features

2020-07-10 Thread Raj Sahae
Hi Paddy, Yes, this is a CI/CD pipeline. We currently use Jenkins pipelines, but they have some significant drawbacks that Slurm solves out of the box, which makes it an attractive alternative. You noted some of them already, like good real time queue management, pre-emption, node weighting, high

Re: [slurm-users] [EXT] Weird issues with slurm's Priority

2020-07-10 Thread zaxs84
Thank you very much Sean! Your proposed solution solved the problem. I reckon it's not very efficient, but works for us. M.

Re: [slurm-users] How to queue jobs based on non-existent features

2020-07-10 Thread Paddy Doyle
Hi Raj, It sounds like you might be coming from a CI/CD pipeline setup, but just in case you're not, would you consider something like Jenkins or Gitlab CI instead of Slurm? The users could create multi-stage pipelines, with the 'build' stage installing the required software version, and then

Re: [slurm-users] How to queue jobs based on non-existent features

2020-07-10 Thread Raj Sahae
Interesting, I had not read the Licenses feature docs but I will look through that, thanks. Raj Sahae | m. +1 (408) 230-8531 From: slurm-users on behalf of Paul Edmon Reply-To: Slurm User Community List Date: Friday, July 10, 2020 at 10:09 AM To: "slurm-users@lists.schedmd.com" Subject:

Re: [slurm-users] How to queue jobs based on non-existent features

2020-07-10 Thread Paul Edmon
Another option would be to use the license feature and just set licenses to 0 when they aren't available. -Paul Edmon- On 7/10/2020 12:42 PM, Raj Sahae wrote: Hi Brian and Paul, You both sent me suggestions about using an offline dummy node with all features set. Thanks for your ideas but

Re: [slurm-users] How to queue jobs based on non-existent features

2020-07-10 Thread Raj Sahae
Hi Brian and Paul, You both sent me suggestions about using an offline dummy node with all features set. Thanks for your ideas but this won’t work for me as it’s not practical. We want to allow users to queue for all supported software versions and that easily numbers in the thousands or tens

Re: [slurm-users] changes in slurm.

2020-07-10 Thread navin srivastava
Thanks. Either I can use what slurmd -C gives, because I see the same set of nodes giving different values, or I can also choose the available memory, I mean 251*1024. Regards Navin On Fri, Jul 10, 2020, 20:34 Stephan Roth wrote: > It's recommended to round RealMemory down to the next lower gigabyte

Re: [slurm-users] changes in slurm.

2020-07-10 Thread Stephan Roth
It's recommended to round RealMemory down to the next lower gigabyte value to prevent nodes from entering a drain state after rebooting with a BIOS or kernel update. Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node configuration" Stephan On 10.07.20 13:46, Sarlo, Jeffrey S
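
The rounding advice above can be sketched in a few lines of Python. This is only an illustration, not something from the thread: the function name `round_down_real_memory` is hypothetical, the input is assumed to be a memory size in megabytes (the unit Slurm uses for RealMemory), and a value that already sits exactly on a gigabyte boundary is assumed to be left unchanged.

```python
MB_PER_GB = 1024  # RealMemory is expressed in megabytes


def round_down_real_memory(reported_mb: int) -> int:
    """Round a memory size in MB down to a whole number of GB, returned in MB.

    Hypothetical helper for illustration; values already on a GB boundary
    are returned unchanged (an assumption, not stated in the thread).
    """
    if reported_mb < 0:
        raise ValueError("memory size must be non-negative")
    return (reported_mb // MB_PER_GB) * MB_PER_GB


if __name__ == "__main__":
    # 251 GB expressed in MB, the figure mentioned elsewhere in this thread.
    print(round_down_real_memory(251 * 1024))
    # A value just below that boundary drops to 250 GB worth of MB.
    print(round_down_real_memory(251 * 1024 - 1))
```

The result could then be used as the RealMemory value in the node definition, leaving a small margin so that a slightly lower reported memory after a reboot does not fall below the configured figure.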

Re: [slurm-users] How to queue jobs based on non-existent features

2020-07-10 Thread Paul Edmon
You could set up a dummy node that has the features that are not active but not allow jobs to schedule to that node by setting it to DOWN. That would be a hacky way of accomplishing this. -Paul Edmon- On 7/9/2020 7:15 PM, Raj Sahae wrote: Hi all, My apologies if this is sent twice. The

Re: [slurm-users] changes in slurm.

2020-07-10 Thread Sarlo, Jeffrey S
If you run slurmd -C on the compute node, it should tell you what slurm thinks the RealMemory number is. Jeff From: slurm-users on behalf of navin srivastava Sent: Friday, July 10, 2020 6:24 AM To: Slurm User Community List Subject: Re: [slurm-users]
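
As a rough sketch of how the RealMemory figure could be pulled out of `slurmd -C` output in a script, the Python below searches the text for the `RealMemory=` field. The sample string, node name, and numbers are made up for illustration, and the function name `parse_real_memory` is hypothetical; on a real node the text would come from running `slurmd -C` itself.

```python
import re


def parse_real_memory(slurmd_c_output: str) -> int:
    """Return the RealMemory value (in MB) found in `slurmd -C` style output."""
    match = re.search(r"\bRealMemory=(\d+)", slurmd_c_output)
    if match is None:
        raise ValueError("no RealMemory field found in the given text")
    return int(match.group(1))


# Made-up sample in the key=value style that `slurmd -C` prints.
sample = (
    "NodeName=node01 CPUs=32 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257500\n"
    "UpTime=3-04:12:55\n"
)

if __name__ == "__main__":
    print(parse_real_memory(sample))
```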

Re: [slurm-users] changes in slurm.

2020-07-10 Thread navin srivastava
Thank you for the answers. Is RealMemory decided by the total memory value or the total usable memory value? I mean, a node may have 256 GB of RAM but free -g reports only 251 GB. deda1x1591:~ # free -g total used free shared buffers cached Mem: