Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-30 Thread navin srivastava
Hi Team,

I have separated the CPU nodes and the GPU nodes into two different queues.

I now have 20 nodes with CPUs only (20 cores each) and no GPUs.
Another set of nodes has both GPUs and CPUs: some have 2 GPUs and 20 CPUs,
and some have 8 GPUs and 48 CPUs. These are assigned to the GPU queue.

Users are facing issues in the GPU queue. The scenario is as follows:

Users submit jobs requesting 4 CPUs + 1 GPU, and also jobs requesting 4 CPUs
only. When all the GPUs are in use, the jobs requesting GPU resources wait in
the queue, and even though a large number of CPUs is still free, the CPU-only
jobs do not start, because the 4 CPU + 1 GPU jobs have higher priority.

Is there any mechanism so that, once all the GPUs are in use, the CPU-only
jobs are allowed to run?

Regards
Navin.






On Mon, Jun 22, 2020 at 6:09 PM Diego Zuccato 
wrote:

> On 16/06/20 16:23, Loris Bennett wrote:
>
> > Thanks for pointing this out - I hadn't been aware of this.  Is there
> > anywhere in the documentation where this is explicitly stated?
> I don't remember. Seems Michael's experience is different. Possibly some
> other setting influences that behaviour. Maybe different partition
> priorities?
> But on the small cluster I'm managing it's this way. I'm not an expert
> and I'd like to understand.
>
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
>


Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-22 Thread Diego Zuccato
On 16/06/20 16:23, Loris Bennett wrote:

> Thanks for pointing this out - I hadn't been aware of this.  Is there
> anywhere in the documentation where this is explicitly stated?
I don't remember. Seems Michael's experience is different. Possibly some
other setting influences that behaviour. Maybe different partition
priorities?
But on the small cluster I'm managing it's this way. I'm not an expert
and I'd like to understand.

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-16 Thread Loris Bennett
Diego Zuccato  writes:

> On 16/06/20 09:39, Loris Bennett wrote:
>
>>> Maybe it's already known and obvious, but... Remember that a node can be
>>> allocated to only one partition.
>> Maybe I am misunderstanding you, but I think that this is not the case.
>> A node can be in multiple partitions.
>
> *Assigned* to multiple partitions: OK.
> But once Slurm schedules a job in "partGPU" on that node, the whole node
> is unavailable for jobs in "partCPU", even if the GPU job is using only
> 1% of the resources.

Thanks for pointing this out - I hadn't been aware of this.  Is there
anywhere in the documentation where this is explicitly stated?

>>  We have nodes belonging to
>> individual research groups which are in both a separate partition just
>> for the group and in a 'scavenger' partition for everyone (but with
>> lower priority and maximum run-time).
>
> More or less our current config. Quite inefficient, at least for us: too
> many unusable resources due to small jobs.

Our scavenger partition tends to be used mostly by a small number of
users each with a huge number of small, short jobs.  Thus, they tend to
fill nodes and not block resources for that long, but I probably need to
look at this a bit more carefully.

>>> So, if you have the mixed nodes in both
>>> partitions and there's a GPU job running, a non-gpu job will find that
>>> node marked as busy because it's allocated to another partition.
>>> That's why we're drastically reducing the number of partitions we have
>>> and will avoid shared nodes.

>> Again, I don't think this is the explanation.  If a job is running on a GPU node,
>> but not using all the CPUs, then a CPU-only job should be able to start
>> on that node, unless some form of exclusivity has been set up, such as
>> ExclusiveUser=YES for the partition.

> Nope. The whole node gets allocated to one partition at a time. So if
> the GPU job and the CPU one are in different partitions, it's expected
> that only one starts. The behaviour you're looking for is the one of
> QoS: define a single partition w/ multiple QoS and both jobs will run
> concurrently.
>
> If you think about it, that's the meaning of "partition" :)

Like I said, this is new to me, but personally I don't think that
linguistically speaking it is obvious.  If the actual membership of a
node to a partition changes over time and just depends on which jobs
happen to be running on it at a given moment, to my mind, that's not
much like the physical concept of partitioning a room or a city.

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de



Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-16 Thread Renfro, Michael
Not trying to argue unnecessarily, but what you describe is not a universal 
rule, regardless of QOS.

Our GPU nodes are members of 3 GPU-related partitions, 2 more resource-limited 
non-GPU partitions, and one of two larger-memory partitions. It’s set up this 
way to minimize idle resources (due to us not buying enough GPUs in those nodes 
to keep all the CPUs busy, plus our other nodes having limited numbers of DIMM 
slots for larger-memory jobs).

First terminal, results in a job running in the ‘any-interactive’ partition on 
gpunode002. We have a job submit plugin that automatically routes jobs to 
‘interactive’, ‘gpu-interactive’, or ‘any-interactive’ depending on the 
resources requested:

=

[renfro@login rosetta-job]$ type hpcshell
hpcshell is a function
hpcshell ()
{
srun --partition=interactive $@ --pty bash -i
}
[renfro@login rosetta-job]$ hpcshell
[renfro@gpunode002(job 751070) rosetta-job]$

=

Second terminal, simultaneous to first terminal, results in a job running in 
the ‘gpu-interactive’ partition on gpunode002:

=

[renfro@login ~]$ hpcshell --gres=gpu
[renfro@gpunode002(job 751071) ~]$ squeue -t R -u $USER
JOBID  PARTI   NAME   USER ST TIME S:C: NODES MIN_MEMORY NODELIST(REASON) SUBMIT_TIME  START_TIME  END_TIME  TRES_PER_NODE
751071 gpu-i   bash renfro  R 0:08 *:*: 1 2000M gpunode002   2020-06-16T08:27:50 2020-06-16T08:27:50 2020-06-16T10:27:50 gpu
751070 any-i   bash renfro  R 0:18 *:*: 1 2000M gpunode002   2020-06-16T08:27:40 2020-06-16T08:27:40 2020-06-16T10:27:41 N/A
[renfro@gpunode002(job 751071) ~]$

=

Selected configuration details (excluding things like resource ranges and 
defaults):

NodeName=gpunode[001-003]  CoresPerSocket=14 RealMemory=382000 Sockets=2 ThreadsPerCore=1 Weight=10011 Gres=gpu:2
NodeName=gpunode004  CoresPerSocket=14 RealMemory=894000 Sockets=2 ThreadsPerCore=1 Weight=10021 Gres=gpu:2

PartitionName=gpu Default=NO MaxCPUsPerNode=16 ExclusiveUser=NO State=UP Nodes=gpunode[001-004]
PartitionName=gpu-debug Default=NO MaxCPUsPerNode=16 ExclusiveUser=NO State=UP Nodes=gpunode[001-004]
PartitionName=gpu-interactive Default=NO MaxCPUsPerNode=16 ExclusiveUser=NO State=UP Nodes=gpunode[001-004]
PartitionName=any-interactive Default=NO MaxCPUsPerNode=12 ExclusiveUser=NO State=UP Nodes=node[001-040],gpunode[001-004]
PartitionName=any-debug Default=NO MaxCPUsPerNode=12 ExclusiveUser=NO State=UP Nodes=node[001-040],gpunode[001-004]
PartitionName=bigmem Default=NO MaxCPUsPerNode=12 ExclusiveUser=NO State=UP Nodes=gpunode[001-003]
PartitionName=hugemem Default=NO MaxCPUsPerNode=12 ExclusiveUser=NO State=UP Nodes=gpunode004
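
To make the arithmetic behind those limits explicit: each gpunode has 28 cores
(2 sockets x 14 cores), the GPU partitions are capped at MaxCPUsPerNode=16 and
the non-GPU partitions at 12, so 16 + 12 = 28 and jobs from the two kinds of
partitions can share a node without starving each other. A rough sketch of how
two batch jobs might land side by side under this layout (the commands, core
counts, and program names are purely illustrative):

=

# CPU-only job: submitted to a non-GPU partition, limited to 12 of the
# 28 cores on any gpunode it lands on
sbatch --partition=any-debug --ntasks=12 --wrap="./cpu_only_app"

# GPU job: submitted to a GPU partition, limited to 16 cores on the same
# node, so both jobs can run there concurrently
sbatch --partition=gpu --gres=gpu:1 --ntasks=4 --wrap="./gpu_app"

=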

> On Jun 16, 2020, at 8:14 AM, Diego Zuccato  wrote:
> 
> On 16/06/20 09:39, Loris Bennett wrote:
> 
>>> Maybe it's already known and obvious, but... Remember that a node can be
>>> allocated to only one partition.
>> Maybe I am misunderstanding you, but I think that this is not the case.
>> A node can be in multiple partitions.
> *Assigned* to multiple partitions: OK.
> But once Slurm schedules a job in "partGPU" on that node, the whole node
> is unavailable for jobs in "partCPU", even if the GPU job is using only
> 1% of the resources.
> 
>> We have nodes belonging to
>> individual research groups which are in both a separate partition just
>> for the group and in a 'scavenger' partition for everyone (but with
>> lower priority and maximum run-time).
> More or less our current config. Quite inefficient, at least for us: too
> many unusable resources due to small jobs.
> 
>>> So, if you have the mixed nodes in both
>>> partitions and there's a GPU job running, a non-gpu job will find that
>>> node marked as busy because it's allocated to another partition.
>>> That's why we're drastically reducing the number of partitions we have
>>> and will avoid shared nodes.
>> Again, I don't think this is the explanation.  If a job is running on a GPU node,
>> but not using all the CPUs, then a CPU-only job should be able to start
>> on that node, unless some form of exclusivity has been set up, such as
>> ExclusiveUser=YES for the partition.
> Nope. The whole node gets allocated to one partition at a time. So if
> the GPU job and the CPU one are in different partitions, it's expected
> that only one starts. The behaviour you're looking for is the one of
> QoS: define a single partition w/ multiple QoS and both jobs will run
> concurrently.
> 
> If you think about it, that's the meaning of "partition" :)
> 
> --
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
> 



Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-16 Thread Diego Zuccato
On 16/06/20 09:39, Loris Bennett wrote:

>> Maybe it's already known and obvious, but... Remember that a node can be
>> allocated to only one partition.
> Maybe I am misunderstanding you, but I think that this is not the case.
> A node can be in multiple partitions.
*Assigned* to multiple partitions: OK.
But once Slurm schedules a job in "partGPU" on that node, the whole node
is unavailable for jobs in "partCPU", even if the GPU job is using only
1% of the resources.

>  We have nodes belonging to
> individual research groups which are in both a separate partition just
> for the group and in a 'scavenger' partition for everyone (but with
> lower priority and maximum run-time).
More or less our current config. Quite inefficient, at least for us: too
many unusable resources due to small jobs.

>> So, if you have the mixed nodes in both
>> partitions and there's a GPU job running, a non-gpu job will find that
>> node marked as busy because it's allocated to another partition.
>> That's why we're drastically reducing the number of partitions we have
>> and will avoid shared nodes.
> Again, I don't think this is the explanation.  If a job is running on a GPU node,
> but not using all the CPUs, then a CPU-only job should be able to start
> on that node, unless some form of exclusivity has been set up, such as
> ExclusiveUser=YES for the partition.
Nope. The whole node gets allocated to one partition at a time. So if
the GPU job and the CPU one are in different partitions, it's expected
that only one starts. The behaviour you're looking for is the one of
QoS: define a single partition w/ multiple QoS and both jobs will run
concurrently.

If you think about it, that's the meaning of "partition" :)

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-16 Thread Loris Bennett
Diego Zuccato  writes:

On 13/06/20 17:47, navin srivastava wrote:
>
>> Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of
>> 8 GPUs each; other partitions are a mix of both, with nodes having 2 GPUs
>> and very few nodes without any GPU.
> Maybe it's already known and obvious, but... Remember that a node can be
> allocated to only one partition.

Maybe I am misunderstanding you, but I think that this is not the case.
A node can be in multiple partitions.  We have nodes belonging to
individual research groups which are in both a separate partition just
for the group and in a 'scavenger' partition for everyone (but with
lower priority and maximum run-time).

> So, if you have the mixed nodes in both
> partitions and there's a GPU job running, a non-gpu job will find that
> node marked as busy because it's allocated to another partition.
> That's why we're drastically reducing the number of partitions we have
> and will avoid shared nodes.

Again, I don't think this is the explanation.  If a job is running on a GPU node,
but not using all the CPUs, then a CPU-only job should be able to start
on that node, unless some form of exclusivity has been set up, such as
ExclusiveUser=YES for the partition.

Without seeing the full slurm.conf, it is difficult to guess what the
problem might be.
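
In the meantime, a few commands along these lines might help gather the
relevant details (treat this as a sketch; the exact output depends on your
Slurm version, and the sinfo format string is just one possibility):

scontrol show partition
sinfo -N -o "%N %P %c %G %T"
scontrol show config | grep -i -E 'SchedulerType|SelectType|PriorityType'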

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de



Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-15 Thread Diego Zuccato
On 13/06/20 17:47, navin srivastava wrote:

> Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of
> 8 GPUs each; other partitions are a mix of both, with nodes having 2 GPUs and
> very few nodes without any GPU.
Maybe it's already known and obvious, but... Remember that a node can be
allocated to only one partition. So, if you have the mixed nodes in both
partitions and there's a GPU job running, a non-gpu job will find that
node marked as busy because it's allocated to another partition.
That's why we're drastically reducing the number of partitions we have
and will avoid shared nodes.

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-15 Thread navin srivastava
Thanks Renfro.

I will set up a similar configuration and see how it goes.

Regards

On Mon, Jun 15, 2020, 23:02 Renfro, Michael  wrote:

> So if a GPU job is submitted to a partition containing only GPU nodes, and
> a non-GPU job is submitted to a partition containing at least some nodes
> without GPUs, both jobs should be able to run. Priorities should be
> evaluated on a per-partition basis. I can 100% guarantee that in our HPC,
> pending GPU jobs don't block non-GPU jobs, and vice versa.
>
> I could see a problem if the GPU job was submitted to a partition
> containing both types of nodes: if that job was assigned the highest
> priority for whatever reason (fair share, age, etc.), other jobs in the
> same partition would have to wait until that job started.
>
> A simple solution would be to make a GPU partition containing only GPU
> nodes, and a non-GPU partition containing only non-GPU nodes. Submit GPU
> jobs to the GPU partition, and non-GPU jobs to the non-GPU partition.
>
> Once that works, you could make a partition that includes both types of
> nodes to reduce idle resources, but jobs submitted to that partition would
> have to (a) not require a GPU, (b) require a limited number of CPUs per
> node, so that you'd have some CPUs available for GPU jobs on the nodes
> containing GPUs.
>
> --
> *From:* slurm-users  on behalf of
> navin srivastava 
> *Sent:* Saturday, June 13, 2020 10:47 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] ignore gpu resources to scheduled the cpu
> based jobs
>
>
> Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of
> 8 GPUs each; other partitions are a mix of both, with nodes having 2 GPUs and
> very few nodes without any GPU.
>
> Regards
> Navin
>
>
> On Sat, Jun 13, 2020, 21:11 navin srivastava 
> wrote:
>
> Thanks Renfro.
>
> Yes, we have both types of nodes, with and without GPUs.
> Also, some users' jobs require GPUs, while some applications use only CPUs.
>
> So the issue happens when a high-priority job is waiting for GPU resources
> that are not available; the lower-priority job, which needs only CPU
> resources, is then also waiting, even though enough CPUs are available.
>
> When I hold the GPU jobs, the CPU jobs go through.
>
> Regards
> Navin
>
> On Sat, Jun 13, 2020, 20:37 Renfro, Michael  wrote:
>
> Will probably need more information to find a solution.
>
> To start, do you have separate partitions for GPU and non-GPU jobs? Do you
> have nodes without GPUs?
>
> On Jun 13, 2020, at 12:28 AM, navin srivastava 
> wrote:
>
> Hi All,
>
> In our environment we have GPUs. What I have found is that if a user with
> high priority has a job in the queue waiting for GPU resources, which are
> almost fully used and not available, then the job submitted by another user,
> which does not require any GPU resources, also stays in the queue even though
> lots of CPU resources are available.
>
> Our scheduling mechanism is FIFO with fair tree enabled. Is there any way we
> can make some changes so that the CPU-only jobs go through and the GPU-based
> jobs wait until the GPU resources are free?
>
> Regards
> Navin.
>
>
>
>
>


Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-15 Thread Renfro, Michael
So if a GPU job is submitted to a partition containing only GPU nodes, and a 
non-GPU job is submitted to a partition containing at least some nodes without 
GPUs, both jobs should be able to run. Priorities should be evaluated on a 
per-partition basis. I can 100% guarantee that in our HPC, pending GPU jobs 
don't block non-GPU jobs, and vice versa.

I could see a problem if the GPU job was submitted to a partition containing 
both types of nodes: if that job was assigned the highest priority for whatever 
reason (fair share, age, etc.), other jobs in the same partition would have to 
wait until that job started.

A simple solution would be to make a GPU partition containing only GPU nodes, 
and a non-GPU partition containing only non-GPU nodes. Submit GPU jobs to the 
GPU partition, and non-GPU jobs to the non-GPU partition.

Once that works, you could make a partition that includes both types of nodes 
to reduce idle resources, but jobs submitted to that partition would have to 
(a) not require a GPU, (b) require a limited number of CPUs per node, so that 
you'd have some CPUs available for GPU jobs on the nodes containing GPUs.
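
A minimal slurm.conf sketch of that two-partition starting point (node names,
node counts, and the MaxCPUsPerNode value here are placeholders rather than
your actual configuration, and it assumes GresTypes=gpu plus a matching
gres.conf are already in place):

# GPU nodes and CPU-only nodes (hypothetical names and counts)
NodeName=gpunode[01-04] CPUs=20 Gres=gpu:2 State=UNKNOWN
NodeName=cpunode[01-20] CPUs=20 State=UNKNOWN

# GPU jobs go to a partition containing only GPU nodes
PartitionName=gpu Nodes=gpunode[01-04] Default=NO State=UP
# CPU-only jobs go to a partition containing only non-GPU nodes, so full
# GPUs never hold them back
PartitionName=cpu Nodes=cpunode[01-20] Default=YES State=UP
# Optional later step: a mixed partition where MaxCPUsPerNode keeps some
# cores on the GPU nodes free for GPU jobs
PartitionName=any Nodes=cpunode[01-20],gpunode[01-04] MaxCPUsPerNode=12 Default=NO State=UP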


From: slurm-users  on behalf of navin 
srivastava 
Sent: Saturday, June 13, 2020 10:47 AM
To: Slurm User Community List 
Subject: Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs


Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of 8 GPUs
each; other partitions are a mix of both, with nodes having 2 GPUs and very few
nodes without any GPU.

Regards
Navin


On Sat, Jun 13, 2020, 21:11 navin srivastava <navin.alt...@gmail.com> wrote:
Thanks Renfro.

Yes, we have both types of nodes, with and without GPUs.
Also, some users' jobs require GPUs, while some applications use only CPUs.

So the issue happens when a high-priority job is waiting for GPU resources that
are not available; the lower-priority job, which needs only CPU resources, is
then also waiting, even though enough CPUs are available.

When I hold the GPU jobs, the CPU jobs go through.

Regards
Navin

On Sat, Jun 13, 2020, 20:37 Renfro, Michael <ren...@tntech.edu> wrote:
Will probably need more information to find a solution.

To start, do you have separate partitions for GPU and non-GPU jobs? Do you have 
nodes without GPUs?

On Jun 13, 2020, at 12:28 AM, navin srivastava <navin.alt...@gmail.com> wrote:

Hi All,

In our environment we have GPUs. What I have found is that if a user with high
priority has a job in the queue waiting for GPU resources, which are almost
fully used and not available, then the job submitted by another user, which
does not require any GPU resources, also stays in the queue even though lots of
CPU resources are available.

Our scheduling mechanism is FIFO with fair tree enabled. Is there any way we can
make some changes so that the CPU-only jobs go through and the GPU-based jobs
wait until the GPU resources are free?

Regards
Navin.






Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-13 Thread navin srivastava
Yes, we have separate partitions. Some are GPU-specific, with 2 nodes of 8 GPUs
each; other partitions are a mix of both, with nodes having 2 GPUs and very few
nodes without any GPU.

Regards
Navin


On Sat, Jun 13, 2020, 21:11 navin srivastava  wrote:

> Thanks Renfro.
>
> Yes, we have both types of nodes, with and without GPUs.
> Also, some users' jobs require GPUs, while some applications use only CPUs.
>
> So the issue happens when a high-priority job is waiting for GPU resources
> that are not available; the lower-priority job, which needs only CPU
> resources, is then also waiting, even though enough CPUs are available.
>
> When I hold the GPU jobs, the CPU jobs go through.
>
> Regards
> Navin
>
> On Sat, Jun 13, 2020, 20:37 Renfro, Michael  wrote:
>
>> Will probably need more information to find a solution.
>>
>> To start, do you have separate partitions for GPU and non-GPU jobs? Do
>> you have nodes without GPUs?
>>
>> On Jun 13, 2020, at 12:28 AM, navin srivastava 
>> wrote:
>>
>> Hi All,
>>
>> In our environment we have GPUs. What I have found is that if a user with
>> high priority has a job in the queue waiting for GPU resources, which are
>> almost fully used and not available, then the job submitted by another user,
>> which does not require any GPU resources, also stays in the queue even though
>> lots of CPU resources are available.
>>
>> Our scheduling mechanism is FIFO with fair tree enabled. Is there any way we
>> can make some changes so that the CPU-only jobs go through and the GPU-based
>> jobs wait until the GPU resources are free?
>>
>> Regards
>> Navin.
>>
>>
>>
>>
>>


Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-13 Thread navin srivastava
Thanks Renfro.

Yes, we have both types of nodes, with and without GPUs.
Also, some users' jobs require GPUs, while some applications use only CPUs.

So the issue happens when a high-priority job is waiting for GPU resources that
are not available; the lower-priority job, which needs only CPU resources, is
then also waiting, even though enough CPUs are available.

When I hold the GPU jobs, the CPU jobs go through.

Regards
Navin

On Sat, Jun 13, 2020, 20:37 Renfro, Michael  wrote:

> Will probably need more information to find a solution.
>
> To start, do you have separate partitions for GPU and non-GPU jobs? Do you
> have nodes without GPUs?
>
> On Jun 13, 2020, at 12:28 AM, navin srivastava 
> wrote:
>
> Hi All,
>
> In our environment we have GPUs. What I have found is that if a user with
> high priority has a job in the queue waiting for GPU resources, which are
> almost fully used and not available, then the job submitted by another user,
> which does not require any GPU resources, also stays in the queue even though
> lots of CPU resources are available.
>
> Our scheduling mechanism is FIFO with fair tree enabled. Is there any way we
> can make some changes so that the CPU-only jobs go through and the GPU-based
> jobs wait until the GPU resources are free?
>
> Regards
> Navin.
>
>
>
>
>


Re: [slurm-users] ignore gpu resources to scheduled the cpu based jobs

2020-06-13 Thread Renfro, Michael
Will probably need more information to find a solution.

To start, do you have separate partitions for GPU and non-GPU jobs? Do you have 
nodes without GPUs?

On Jun 13, 2020, at 12:28 AM, navin srivastava  wrote:

Hi All,

In our environment we have GPUs. What I have found is that if a user with high
priority has a job in the queue waiting for GPU resources, which are almost
fully used and not available, then the job submitted by another user, which
does not require any GPU resources, also stays in the queue even though lots of
CPU resources are available.

Our scheduling mechanism is FIFO with fair tree enabled. Is there any way we can
make some changes so that the CPU-only jobs go through and the GPU-based jobs
wait until the GPU resources are free?

Regards
Navin.