Re: [gridengine users] Minimum number of slots

2018-02-02 Thread Reuti
Hi,

> On 02.02.2018 at 10:30, Ansgar Esztermann-Kirchner wrote:
> 
> On Thu, Feb 01, 2018 at 05:00:32PM +0100, Reuti wrote:
>> 
>>> Now, I think I can improve upon this choice by creating separate
>>> queues for different machine "sizes", i.e. an 8-core queue, a
>>> 20-core queue and so on.
>> 
>> So your intention is to have a bunch of queues and users select a queue 
>> instead of a dedicated PE (which would in turn select a machine from a 
>> dedicated set due to unique PEs per type of machine)?
> 
> Dedicated PEs would be another possibility, queues were just the first
> thing that came to mind.
> With the current configuration, we only have one PE. It is set to
> $pe_slots.
> Users do not select a PE, but rather a slot range. The idea is that
> the scheduler selects an appropriate host.
> 
>> Somehow I don't get the advantage you want to achieve.
> 
> I want to prevent "small" jobs from running on large "nodes".

Aha, now I see the goal of it. We had a similar requirement regarding the 
amount of installed memory. Essentially my solution might be adapted to your 
case.

We have nodes with 64 GB of memory and some with 1 TB of it, all with 16 cores. 
Now the corner cases are:

- one large serial job is running on a 64 GB node and 15 cores are doomed to 
idle
- 16 small jobs with a 1 GB request of virtual_free are running on the 1 TB 
nodes and most of the memory is unused

My setup used the amount of requested virtual_free to attach a soft or hard 
request for a certain type of machine in a JSV:

# virtual_free <= 4 GB: -hard smallmem=true
# 4 GB < virtual_free <= 8 GB: -soft smallmem=true
# 8 GB < virtual_free < 16 GB: (no request attached)
# 16 GB <= virtual_free < 32 GB: -soft bigmem=true
# 32 GB <= virtual_free: -hard bigmem=true

As one might guess, the 64 GB nodes got smallmem=true attached and the 1 TB 
nodes bigmem=true. Neither complex is forced, so jobs requesting only a soft 
one or none of these complexes at all can run on either type of machine.
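
A minimal Python sketch of the threshold table above. This is not the Perl 
JSV mentioned in this message: the function name and the GB constant are 
illustrative assumptions, and the JSV protocol handling itself is left out. It 
only shows which request each virtual_free value would get.

```python
GB = 1024 ** 3  # assumption: virtual_free is given in bytes


def memory_class_request(virtual_free_bytes):
    """Map a virtual_free request to a (strength, complex) pair.

    Returns None for the middle band, where no request is attached
    and the job may run on either type of machine.
    """
    if virtual_free_bytes <= 4 * GB:
        return ("hard", "smallmem=true")
    if virtual_free_bytes <= 8 * GB:
        return ("soft", "smallmem=true")
    if virtual_free_bytes < 16 * GB:
        return None
    if virtual_free_bytes < 32 * GB:
        return ("soft", "bigmem=true")
    return ("hard", "bigmem=true")


if __name__ == "__main__":
    for size_gb in (1, 6, 12, 24, 64):
        print(size_gb, "GB ->", memory_class_request(size_gb * GB))
```

A real JSV would read the job's virtual_free request, call something like this 
function, and add the resulting -hard or -soft resource request to the job.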

===

You could reuse my script and select the type of machine depending on the 
number of requested cores, possibly introducing a "midcpu" complex as the 
counterpart of a "midmem" one (or leaving the machines with a medium number of 
cores unspecified). I think attachments won't get through; let me know if you 
would like to get the Perl script.

-- Reuti
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Minimum number of slots

2018-02-02 Thread Ansgar Esztermann-Kirchner
On Thu, Feb 01, 2018 at 05:00:32PM +0100, Reuti wrote:
> 
> > Now, I think I can improve upon this choice by creating separate
> > queues for different machine "sizes", i.e. an 8-core queue, a
> > 20-core queue and so on.
> 
> So your intention is to have a bunch of queues and users select a queue 
> instead of a dedicated PE (which would in turn select a machine from a 
> dedicated set due to unique PEs per type of machine)?

Dedicated PEs would be another possibility, queues were just the first
thing that came to mind.
With the current configuration, we only have one PE. It is set to
$pe_slots.
Users do not select a PE, but rather a slot range. The idea is that
the scheduler selects an appropriate host.
 
> Somehow I don't get the advantage you want to achieve.

I want to prevent "small" jobs from running on large "nodes".

A.

-- 
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann




Re: [gridengine users] Minimum number of slots

2018-02-01 Thread Ansgar Esztermann-Kirchner
On Thu, Feb 01, 2018 at 01:08:26PM +, Winkler, Ursula 
(ursula.wink...@uni-graz.at) wrote:
> Hi Ansgar,
> 
> do you have experience with Torque/PBS?

Yes, we used it some years ago before switching to SGE.
 
> #$ -l nodes=2,ppn=12  (--> here: 12 slots on 2 nodes = 24 in sum) .
> 
> If you restrict "nodes=1" (with resource quotas via "qconf -mrqs", for 
> example), then nobody should be able to use more than 1 node.

Is that a limitation per job or per user? I'd like to restrict jobs
only.

I guess I prefer William Hay's suggestion to use the JSV since it does
not require users to change their requests, but will keep the plugin
in mind in case the JSV approach does not work out.
Thank you very much!

A.
-- 
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann




Re: [gridengine users] Minimum number of slots

2018-02-01 Thread Reuti
Hi,

> On 01.02.2018 at 11:44, Ansgar Esztermann-Kirchner wrote:
> 
> Hello List,
> 
> we're on 2011.11, and our general setup has nodes with a mixture of
> CPUs (e.g. 8, 20, 40 cores). Most of the nodes lack a high-speed
> interconnect, so we use a PE with allocation_rule $pe_slots, limiting
> jobs to just a single machine.

OK


> We're also using fairshare to achieve
> an even distribution of resources to users in the long term.
> There is a trade-off between fairshare and optimal resource usage when
> only low-priority users have 40-core jobs and a 40-core node becomes
> free. I know I can set my preferences by setting the relative weights
> for fairshare and urgency.

OK


> Now, I think I can improve upon this choice by creating separate
> queues for different machine "sizes", i.e. an 8-core queue, a
> 20-core queue and so on.

So your intention is to have a bunch of queues and users select a queue instead 
of a dedicated PE (which would in turn select a machine from a dedicated set 
due to unique PEs per type of machine)?

Somehow I don't get the advantage you want to achieve.

-- Reuti


> However, I do not see a (tractable) way to
> enforce proper job-queue association: allocation_rule 8 (etc) comes to
> mind, but I would lose the crucial one-host limit. This could be
> circumvented by creating one PE per node, but that would mean a huge
> administrative burden (and possibly also a lot of extra load on the
> scheduler).
> 
> Anything I'm missing?
> Thanks a lot,
> 
> A.
> -- 
> Ansgar Esztermann
> Sysadmin
> http://www.mpibpc.mpg.de/grubmueller/esztermann




Re: [gridengine users] Minimum number of slots

2018-02-01 Thread Ansgar Esztermann-Kirchner
On Thu, Feb 01, 2018 at 02:25:37PM +, William Hay wrote:
> 
> If I understand you correctly:
> Create a $pe_slots PE for each type of node and associate it with the
> appropriate nodes.  Have a jsv tweak the requested pe based on the number
> of slots requested.

I think that should work; alternatively, just keeping a single
$pe_slots PE and creating several queues (and assigning via JSV)
should also work.

Using a JSV did not occur to me -- thanks a lot!

A.

-- 
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann




Re: [gridengine users] Minimum number of slots

2018-02-01 Thread William Hay
On Thu, Feb 01, 2018 at 11:44:25AM +0100, Ansgar Esztermann-Kirchner wrote:
> Now, I think I can improve upon this choice by creating separate
> queues for different machine "sizes", i.e. an 8-core queue, a
> 20-core queue and so on. However, I do not see a (tractable) way to
> enforce proper job-queue association: allocation_rule 8 (etc) comes to
> mind, but I would lose the crucial one-host limit. This could be
> circumvented by creating one PE per node, but that would mean a huge
> administrative burden (and possibly also a lot of extra load on the
> scheduler).

If I understand you correctly:
Create a $pe_slots PE for each type of node and associate it with the
appropriate nodes.  Have a jsv tweak the requested pe based on the number
of slots requested.
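
A rough Python sketch of the selection logic such a JSV would need. The PE 
names smp8, smp20 and smp40 are hypothetical, the core counts are the node 
sizes mentioned in the original post, and the JSV protocol plumbing as well as 
the handling of slot ranges are left out:

```python
# Hypothetical per-node-type PEs, sorted by core count (8, 20 and 40 cores
# are the node sizes from the original post; the PE names are made up).
NODE_CLASSES = [(8, "smp8"), (20, "smp20"), (40, "smp40")]


def pe_for_slots(slots):
    """Return the PE of the smallest node type that can hold `slots` slots."""
    if slots < 1:
        raise ValueError("slot count must be at least 1")
    for cores, pe_name in NODE_CLASSES:
        if slots <= cores:
            return pe_name
    raise ValueError("no node type offers %d slots on one host" % slots)


if __name__ == "__main__":
    for n in (1, 8, 9, 20, 21, 40):
        print(n, "slots ->", pe_for_slots(n))
```

Picking the smallest node type that fits keeps small jobs off the large 
nodes; whether larger jobs may also fall back to a bigger node type is a 
policy decision this sketch does not make.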

William




Re: [gridengine users] Minimum number of slots

2018-02-01 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)

> #$ -l nodes=2,ppn=12  (--> here: 12 slots on 2 nodes = 24 in sum) .
>
> If you restrict "nodes=1" (with resource quotas via "qconf -mrqs", for 
> example), then nobody should be able to use more than 1 node.

sorry, correction: "to use more than 1 node PER JOB" of course.




Re: [gridengine users] Minimum number of slots

2018-02-01 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)
Hi Ansgar,

do you have experience with Torque/PBS?

There is a useful gridengine plugin that works quite similarly: 
https://github.com/brlindblom/gepetools

I use it on one of my clusters. The advantage: users must request not just 
slots but also nodes, in the form:

#$ -l nodes=2,ppn=12  (--> here: 12 slots on 2 nodes = 24 in sum) .

If you restrict "nodes=1" (with resource quotas via "qconf -mrqs", for 
example), then nobody should be able to use more than 1 node.

If you are interested and don't want to read all the gepetools instructions, I 
can send you my personal documentation.

Ursula


From: users-boun...@gridengine.org on behalf of 
Ansgar Esztermann-Kirchner 
Sent: Thursday, 01 February 2018 11:44
To: users@gridengine.org
Subject: [gridengine users] Minimum number of slots

Hello List,

we're on 2011.11, and our general setup has nodes with a mixture of
CPUs (e.g. 8, 20, 40 cores). Most of the nodes lack a high-speed
interconnect, so we use a PE with allocation_rule $pe_slots, limiting
jobs to just a single machine. We're also using fairshare to achieve
an even distribution of resources to users in the long term.
There is a trade-off between fairshare and optimal resource usage when
only low-priority users have 40-core jobs and a 40-core node becomes
free. I know I can set my preferences by setting the relative weights
for fairshare and urgency.

Now, I think I can improve upon this choice by creating separate
queues for different machine "sizes", i.e. an 8-core queue, a
20-core queue and so on. However, I do not see a (tractable) way to
enforce proper job-queue association: allocation_rule 8 (etc) comes to
mind, but I would lose the crucial one-host limit. This could be
circumvented by creating one PE per node, but that would mean a huge
administrative burden (and possibly also a lot of extra load on the
scheduler).

Anything I'm missing?
Thanks a lot,

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann
