> On 02.02.2018 at 10:30, Ansgar Esztermann-Kirchner wrote:
> On Thu, Feb 01, 2018 at 05:00:32PM +0100, Reuti wrote:
>>> Now, I think I can improve upon this choice by creating separate
>>> queues for different machine "sizes", i.e. an 8-core queue, a
>>> 20-core queue and so on.
>> So your intention is to have a bunch of queues and users select a queue
>> instead of a dedicated PE (which would in turn select a machine from a
>> dedicated set due to unique PEs per type of machine)?
> Dedicated PEs would be another possibility, queues were just the first
> thing that came to mind.
> With the current configuration, we only have one PE. It is set to
> Users do not select a PE, but rather a slot range. The idea is that
> the scheduler selects an appropriate host.
>> Somehow I don't get the advantage you want to achieve.
> I want to prevent "small" jobs from running on large "nodes".
Aha, now I see the goal of it. We had a similar requirement regarding the
amount of installed memory. Essentially my solution might be adapted to your
We have nodes with 64 GB of memory and some with 1 TB of it, all with 16 cores.
Now the corner cases are:
- one large serial job is running on a 64 GB node and 15 cores are damned to
- 16 small jobs with a 1 GB request of virtual_free are running on the 1 TB
nodes and most of the memory is unused
My setup used the amount of requested virtual_free to attach a soft or hard
request for a certain type of machine in a JSV:
# virtual_free <= 4 GB: -hard smallmem=true
# 4 GB < virtual_free <= 8 GB: -soft smallmem=true
# 8 GB < virtual_free < 16 GB:
# 16 GB <= virtual_free < 32 GB: -soft bigmem=true
# 32 GB <= virtual_free: -hard bigmem=true
As one might guess, the 64 GB nodes got smallmem=true attached and the 1 TB
nodes bigmem=true. Neither complex is forced, so jobs that request one of
them only as a soft request, or request neither, can run on either type of
machine.
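
To make the thresholds above concrete, here is a minimal sketch of the
mapping in Python. It is not the Perl JSV itself and does not speak the JSV
protocol; the function name machine_preference is made up for illustration,
and it only returns which kind of request (hard or soft) for which complex
would be attached for a given virtual_free value in GB.

```python
from typing import Optional, Tuple


def machine_preference(virtual_free_gb: float) -> Optional[Tuple[str, str]]:
    """Map a virtual_free request (in GB) to (request type, complex).

    Returns None for the middle band, where no preference is attached.
    Hypothetical helper; it only mirrors the threshold table in this mail.
    """
    if virtual_free_gb <= 4:
        return ("hard", "smallmem")   # must run on a 64 GB node
    if virtual_free_gb <= 8:
        return ("soft", "smallmem")   # prefers a 64 GB node
    if virtual_free_gb < 16:
        return None                   # no preference either way
    if virtual_free_gb < 32:
        return ("soft", "bigmem")     # prefers a 1 TB node
    return ("hard", "bigmem")         # must run on a 1 TB node
```

A real JSV would then add the corresponding -hard or -soft resource request
to the job instead of just returning a tuple.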
You could reuse my script and select the type of machine depending on the
number of requested cores instead of the requested memory, possibly
introducing a "midcpu" complex as the counterpart of a "midmem" one (or
leaving the machines with a medium number of cores unspecified). I think
attachments won't get through to the list, so let me know in case you would
like to get the Perl script.
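
Adapted to cores, the same idea might look like the sketch below. Everything
in it is an assumption for illustration: the complex names smallcpu and
bigcpu, the function name cpu_preference, and the default cut-offs of 8 and
20 slots (taken only from the queue sizes you mentioned). Whether the large
jobs should get a soft or a hard request is a policy choice; here they get a
soft one.

```python
from typing import Optional, Tuple


def cpu_preference(slots: int,
                   small_max: int = 8,
                   big_min: int = 20) -> Optional[Tuple[str, str]]:
    """Map a requested slot count to (request type, complex).

    Hypothetical complexes: smallcpu on the small hosts, bigcpu on the
    large ones. The cut-offs are placeholders and need to match the cluster.
    """
    if slots <= small_max:
        return ("hard", "smallcpu")   # keep small jobs off the large nodes
    if slots >= big_min:
        return ("soft", "bigcpu")     # large jobs prefer the large nodes
    return None                       # medium jobs may run anywhere
```

With these defaults an 8-slot job would be pinned to the small hosts, a
12-slot job would be left unconstrained, and a 20-slot job would prefer the
large hosts.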