Re: [gridengine users] Minimum number of slots
Hi,

> On 02.02.2018 at 10:30, Ansgar Esztermann-Kirchner wrote:
>
> On Thu, Feb 01, 2018 at 05:00:32PM +0100, Reuti wrote:
>>
>>> Now, I think I can improve upon this choice by creating separate
>>> queues for different machine "sizes", i.e. an 8-core queue, a
>>> 20-core queue and so on.
>>
>> So your intention is to have a bunch of queues and users select a queue
>> instead of a dedicated PE (which would in turn select a machine from a
>> dedicated set due to unique PEs per type of machine)?
>
> Dedicated PEs would be another possibility; queues were just the first
> thing that came to mind.
> With the current configuration, we only have one PE. It is set to
> $pe_slots.
> Users do not select a PE, but rather a slot range. The idea is that
> the scheduler selects an appropriate host.
>
>> Somehow I don't get the advantage you want to achieve.
>
> I want to prevent "small" jobs from running on large "nodes".

Aha, now I see the goal. We had a similar requirement regarding the amount of installed memory, and my solution might be adapted to your case. We have nodes with 64 GB of memory and some with 1 TB, all with 16 cores. The corner cases are:

- one large serial job is running on a 64 GB node and 15 cores are doomed to idle
- 16 small jobs with a 1 GB request of virtual_free are running on the 1 TB nodes and most of the memory is unused

My setup used the amount of requested virtual_free to attach a soft or hard request for a certain type of machine in a JSV:

# virtual_free <= 4 GB: -hard smallmem=true
# 4 GB < virtual_free <= 8 GB: -soft smallmem=true
# 8 GB < virtual_free < 16 GB:
# 16 GB <= virtual_free < 32 GB: -soft bigmem=true
# 32 GB <= virtual_free: -hard bigmem=true

As one might guess, the 64 GB nodes have smallmem=true attached and the 1 TB nodes bigmem=true. Neither complex is forced, so jobs that request one of them only softly, or request neither of them, can run on either type of machine.
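The threshold table above can be written down as a small decision function. This is only an illustrative sketch in Python, not the Perl JSV mentioned in this thread: the function name `memory_request_hint` is hypothetical, and the part a real JSV would have to do (reading virtual_free from the submitted job and adding the hard or soft resource request) is deliberately left out.

```python
from typing import Optional, Tuple


def memory_request_hint(virtual_free_gb: float) -> Optional[Tuple[str, str]]:
    """Map a virtual_free request (in GB) to (request mode, complex).

    Returns None for the middle band, where no machine type is requested.
    The function name is hypothetical; the thresholds follow the table above.
    """
    if virtual_free_gb <= 4:
        return ("-hard", "smallmem=true")   # must run on a 64 GB node
    if virtual_free_gb <= 8:
        return ("-soft", "smallmem=true")   # prefers a 64 GB node
    if virtual_free_gb < 16:
        return None                         # no preference attached
    if virtual_free_gb < 32:
        return ("-soft", "bigmem=true")     # prefers a 1 TB node
    return ("-hard", "bigmem=true")         # must run on a 1 TB node


if __name__ == "__main__":
    for gb in (1, 6, 12, 16, 64):
        print(gb, memory_request_hint(gb))
```

The boundary values deserve attention when adapting this: 8 GB still falls into the soft smallmem band, while 16 GB already falls into the soft bigmem band.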
You could reuse my script and select the type of machine depending on the number of requested cores, possibly introducing a "midcpu" complex as the counterpart of a "midmem" one (or leaving the machines with a medium number of cores unspecified).

I think attachments won't get through, so let me know in case you would like to get the Perl script.

-- Reuti

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
Re: [gridengine users] Minimum number of slots
On Thu, Feb 01, 2018 at 05:00:32PM +0100, Reuti wrote:
>
> > Now, I think I can improve upon this choice by creating separate
> > queues for different machine "sizes", i.e. an 8-core queue, a
> > 20-core queue and so on.
>
> So your intention is to have a bunch of queues and users select a queue
> instead of a dedicated PE (which would in turn select a machine from a
> dedicated set due to unique PEs per type of machine)?

Dedicated PEs would be another possibility; queues were just the first
thing that came to mind.
With the current configuration, we only have one PE. It is set to
$pe_slots.
Users do not select a PE, but rather a slot range. The idea is that
the scheduler selects an appropriate host.

> Somehow I don't get the advantage you want to achieve.

I want to prevent "small" jobs from running on large "nodes".

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann
Re: [gridengine users] Minimum number of slots
On Thu, Feb 01, 2018 at 01:08:26PM +, Winkler, Ursula (ursula.wink...@uni-graz.at) wrote:
> Hi Ansgar,
>
> do you have experience with Torque/PBS?

Yes, we used it some years ago before switching to SGE.

> #$ -l nodes=2,ppn=12 (--> here: 12 slots on 2 nodes = 24 in sum) .
>
> when you restrict "nodes=1" (with resource quotas "qconf -mrqs" for example)
> then nobody should be able to use more than 1 node.

Is that a limitation per job or per user? I'd like to restrict jobs only.
I guess I prefer William Hay's suggestion to use the JSV, since it does not require users to change their requests, but I will keep the plugin in mind in case the JSV approach does not work out.

Thank you very much!

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann
Re: [gridengine users] Minimum number of slots
Hi,

> On 01.02.2018 at 11:44, Ansgar Esztermann-Kirchner wrote:
>
> Hello List,
>
> we're on 2011.11, and our general setup has nodes with a mixture of
> CPUs (e.g. 8, 20, 40 cores). Most of the nodes lack a high-speed
> interconnect, so we use a PE with allocation_rule $pe_slots, limiting
> jobs to just a single machine.

OK

> We're also using fairshare to achieve
> an even distribution of resources to users in the long term.
> There is a trade-off between fairshare and optimal resource usage when
> only low-priority users have 40-core jobs and a 40-core node becomes
> free. I know I can set my preferences by setting the relative weights
> for fairshare and urgency.

OK

> Now, I think I can improve upon this choice by creating separate
> queues for different machine "sizes", i.e. an 8-core queue, a
> 20-core queue and so on.

So your intention is to have a bunch of queues and users select a queue instead of a dedicated PE (which would in turn select a machine from a dedicated set due to unique PEs per type of machine)? Somehow I don't get the advantage you want to achieve.

-- Reuti

> However, I do not see a (tractable) way to
> enforce proper job-queue association: allocation_rule 8 (etc) comes to
> mind, but I would lose the crucial one-host limit. This could be
> circumvented by creating one PE per node, but that would mean a huge
> administrative burden (and possibly also a lot of extra load on the
> scheduler).
>
> Anything I'm missing?
> Thanks a lot,
>
> A.
> --
> Ansgar Esztermann
> Sysadmin
> http://www.mpibpc.mpg.de/grubmueller/esztermann
Re: [gridengine users] Minimum number of slots
On Thu, Feb 01, 2018 at 02:25:37PM +, William Hay wrote:
>
> If I understand you correctly:
> Create a $pe_slots PE for each type of node and associate it with the
> appropriate nodes. Have a jsv tweak the requested pe based on the number
> of slots requested.

I think that should work; alternatively, just keeping a single $pe_slots PE and creating several queues (and assigning via JSV) should also work.
Using a JSV did not occur to me -- thanks a lot!

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann
Re: [gridengine users] Minimum number of slots
On Thu, Feb 01, 2018 at 11:44:25AM +0100, Ansgar Esztermann-Kirchner wrote:
> Now, I think I can improve upon this choice by creating separate
> queues for different machine "sizes", i.e. an 8-core queue, a
> 20-core queue and so on. However, I do not see a (tractable) way to
> enforce proper job-queue association: allocation_rule 8 (etc) comes to
> mind, but I would lose the crucial one-host limit. This could be
> circumvented by creating one PE per node, but that would mean a huge
> administrative burden (and possibly also a lot of extra load on the
> scheduler).

If I understand you correctly:
Create a $pe_slots PE for each type of node and associate it with the appropriate nodes. Have a jsv tweak the requested pe based on the number of slots requested.

William
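The selection step of this suggestion (one $pe_slots PE per node type, chosen from the requested slot count) could look roughly like the following. It is a minimal sketch in Python under stated assumptions: the node sizes 8, 20 and 40 come from the original post, but the PE names `smp8`, `smp20`, `smp40` and the function `pick_pe` are hypothetical, and the JSV plumbing that would read the job's slot request and rewrite its PE is not shown.

```python
from typing import Dict

# Cores per node type (taken from the original post) mapped to
# HYPOTHETICAL PE names; each PE would be attached only to nodes of that size.
PE_BY_NODE_SIZE: Dict[int, str] = {8: "smp8", 20: "smp20", 40: "smp40"}


def pick_pe(slots: int) -> str:
    """Return the PE of the smallest node type that can hold `slots` on one host."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    for cores in sorted(PE_BY_NODE_SIZE):
        if slots <= cores:
            return PE_BY_NODE_SIZE[cores]
    raise ValueError(f"no node type offers {slots} slots on a single host")


if __name__ == "__main__":
    for n in (1, 8, 9, 20, 21, 40):
        print(n, pick_pe(n))
```

Picking the smallest node type that fits keeps "small" jobs off the large nodes. How to treat a slot range (for example, deciding on its lower or upper bound) is a policy choice this sketch does not make.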
Re: [gridengine users] Minimum number of slots
> #$ -l nodes=2,ppn=12 (--> here: 12 slots on 2 nodes = 24 in sum) .
>
> when you restrict "nodes=1" (with resource quotas "qconf -mrqs" for example)
> then nobody should be able to use more than 1 node.

Sorry, correction: "to use more than 1 node PER JOB", of course.
Re: [gridengine users] Minimum number of slots
Hi Ansgar,

do you have experience with Torque/PBS? There is a useful gridengine plugin which works quite similarly:

https://github.com/brlindblom/gepetools

I have it in use on one of my clusters. The advantage: users must request not just slots but also nodes, in the form:

#$ -l nodes=2,ppn=12 (--> here: 12 slots on 2 nodes = 24 in sum) .

When you restrict "nodes=1" (with resource quotas, "qconf -mrqs" for example), then nobody should be able to use more than 1 node.

If you are interested and don't want to read all the gepetools instructions, I can send you my personal documentation.

Ursula

From: users-boun...@gridengine.org on behalf of Ansgar Esztermann-Kirchner
Sent: Thursday, 01 February 2018 11:44
To: users@gridengine.org
Subject: [gridengine users] Minimum number of slots

Hello List,

we're on 2011.11, and our general setup has nodes with a mixture of CPUs (e.g. 8, 20, 40 cores). Most of the nodes lack a high-speed interconnect, so we use a PE with allocation_rule $pe_slots, limiting jobs to just a single machine. We're also using fairshare to achieve an even distribution of resources to users in the long term.

There is a trade-off between fairshare and optimal resource usage when only low-priority users have 40-core jobs and a 40-core node becomes free. I know I can set my preferences by setting the relative weights for fairshare and urgency.

Now, I think I can improve upon this choice by creating separate queues for different machine "sizes", i.e. an 8-core queue, a 20-core queue and so on. However, I do not see a (tractable) way to enforce proper job-queue association: allocation_rule 8 (etc) comes to mind, but I would lose the crucial one-host limit. This could be circumvented by creating one PE per node, but that would mean a huge administrative burden (and possibly also a lot of extra load on the scheduler).

Anything I'm missing?
Thanks a lot,

A.
--
Ansgar Esztermann
Sysadmin
http://www.mpibpc.mpg.de/grubmueller/esztermann