We have implemented memory reservation on our GE cluster (started under SGE, migrated to UGE). The memory reservation tells the scheduler what to queue, using a consumable complex; it does not by itself impose limits on jobs at runtime. Those limits are set via h_data and h_vmem, and rely on compliant (read: educated) users.
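Any tooling built around these requests (setup scripts, JSVs, monitoring) ends up parsing memory specifiers like 10G or 512M. A minimal sketch of such a converter, assuming GE's usual suffix convention (upper-case suffixes are 1024-based, lower-case are 1000-based); the helper name mem2bytes is mine, not part of GE:

```shell
#!/bin/sh
# mem2bytes: convert a Grid Engine style memory specifier (e.g. 512M, 10G)
# to bytes.  Hypothetical helper, not part of GE; suffix convention per
# the GE memory_specifier syntax: k/m/g/t are 1000-based, K/M/G/T 1024-based.
mem2bytes() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                      # numeric prefix of the specifier
    suf = substr(v, length(v), 1)  # last character (the suffix, if any)
    mult = 1
    if      (suf == "K") mult = 1024
    else if (suf == "M") mult = 1024^2
    else if (suf == "G") mult = 1024^3
    else if (suf == "T") mult = 1024^4
    else if (suf == "k") mult = 1000
    else if (suf == "m") mult = 1000^2
    else if (suf == "g") mult = 1000^3
    else if (suf == "t") mult = 1000^4
    printf "%.0f\n", n * mult
  }'
}

mem2bytes 10G    # prints 10737418240
mem2bytes 512M   # prints 536870912
```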
If I remember right, it is just a two-step exercise:

1. Add a consumable to the complex (qconf -sc/-Mc), i.e.:

     mem_res  mres  MEMORY  <=  YES  JOB  1G  0  YES  0.000000

   The last two columns are UGE's aapre and affinity fields; remove them for SGE. By default a job always reserves 1G. Memory reservation is a per-job quantity on our cluster, but it does not need to be (we had it per slot initially).

2. Set, for each compute host, the amount of reservable memory via the complex_values field. I have two C-shell scripts to set it on all hosts or on one host -- see below my sig. They also handle our local HDD and SSD disk-space reservation mechanism and the number of GPUs on compute hosts (with their associated complexes), which you will need to edit out.

Users submit jobs with something like "-l mres=10G,h_data=10G,h_vmem=10G". The mres=XX value needs to be scaled up by the number of requested slots for multi-threaded/multi-slot jobs, and does not apply to MPI jobs (because consumable=JOB); you can instead use consumable=YES, in which case mres=XX is per slot and works for MPI jobs too. A JSV could check that users do this right; we don't use one, as we educate users and monitor their jobs (but I keep thinking of writing such a JSV, where h_data and h_vmem would be derived from mres and the number of requested PE slots...). I also have tools to view the reserved/used ratio and email users when that ratio is too large (aka wasted memory). We have high-memory and very-high-memory queues, and memory reservation vs. usage is monitored (logged every 5 minutes for running jobs so it can be plotted), etc. It works great, since we have to accommodate bio-genomics workloads with large/very large memory requirements (up to 2TB).

Cheers,
Sylvain

--

Here are the csh scripts. They are for UGE (not SGE, so you may need to drop -ncb from qhost), and they set our local hdd/ssd disk and gpu complexes, which you will probably want to remove. Also, all our compute nodes are named compute-XX-YY (hence the egrep ^compute).
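The per-job scaling described above can be sketched as simple shell arithmetic. The PE name "mthread" and all the numbers below are illustrative assumptions, not from the original post:

```shell
#!/bin/sh
# With consumable=JOB, mres must cover the WHOLE job, so for a
# multi-threaded job it is per-slot memory times the slot count,
# while h_data/h_vmem stay per slot.  Values and the PE name
# "mthread" are made-up examples.
slots=8
per_slot_gb=4

mres_gb=$(( slots * per_slot_gb ))   # 8 * 4 = 32

echo "qsub -pe mthread $slots -l mres=${mres_gb}G,h_data=${per_slot_gb}G,h_vmem=${per_slot_gb}G job.sh"
```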
For the available memory you could use a static list (like I do for disk and gpu) instead of what qhost reports. The '^license' is also UGE-specific.

cat set-host-complex.csh

#!/bin/csh
#
# Set the host-level limits for the resources
#   slots= mem_res= lduse= ssdres= ngpu=
#
# qhost columns: HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
qhost -ncb | egrep ^compute > /tmp/qhost.$$
set hosts = (`awk '{print $1}' /tmp/qhost.$$`)
set ncpus = (`awk '{print $3}' /tmp/qhost.$$`)
set nmems = (`awk '{print $5}' /tmp/qhost.$$`)
rm /tmp/qhost.$$
@ i = 0
set c = /tmp/complex.$$
while ($i < $#hosts)
  @ i++
  set h = $hosts[$i]
  set n = $ncpus[$i]
  set m = $nmems[$i]
  set ld   = `egrep "^$h " local_disk.list | awk '{print $2}'`
  set ssd  = `egrep "^$h " ssd_disk.list | awk '{print $2}'`
  set ngpu = `egrep "^$h " ngpu.list | awk '{print $2}'`
  if ("x$ld" == 'x') set ld = 0.0
  if ("x$ssd" == 'x') set ssd = 0.0
  if ("x$ngpu" == 'x') then
    set GPU
  else
    set GPU = ",num_gpu=$ngpu"
  endif
  set x = "slots=$n,mem_res=$m,lduse=$ld,ssdres=$ssd$GPU"
  echo $h "$x"
  # copy the fields to keep from the current config, then append
  # the new complex_values line
  qconf -se $h | \
    egrep '^hostname|^load_scaling|^user_lists|^xuser_lists|^projects|^xprojects|^usage_scaling|^license|^report_variables' > $c
  echo "complex_values $x" >> $c
  ## cat $c
  qconf -Me $c
  rm $c
end

cat set-1-host-complex.csh

#!/bin/csh
#
# Set the host-level limits for the resources
#   slots= mem_res= lduse= ssdres= ngpu=
# on the single host given as $1
#
# qhost columns: HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
set host = $1
echo $host | grep -q ^hydra
if ($status) set host = compute-$1
echo host=$host
#
qhost -ncb | egrep "^$host " > /tmp/qhost.$$
set hosts = (`awk '{print $1}' /tmp/qhost.$$`)
set ncpus = (`awk '{print $3}' /tmp/qhost.$$`)
set nmems = (`awk '{print $5}' /tmp/qhost.$$`)
rm /tmp/qhost.$$
@ i = 0
set c = /tmp/complex.$$
while ($i < $#hosts)
  @ i++
  set h = $hosts[$i]
  set n = $ncpus[$i]
  set m = $nmems[$i]
  set ld   = `egrep "^$h " local_disk.list | awk '{print $2}'`
  set ssd  = `egrep "^$h " ssd_disk.list | awk '{print $2}'`
  set ngpu = `egrep "^$h " ngpu.list | awk '{print $2}'`
  if ("x$ld" == 'x') set ld = 0.0
  if ("x$ssd" == 'x') set ssd = 0.0
  if ("x$ngpu" == 'x') then
    set GPU
  else
    set GPU = ",num_gpu=$ngpu"
  endif
  set x = "slots=$n,mem_res=$m,lduse=$ld,ssdres=$ssd$GPU"
  echo $h "$x"
  qconf -se $h | \
    egrep '^hostname|^load_scaling|^user_lists|^xuser_lists|^projects|^xprojects|^usage_scaling|^license|^report_variables' > $c
  echo "complex_values $x" >> $c
  ## cat $c
  qconf -Me $c
  rm $c
  echo $h
end

On Sat, Feb 8, 2020 at 7:00 AM <users-requ...@gridengine.org> wrote:
> Message: 1
> Date: Fri, 7 Feb 2020 14:12:18 +0000
> From: "Quinones, Jose M." <jmq2...@cumc.columbia.edu>
> To: "users@gridengine.org" <users@gridengine.org>
> Subject: [gridengine users] Memory Reservation
>
> Hello,
>
> Is there a way to reserve/evacuate a node based on a memory reservation?
> Using "-R" doesn't seem to work for this..
>
> Thanks,
> Jose
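The reserved/used check mentioned earlier can be sketched as a coarse ratio test. The 4x threshold and the numbers are made up; in practice the reserved figure would come from the job's mres request and the used figure from accounting data such as maxvmem:

```shell
#!/bin/sh
# Coarse "wasted memory" check: warn when a job has reserved far more
# memory than it actually uses.  All values and the 4x threshold are
# illustrative assumptions.
reserved_gb=40
used_gb=5
threshold=4

ratio=$(( reserved_gb / used_gb ))   # integer ratio: 40 / 5 = 8

if [ "$ratio" -ge "$threshold" ]; then
  echo "WARN: reserved ${reserved_gb}G, used ${used_gb}G (ratio ${ratio}x)"
fi
```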
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users