We have implemented memory reservation on our GE cluster (originally under
SGE, since migrated to UGE).
The memory reservation tells the scheduler what to queue, using a
consumable complex; by itself it does not impose limits on jobs at
runtime. Those limits are set via h_data and h_vmem, and rely on compliant
(read: educated) users.

If I remember right, it is just a two-step exercise:

1- add a consumable to the complex (qconf -sc / qconf -Mc), e.g.:

#name               shortcut   type        relop requestable consumable default urgency aapre  affinity
mem_res             mres       MEMORY      <=    YES         JOB        1G      0       YES    0.000000

The last 2 columns are UGE's aapre and affinity fields; remove them for
SGE. By default a job always reserves 1G, and mem_res is a per-job
quantity on our cluster, though it does not need to be (we had it per slot
initially).
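
In practice this step can be scripted (a hedged sketch; the temporary file
name is arbitrary):

qconf -sc > /tmp/complexes.$$
# append the mem_res line shown above to the dump, then load it back:
qconf -Mc /tmp/complexes.$$
rm /tmp/complexes.$$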

2- set, for each compute host, the amount of reservable memory via the
complex_values field.
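
For a single host this is a one-liner (a hedged example; "compute-0-1" and
its 64G of reservable memory are made-up values):

qconf -mattr exechost complex_values mem_res=64G compute-0-1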

I have two (C-shell) scripts to set it on all hosts or on a single host -
see below my signature. Note that I also have a local HDD & SSD disk space
reservation mechanism (the associated complexes need to be edited out) and
a count of GPUs on compute hosts (which you also need to remove).

Users submit jobs with something like "-l mres=10G,h_data=10G,h_vmem=10G".
The mres=XX value needs to be scaled up by the number of requested slots
for multi-threaded/multi-slot jobs, and it does not apply to MPI jobs
(because consumable=JOB; you can instead use consumable=YES, in which case
the mres=XX value is per slot and works for MPI jobs).
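
For example, a 4-slot multi-threaded job wanting 10G per slot would be
submitted as follows (the PE name "mthread" is only an illustration, use
whatever PE your site defines); mres is consumed once per job, while
h_data and h_vmem apply per slot:

qsub -pe mthread 4 -l mres=40G,h_data=10G,h_vmem=10G job.sh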

A JSV could check that users do it right; we don't use one, as we educate
users & monitor their jobs (but I keep thinking of writing such a JSV,
where h_data and h_vmem would be derived from mres and the number of
requested PE slots...).
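
A minimal server-side JSV sketch of such a check (untested; JSVs are
typically Bourne shell sourcing the jsv_include.sh helpers shipped with
Grid Engine) could look like:

#!/bin/sh
# sketch: insist that jobs requesting mres also request h_data
# (actually deriving h_data from mres and the slot count would
#  additionally require parsing the memory unit, omitted here)
jsv_on_start()
{
   return
}
jsv_on_verify()
{
   if [ "`jsv_sub_is_param l_hard mres`" = "true" ]; then
      if [ "`jsv_sub_is_param l_hard h_data`" != "true" ]; then
         jsv_reject "jobs requesting mres must also request h_data (and h_vmem)"
         return
      fi
   fi
   jsv_accept "OK"
}
. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main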

I also have tools to view the reserved/used ratio and email users when
that ratio is too large (aka wasted memory). We have high-memory and
very-high-memory queues, and the memory reservation vs usage is monitored
(logged every 5m for running jobs so it can be plotted), etc... it works
great, since we have to accommodate biogenomics with large/very large
memory requirements (up to 2TB).
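
For a quick look without any custom tooling, the consumable's remaining
value can be queried directly with the standard qhost/qstat flags:

qhost -F mem_res
qstat -F mem_res

The first shows the remaining reservable memory per host, the second per
queue instance.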

  Cheers,
    Sylvain
--
Here are the csh scripts, written for UGE (not SGE, so you may need to
drop -ncb from qhost). They also set the local HDD/SSD disk and GPU
complexes, which you will probably want to remove. All our compute nodes
are named compute-XX-YY (hence the egrep ^compute). For available memory
you could use a static list (like I do for disk and GPU) instead of what
qhost reports. The '^license' field is also UGE specific.
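
The *.list files referenced below are simple two-column "hostname value"
files, e.g. (made-up values):

compute-01-02  400.0
compute-01-03  800.0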

cat set-host-complex.csh
#!/bin/csh
#
# Set the host-level limits for the resources
#   slots= mem_res= lduse= ssdres= ngpu=
#
# HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
qhost -ncb | egrep ^compute > /tmp/qhost.$$
set hosts = (`awk '{print $1}' /tmp/qhost.$$`)
set ncpus = (`awk '{print $3}' /tmp/qhost.$$`)
set nmems = (`awk '{print $5}' /tmp/qhost.$$`)
rm /tmp/qhost.$$
@ i = 0
set c  = /tmp/complex.$$
while ($i < $#hosts)
  @ i++
  set h = $hosts[$i]
  set n = $ncpus[$i]
  set m = $nmems[$i]
  set ld   = `egrep "^$h " local_disk.list | awk '{print $2}'`
  set ssd  = `egrep "^$h "   ssd_disk.list | awk '{print $2}'`
  set ngpu = `egrep "^$h "       ngpu.list | awk '{print $2}'`
  if ("x$ld"   == 'x') set ld   = 0.0
  if ("x$ssd"  == 'x') set ssd  = 0.0
  if ("x$ngpu" == 'x') then
     set GPU
  else
     set GPU = ",num_gpu=$ngpu"
  endif
  set x = "slots=$n,mem_res=$m,lduse=$ld,ssdres=$ssd$GPU"
  echo $h "$x"
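  # rebuild a minimal exec-host config: keep only the fields matched
  # below, append the new complex_values, and load it back with qconf -Me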
  qconf -se $h | \
  egrep '^hostname|^load_scaling|^user_lists|^xuser_lists|^projects|^xprojects|^usage_scaling|^license|^report_variables' > $c
  echo "complex_values        $x" >> $c
  ## cat $c
  qconf -Me $c
  rm $c
end

cat set-1-host-complex.csh
#!/bin/csh
#
# Set the host-level limits for the resources
#   slots= mem_res= lduse= ssdres= ngpu=
# for host in $1
#
# HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
set host = $1
echo $host | grep -q ^hydra
if ($status)  set host = compute-$1
echo host=$host
#
qhost -ncb | egrep "^$host " > /tmp/qhost.$$
set hosts = (`awk '{print $1}' /tmp/qhost.$$`)
set ncpus = (`awk '{print $3}' /tmp/qhost.$$`)
set nmems = (`awk '{print $5}' /tmp/qhost.$$`)
rm /tmp/qhost.$$
@ i = 0
set c  = /tmp/complex.$$
while ($i < $#hosts)
  @ i++
  set h = $hosts[$i]
  set n = $ncpus[$i]
  set m = $nmems[$i]
  set ld   = `egrep "^$h " local_disk.list | awk '{print $2}'`
  set ssd  = `egrep "^$h "   ssd_disk.list | awk '{print $2}'`
  set ngpu = `egrep "^$h "       ngpu.list | awk '{print $2}'`
  if ("x$ld"   == 'x') set ld   = 0.0
  if ("x$ssd"  == 'x') set ssd  = 0.0
  if ("x$ngpu" == 'x') then
     set GPU
  else
     set GPU = ",num_gpu=$ngpu"
  endif
  set x = "slots=$n,mem_res=$m,lduse=$ld,ssdres=$ssd$GPU"
  echo $h "$x"
  qconf -se $h | \
  egrep '^hostname|^load_scaling|^user_lists|^xuser_lists|^projects|^xprojects|^usage_scaling|^license|^report_variables' > $c
  echo "complex_values        $x" >> $c
  ## cat $c
  qconf -Me $c
  rm $c
  echo $h done
end


On Sat, Feb 8, 2020 at 7:00 AM <users-requ...@gridengine.org> wrote:

> Date: Fri, 7 Feb 2020 14:12:18 +0000
> From: "Quinones, Jose M." <jmq2...@cumc.columbia.edu>
> To: "users@gridengine.org" <users@gridengine.org>
> Subject: [gridengine users] Memory Reservation
>
> Hello,
>
>  Is there a way to reserve/evacuate a node based on a memory reservation?
> Using "-R" doesn't seem to work for this..
>
> Thanks,
> Jose