On 06/22/2017 01:34 PM, Ole Holm Nielsen wrote:

I'm announcing an updated version 0.50 of the node status tool "pestat" for Slurm. I discovered how to obtain the node Free Memory with sinfo, so now we can do nice things with memory usage!

Hi! Thank you for the great tool! I don't know if this is intended, but:

[Monday 26.06.17 18:12] adrian@sev : ~  $
sinfo -N -t idle -o "%N %P %C %O %m %e %t" | column -t
NODELIST   PARTITION  CPUS(A/I/O/T)  CPU_LOAD  MEMORY  FREE_MEM  STATE
localhost  local*     0/8/0/8        0.03      14984   201       idle

[Monday 26.06.17 18:13] adrian@sev : ~  $
free -m
              total        used        free      shared  buff/cache   available
Mem:          14984         392         182         134       14409       14081
Swap:          8191           0        8191

[Monday 26.06.17 18:13] adrian@sev : ~  $
pestat
Hostname    Partition   Node  Num_CPU  CPUload  Memsize  Freemem  Joblist
                        State Use/Tot              (MB)     (MB)  JobId User ...
localhost   local*      idle    0   8     0.03    14984     201*


While it is clear that the reported free memory is what free reports as "free", one might argue that buffers/cache is also memory available for use, since it shrinks as applications need more memory.

Maybe the FREE_MEM should be reported as (free + cached) ?
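To make the suggestion concrete, here is a minimal sketch of the (free + cached) figure computed from /proc/meminfo. It is only an illustration: the function name free_plus_cache_mb is made up, and this is not how Slurm or pestat obtain FREE_MEM. It sums the MemFree, Buffers and Cached fields; on reasonably recent kernels the MemAvailable field might be a better estimate.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pestat or Slurm).
# Reads /proc/meminfo-style lines on stdin and prints
# MemFree + Buffers + Cached, converted from kB to whole MB.
free_plus_cache_mb() {
    awk '
        $1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
        END { printf "%d\n", kb / 1024 }
    '
}

# Usage on a live Linux node:
#   free_plus_cache_mb < /proc/meminfo
```

On the node shown above this would give a value close to the 14 GB range rather than about 200 MB, which is why the two definitions lead to very different conclusions.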

Thank you!!
Adrian



New features:

1. The "pestat -f" will flag nodes with less than 20% free memory.

2. Now "pestat -m 1000" will print nodes with less than 1000 MB free memory.

3. Use "pestat -M 200000" to print nodes with greater than 200000 MB free memory. Jobs on such under-utilized nodes might better be submitted to lower-memory nodes.
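The threshold selection in items 2 and 3 can be approximated with sinfo and awk alone. The sketch below is not the pestat code; the function name filter_free_mem and its "lt"/"gt" modes are assumptions made for illustration. It reads "node free_mem" pairs such as those printed by sinfo with the "%N %e" format.

```shell
#!/usr/bin/env bash
# Hypothetical filter (not taken from pestat).
# stdin: lines of "<node> <free_mem_MB>"
# $1:    "lt" (like -m) or "gt" (like -M)
# $2:    threshold in MB
filter_free_mem() {
    local mode=$1 limit=$2
    awk -v mode="$mode" -v limit="$limit" '
        $2 !~ /^[0-9]+$/ { next }   # skip header lines and N/A values
        mode == "lt" && $2 + 0 < limit + 0 { print $1, $2 }
        mode == "gt" && $2 + 0 > limit + 0 { print $1, $2 }
    '
}

# Usage, assuming sinfo is in the PATH:
#   sinfo -h -N -o "%N %e" | filter_free_mem lt 1000
#   sinfo -h -N -o "%N %e" | filter_free_mem gt 200000
```

Note that with -N a node belonging to several partitions is listed once per partition, so the output may need to be passed through sort -u.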

Download the tool (a short bash script) from https://ftp.fysik.dtu.dk/Slurm/pestat. If your commands do not live in /usr/bin, please make appropriate changes in the CONFIGURE section at the top of the script.

Usage: pestat [-p partition(s)] [-u username] [-q qoslist] [-s statelist]
     [-f | -m free_mem | -M free_mem ] [-V] [-h]
where:
     -p partition: Select only partition <partition>
     -u username: Print only user <username>
     -q qoslist: Print only QOS in the qoslist <qoslist>
     -s statelist: Print only nodes with state in <statelist>
     -f: Print only nodes that are flagged by * (unexpected load etc.)
     -m free_mem: Print only nodes with free memory LESS than free_mem MB
     -M free_mem: Print only nodes with free memory GREATER than free_mem MB (under-utilized)
     -h: Print this help information
     -V: Version information


I use "pestat -f" all the time because it prints and flags (in color) only the nodes which have an unexpected CPU load or node status, for example:

# pestat  -f
Print only nodes that are flagged by *
Hostname    Partition   Node  Num_CPU  CPUload  Memsize  Freemem  Joblist
                        State Use/Tot              (MB)     (MB)  JobId User ...
a066        xeon8*      alloc   8   8     8.04    23900     173*  91683 user01
a067        xeon8*      alloc   8   8     8.07    23900     181*  91683 user01
a083        xeon8*      alloc   8   8     8.06    23900     172*  91683 user01


The -s option is useful for checking on possibly unusual node states, for example:

# pestat -s mixed



--
----------------------------------------------
Adrian Sevcenco, Ph.D.                       |
Institute of Space Science - ISS, Romania    |
adrian.sevcenco at {cern.ch,spacescience.ro} |
----------------------------------------------
