I'm announcing an updated version 0.51 of the node status tool "pestat"
for Slurm.
New features:
1. The -C flag turns colors on explicitly, even when the output doesn't
go to a terminal (and -c turns colors off). Thanks to Fermin
Molina <[email protected]> for requesting this! Fermin suggests that this
allows nice continuous monitoring of "flagged" nodes with:
# watch -n 60 --color 'pestat -f -C'
2. Added -n/-w hostlist to select a subset of nodes. The -n form is for
compatibility with sinfo, whereas -w is compatible with pdsh and clush
(ClusterShell).
Download the tool (a short bash script) from
https://ftp.fysik.dtu.dk/Slurm/pestat. If your commands do not live in
/usr/bin, please make appropriate changes in the CONFIGURE section at
the top of the script.
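A quick way to see whether the CONFIGURE section needs editing is to
check where the relevant commands live on your system. The sketch below
is only an illustration: the function name cmds_outside_dir is made up
here, and the command list (sinfo, squeue, scontrol) is an assumption
that should be compared with the CONFIGURE section of your own copy.

```shell
# List the commands that are NOT executable files in the given directory.
# Usage: cmds_outside_dir DIR CMD...   (hypothetical helper, not part of pestat)
cmds_outside_dir() {
    dir=$1
    shift
    for cmd in "$@"; do
        # Report the command unless DIR contains an executable file of that name
        if [ ! -f "$dir/$cmd" ] || [ ! -x "$dir/$cmd" ]; then
            # Show where the command actually resolves, if it is on PATH at all
            found=$(command -v "$cmd" 2>/dev/null) || found="not found in PATH"
            printf '%s: %s\n' "$cmd" "$found"
        fi
    done
}

# Assumed command list; adjust it to match the CONFIGURE section.
cmds_outside_dir /usr/bin sinfo squeue scontrol
```

No output means all listed commands are in /usr/bin and the script
should work unmodified in that respect.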
Usage: pestat [-p partition(s)] [-u username] [-q qoslist] [-s statelist]
[-n/-w hostlist] [-f | -m free_mem | -M free_mem ] [-C/-c] [-V] [-h]
where:
-p partition: Select only partition <partition>
-u username: Print only user <username>
-q qoslist: Print only QOS in the qoslist <qoslist>
-s statelist: Print only nodes with state in <statelist>
-n/-w hostlist: Print only nodes in hostlist
-f: Print only nodes that are flagged by * (unexpected load etc.)
-m free_mem: Print only nodes with free memory LESS than free_mem MB
-M free_mem: Print only nodes with free memory GREATER than free_mem MB
(under-utilized)
-C: Color output is forced ON
-c: Color output is forced OFF
-h: Print this help information
-V: Version information
I use "pestat -f" all the time because it prints and flags (in color)
only the nodes which have an unexpected CPU load or node status, for
example:
# pestat -f
Print only nodes that are flagged by *
Hostname  Partition  Node   Num_CPU  CPUload  Memsize  Freemem  Joblist
                     State  Use/Tot              (MB)     (MB)  JobId User ...
a066      xeon8*     alloc    8   8     8.04    23900     173*  91683 user01
a067      xeon8*     alloc    8   8     8.07    23900     181*  91683 user01
a083      xeon8*     alloc    8   8     8.06    23900     172*  91683 user01
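The flagged list can also be turned into a host list for other tools.
The following is a minimal sketch, not part of pestat: the helper name
flagged_hosts is hypothetical, and it assumes that each node occupies a
single line in the real output, with the used and total CPU counts in
columns 4 and 5. The sample data is copied from the listing above.

```shell
# Print the hostname (first column) of every node line read from stdin.
# Node lines are recognized by integer CPU counts in columns 4 and 5,
# which skips the title and header lines.
flagged_hosts() {
    awk '$4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { print $1 }'
}

# Sample input taken from the example listing, one node per line.
sample='Print only nodes that are flagged by *
Hostname Partition Node Num_CPU CPUload Memsize Freemem Joblist
State Use/Tot (MB) (MB) JobId User ...
a066 xeon8* alloc 8 8 8.04 23900 173* 91683 user01
a067 xeon8* alloc 8 8 8.07 23900 181* 91683 user01
a083 xeon8* alloc 8 8 8.06 23900 172* 91683 user01'

# Join the hostnames with commas, a list form that pdsh -w accepts.
printf '%s\n' "$sample" | flagged_hosts | paste -s -d, -
# prints: a066,a067,a083
```

On a live cluster the sample would be replaced by the real command, for
example "pestat -f | flagged_hosts | paste -s -d, -". Leave out -C in
that case so that no color codes end up in the pipe.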
The -s option is useful for checking on possibly unusual node states,
for example:
# pestat -s mixed
--
Ole Holm Nielsen
Department of Physics, Technical University of Denmark