Hi Ole,
It's possible that it was a temporary glitch, because everything looks OK to me now.
Hostname      Partition  Node   Num_CPU  CPUload  Memsize  Freemem  Joblist
                         State  Use/Tot              (MB)     (MB)  JobId User ...
devel-pcomp1  vtest*     idle     0  12     0.06   129080   124674
devel-vcomp1  vtest*     idle     0   2     0.00     5845     4371
...
I don't really know what caused the zero values before, but then again, I was
playing with several components at once, including HA.
Thank you!
-Mehmet
________________________________
From: Ole Holm Nielsen <[email protected]>
Sent: Monday, June 26, 2017 6:06:46 AM
To: slurm-dev
Subject: [slurm-dev] Re: Announce: Node status tool "pestat" for Slurm updated to version 0.50
On 23-06-2017 17:20, Belgin, Mehmet wrote:
> One thing I noticed is that pestat reports zero Freemem until a job is
> allocated on nodes. I’d expect it to report the same value as Memsize if
> no jobs are running. I wanted to offer this as a suggestion since zero
> free memory on idle nodes may be a bit confusing for users.
...
> Before Job allocation
> # pestat -p vtest
> Print only nodes in partition vtest
> Hostname      Partition  Node   Num_CPU  CPUload  Memsize  Freemem  Joblist
>                          State  Use/Tot              (MB)     (MB)  JobId User ...
> devel-pcomp1  vtest*     idle     0  12     0.02   129080      *0*
> devel-vcomp1  vtest*     idle     0   2     0.02     5845      *0*
> devel-vcomp2  vtest*     idle     0   2     0.00     5845      *0*
> devel-vcomp3  vtest*     idle     0   2     0.03     5845      *0*
> devel-vcomp4  vtest*     idle     0   2     0.01     5845      *0*
I'm not seeing the incorrect Freemem that you report. I get sensible
numbers for Freemem:
# pestat -s idle
Select only nodes with state=idle
Hostname      Partition  Node   Num_CPU  CPUload  Memsize  Freemem  Joblist
                         State  Use/Tot              (MB)     (MB)  JobId User ...
a017          xeon8*     idle     0   8     4.25*   23900    21590
a077          xeon8*     idle     0   8     3.47*   23900    22964
b003          xeon8*     idle     0   8     8.01*   23900    16839
b046          xeon8*     idle     0   8     0.01    23900    22393
b066          xeon8*     idle     0   8     2.84*   23900    18610
b081          xeon8*     idle     0   8     0.01    23900    21351
g021          xeon16     idle     0  16     0.01    64000    52393
g022          xeon16     idle     0  16     0.01    64000    60717
g039          xeon16     idle     0  16     0.01    64000    61795
g048          xeon16     idle     0  16     0.01    64000    62338
g074          xeon16     idle     0  16     0.01    64000    62274
g076          xeon16     idle     0  16     0.01    64000    58854
You should use sinfo directly to verify Slurm's data:
sinfo -N -t idle -o "%N %P %C %O %m %e %t"
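A minimal sketch of how that check could be scripted, in case it helps. It assumes the field order of the format string above (node, partition, CPUs, load, memory, free memory, state) and that a missing free-memory value shows up as 0 or N/A; both are assumptions that may not hold on every Slurm version. The function name flag_zero_freemem is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print idle nodes whose reported free memory is 0 or N/A.
# Expects lines on stdin in the order produced by
#   sinfo -h -N -t idle -o "%N %P %C %O %m %e %t"
# i.e. field 5 is the memory size (MB) and field 6 is the free memory (MB).
flag_zero_freemem() {
    awk '$6 == "0" || $6 == "N/A" { print $1, $2, "Memsize=" $5, "Freemem=" $6 }'
}

# Usage on a live cluster (needs sinfo in PATH; -h suppresses the header line):
#   sinfo -h -N -t idle -o "%N %P %C %O %m %e %t" | flag_zero_freemem
```

If this prints nothing while pestat still shows zeros, the problem is more likely on the pestat side than in Slurm's own data.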
FYI: We run Slurm 16.05 and have configured Cgroups.
/Ole