Hi Ole,

It may have been a temporary glitch, because everything looks OK to me now.

Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
                             State Use/Tot             (MB)     (MB)  JobId User ...
devel-pcomp1          vtest*     idle   0  12    0.06    129080   124674
devel-vcomp1          vtest*     idle   0   2    0.00      5845     4371
...

I don't really know what caused the zero values before, but, then again, I was
experimenting with several components at once, including HA.

Thank you!
-Mehmet


________________________________
From: Ole Holm Nielsen <[email protected]>
Sent: Monday, June 26, 2017 6:06:46 AM
To: slurm-dev
Subject: [slurm-dev] Re: Announce: Node status tool "pestat" for Slurm updated 
to version 0.50


On 23-06-2017 17:20, Belgin, Mehmet wrote:
> One thing I noticed is that pestat reports zero Freemem until a job is
> allocated on nodes. I’d expect it to report the same value as Memsize if
> no jobs are running. I wanted to offer this as a suggestion since zero
> free memory on idle nodes may be a bit confusing for users.
...
> Before Job allocation
> # pestat -p vtest
> Print only nodes in partition vtest
> Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
>                              State Use/Tot             (MB)     (MB)  JobId User ...
> devel-pcomp1          vtest*     idle   0  12    0.02    129080 *0*
> devel-vcomp1          vtest*     idle   0   2    0.02      5845 *0*
> devel-vcomp2          vtest*     idle   0   2    0.00      5845 *0*
> devel-vcomp3          vtest*     idle   0   2    0.03      5845 *0*
> devel-vcomp4          vtest*     idle   0   2    0.01      5845 *0*

I'm not seeing the incorrect Freemem that you report.  I get sensible
numbers for Freemem:

# pestat -s idle
Select only nodes with state=idle
Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
                             State Use/Tot             (MB)     (MB)  JobId User ...
     a017          xeon8*     idle   0   8    4.25*    23900    21590
     a077          xeon8*     idle   0   8    3.47*    23900    22964
     b003          xeon8*     idle   0   8    8.01*    23900    16839
     b046          xeon8*     idle   0   8    0.01     23900    22393
     b066          xeon8*     idle   0   8    2.84*    23900    18610
     b081          xeon8*     idle   0   8    0.01     23900    21351
     g021          xeon16     idle   0  16    0.01     64000    52393
     g022          xeon16     idle   0  16    0.01     64000    60717
     g039          xeon16     idle   0  16    0.01     64000    61795
     g048          xeon16     idle   0  16    0.01     64000    62338
     g074          xeon16     idle   0  16    0.01     64000    62274
     g076          xeon16     idle   0  16    0.01     64000    58854

You should use sinfo directly to verify Slurm's data:

  sinfo -N -t idle -o "%N %P %C %O %m %e %t"
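For example, a quick sketch (my own, not part of pestat) of isolating any idle
nodes whose sinfo-reported FreeMem is zero; the printf below just replays the
sample node values from this thread in place of live sinfo output:

```shell
# Replace the printf with real data from:  sinfo -N -t idle -h -o "%N %e"
# (%N = node name, %e = free memory in MB; -h suppresses the header line)
printf 'devel-pcomp1 0\ndevel-vcomp1 4371\n' |
awk '$2 == 0 { print $1 }'   # print only nodes reporting zero FreeMem
```

If this prints nodes while pestat shows nonzero values (or vice versa), the
discrepancy is in the tool rather than in Slurm's own accounting.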

FYI: We run Slurm 16.05 and have configured Cgroups.

/Ole
