Re: [slurm-users] [ext] Re: bufferoverflow in slurmd with acct_gather_energy plugin

2023-08-30 Thread Ole Holm Nielsen

Hi Magnus,

On 8/30/23 11:17, Hagdorn, Magnus Karl Moritz wrote:

> On Wed, 2023-08-30 at 10:38 +0200, Ole Holm Nielsen wrote:
> > This is a very useful example!  I guess that you have also defined
> > EnergyIPMIUsername and EnergyIPMIPassword in acct_gather.conf?  How is the
> > EnergyIPMIPassword protected from normal users if the
> > /etc/slurm/acct_gather.conf file exists?
>
> it talks to the BMC via the OS, so no password/user required.


Ah, of course, the slurmd on your nodes can do local IPMI commands :-)


> > An EnergyIPMIFrequency of 10 seconds sounds like it could put a high load
> > on the BMC and the server?
>
> that might be my problem - I haven't checked that.


This may be the problem.  In any case, it is better to avoid "OS jitter"
on HPC compute nodes caused by system tasks executing too frequently.



> > I have never tested IPMI DCMI_ENHANCED commands.  Do you have some
> > FreeIPMI commands which can be used to verify the basic IPMI DCMI_ENHANCED
> > functionality?
>
> I checked the spec sheet of our BMC which suggested that it should be
> able to do DCMI_ENHANCED.


That's good to know.  Our servers from Huawei don't seem to support 
DCMI_ENHANCED.


The following ipmitool command works locally on a node, but I can't figure 
out the corresponding command to use with FreeIPMI.


# ipmitool dcmi power reading

Instantaneous power reading:               689 Watts
Minimum during sampling period:            19 Watts
Maximum during sampling period:            905 Watts
Average power reading over sample period:  682 Watts
IPMI timestamp:                            Wed Aug 30 09:35:28 2023
Sampling period:                           0001 Seconds.
Power reading state is:                    activated
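
For anyone who wants to post-process this output, the following is a minimal Python sketch that extracts the Watt values from the text printed by the ipmitool command above. The function names `parse_dcmi_power` and `read_node_power` are made up for this illustration, and the parser only assumes the "label: number Watts" layout seen in the sample; other ipmitool versions or BMCs may print something different.

```python
import re
import subprocess

# Sample text copied from the ipmitool output quoted in this thread.
SAMPLE = """\
Instantaneous power reading:   689 Watts
Minimum during sampling period: 19 Watts
Maximum during sampling period:905 Watts
Average power reading over sample period:  682 Watts
IPMI timestamp:   Wed Aug 30 09:35:28 2023
Sampling period:  0001 Seconds.
Power reading state is:   activated
"""

# Matches lines of the form "<label>: <integer> Watts"; whitespace after
# the colon is optional because the sample contains "period:905 Watts".
_WATTS_LINE = re.compile(r"^\s*(?P<label>[^:]+):\s*(?P<value>\d+)\s*Watts\s*$")


def parse_dcmi_power(text):
    """Return a dict mapping each Watt-valued label to an int."""
    readings = {}
    for line in text.splitlines():
        match = _WATTS_LINE.match(line)
        if match:
            readings[match.group("label").strip()] = int(match.group("value"))
    return readings


def read_node_power():
    """Run ipmitool locally and parse its output.

    Hypothetical helper: it needs ipmitool installed and enough
    privileges to talk to the local BMC, so it is not called below.
    """
    result = subprocess.run(
        ["ipmitool", "dcmi", "power", "reading"],
        capture_output=True, text=True, check=True,
    )
    return parse_dcmi_power(result.stdout)


if __name__ == "__main__":
    for label, watts in parse_dcmi_power(SAMPLE).items():
        print(f"{label}: {watts} W")
```

Lines without a Watt value (timestamp, sampling period, state) are simply skipped by the regular expression.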


Best regards,
Ole



Re: [slurm-users] [ext] Re: bufferoverflow in slurmd with acct_gather_energy plugin

2023-08-30 Thread Hagdorn, Magnus Karl Moritz
On Wed, 2023-08-30 at 10:38 +0200, Ole Holm Nielsen wrote:
> This is a very useful example!  I guess that you have also defined
> EnergyIPMIUsername and EnergyIPMIPassword in acct_gather.conf?  How is the
> EnergyIPMIPassword protected from normal users if the
> /etc/slurm/acct_gather.conf file exists?
>
it talks to the BMC via the OS, so no password/user required.


> An EnergyIPMIFrequency of 10 seconds sounds like it could put a high load
> on the BMC and the server?
>
that might be my problem - I haven't checked that.


> I have never tested IPMI DCMI_ENHANCED commands.  Do you have some
> FreeIPMI commands which can be used to verify the basic IPMI DCMI_ENHANCED
> functionality?
>
I checked the spec sheet of our BMC which suggested that it should be
able to do DCMI_ENHANCED.

Regards
magnus


-- 
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
 
Campus Charité Virchow Klinikum
Forum 4 | Ebene 02 | Raum 2.020
Augustenburger Platz 1
13353 Berlin
 
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de




Re: [slurm-users] [ext] Re: bufferoverflow in slurmd with acct_gather_energy plugin

2023-08-30 Thread Ole Holm Nielsen

Hi Magnus,

On 8/30/23 10:12, Hagdorn, Magnus Karl Moritz wrote:

> > Yes, but can you share the details of which parameters you configure in
> > this plugin so that you can extract node power?  This doesn't seem
> > obvious to me.
>
> not much needs configuring. We have
>
> EnergyIPMIFrequency=10
> EnergyIPMICalcAdjustment=yes
> EnergyIPMIPowerSensors=Node=DCMI_ENHANCED
>
> in the acct_gather.conf.
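
To illustrate how the three settings quoted above are structured, here is a minimal Python sketch that reads such a fragment into a dictionary. It is not Slurm's own parser: the helper names `parse_conf` and `polls_per_day` are invented for this example, `#` is assumed to start a comment, and the only subtlety handled is that the value of EnergyIPMIPowerSensors itself contains an `=`. The second helper simply converts an EnergyIPMIFrequency in seconds into the number of BMC polls per node per day.

```python
# Fragment copied from the acct_gather.conf settings quoted in this thread.
CONF = """\
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
EnergyIPMIPowerSensors=Node=DCMI_ENHANCED
"""


def parse_conf(text):
    """Parse simple Key=Value lines into a dict of strings.

    Simplified on purpose: no include files, no line continuations.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed comment syntax
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so "Node=DCMI_ENHANCED" stays intact.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def polls_per_day(frequency_seconds):
    """Number of samples taken per day at the given interval in seconds."""
    return 86400 // frequency_seconds


if __name__ == "__main__":
    conf = parse_conf(CONF)
    for key, value in conf.items():
        print(f"{key} -> {value}")
    freq = int(conf["EnergyIPMIFrequency"])
    print(f"{polls_per_day(freq)} BMC polls per node per day")
```

With the 10-second interval from the thread, each node polls its BMC 86400 / 10 = 8640 times per day, which gives a rough sense of the sampling load being discussed.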


This is a very useful example!  I guess that you have also defined 
EnergyIPMIUsername and EnergyIPMIPassword in acct_gather.conf?  How is the 
EnergyIPMIPassword protected from normal users if the 
/etc/slurm/acct_gather.conf file exists?
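
The concern about a password in a readable file can be checked generically. The sketch below, in Python, tests whether a file's permission bits allow reading by users other than the owner and group. It is a plain POSIX permission check under the assumption of a Linux node; it says nothing about what Slurm itself requires, and the function names `readable_by_others` and `describe` are hypothetical. The demo runs on a temporary file rather than on a real configuration file.

```python
import os
import stat
import tempfile


def readable_by_others(path):
    """Return True if the 'other' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def describe(path):
    """Return a short summary of mode and ownership for path."""
    st = os.stat(path)
    return f"{path}: {stat.filemode(st.st_mode)} uid={st.st_uid} gid={st.st_gid}"


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real config file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        demo = tmp.name
    try:
        os.chmod(demo, 0o644)
        print(describe(demo), "readable by others:", readable_by_others(demo))
        os.chmod(demo, 0o600)
        print(describe(demo), "readable by others:", readable_by_others(demo))
    finally:
        os.remove(demo)
```

On a real node one would call `readable_by_others("/etc/slurm/acct_gather.conf")` if that file holds credentials.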


An EnergyIPMIFrequency of 10 seconds sounds like it could put a high load 
on the BMC and the server?


I have never tested IPMI DCMI_ENHANCED commands.  Do you have some 
FreeIPMI commands which can be used to verify the basic IPMI DCMI_ENHANCED 
functionality?


Thanks,
Ole



Re: [slurm-users] [ext] Re: bufferoverflow in slurmd with acct_gather_energy plugin

2023-08-30 Thread Hagdorn, Magnus Karl Moritz
Hi Ole,

On Tue, 2023-08-29 at 20:56 +0200, Ole Holm Nielsen wrote:
> Yes, but can you share the details of which parameters you configure in
> this plugin so that you can extract node power?  This doesn't seem
> obvious to me.

not much needs configuring. We have

EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
EnergyIPMIPowerSensors=Node=DCMI_ENHANCED

in the acct_gather.conf.

Regards
magnus




Re: [slurm-users] [ext] Re: bufferoverflow in slurmd with acct_gather_energy plugin

2023-08-29 Thread Ole Holm Nielsen

Hi Magnus,

On 29-08-2023 13:56, Hagdorn, Magnus Karl Moritz wrote:

> > I'm curious to learn about your energy gathering method:  How do you
> > extract node power via IPMI using FreeIPMI (or some other toolset), and
> > how do you configure Slurm for this?
>
> We are using the SLURM plugin which is enabled using
> AcctGatherEnergyType=acct_gather_energy/ipmi
>
> https://slurm.schedmd.com/acct_gather.conf.html#SECTION_acct_gather_energy/IPMI


Yes, but can you share the details of which parameters you configure in 
this plugin so that you can extract node power?  This doesn't seem 
obvious to me.


Thanks,
Ole



Re: [slurm-users] [ext] Re: bufferoverflow in slurmd with acct_gather_energy plugin

2023-08-29 Thread Hagdorn, Magnus Karl Moritz
Hi Ole,

On Tue, 2023-08-29 at 11:08 +0200, Ole Holm Nielsen wrote:
> 
> I'm curious to learn about your energy gathering method:  How do you
> extract node power via IPMI using FreeIPMI (or some other toolset), and
> how do you configure Slurm for this?
> 

We are using the SLURM plugin which is enabled using
AcctGatherEnergyType=acct_gather_energy/ipmi

https://slurm.schedmd.com/acct_gather.conf.html#SECTION_acct_gather_energy/IPMI


Cheers
magnus


