Re: [slurm-users] [ext] Re: bufferoverflow in slurmd with acct_gather_energy plugin
Hi Magnus,

On 8/30/23 11:17, Hagdorn, Magnus Karl Moritz wrote:
> On Wed, 2023-08-30 at 10:38 +0200, Ole Holm Nielsen wrote:
>> This is a very useful example! I guess that you have also defined
>> EnergyIPMIUsername and EnergyIPMIPassword in acct_gather.conf? How is
>> the EnergyIPMIPassword protected from normal users if the
>> /etc/slurm/acct_gather.conf file exists?
>
> it talks to the BMC via the OS, so no password/user required.

Ah, of course, the slurmd on your nodes can do local IPMI commands :-)

>> An EnergyIPMIFrequency of 10 seconds sounds like it could put a high
>> load on the BMC and the server?
>
> that might be my problem - I haven't checked that.

Maybe this could be a problem. It's anyway better not to have "OS jitter" in HPC compute nodes by having system tasks executing too frequently.

>> I have never tested IPMI DCMI_ENHANCED commands. Do you have some
>> FreeIPMI commands which can be used to verify the basic IPMI
>> DCMI_ENHANCED functionality?
>
> I checked the spec sheet of our BMC which suggested that it should be
> able to do DCMI_ENHANCED

That's good to know. Our servers from Huawei don't seem to support DCMI_ENHANCED. The following ipmitool command works locally on a node, but I can't figure out the corresponding command to use with FreeIPMI.

# ipmitool dcmi power reading

    Instantaneous power reading:                   689 Watts
    Minimum during sampling period:                 19 Watts
    Maximum during sampling period:                905 Watts
    Average power reading over sample period:      682 Watts
    IPMI timestamp:                           Wed Aug 30 09:35:28 2023
    Sampling period:                          0001 Seconds.
    Power reading state is:                   activated

Best regards,
Ole
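For anyone who wants to use the ipmitool reading above in a script, here is a minimal sketch that parses the text output into a dictionary. The function name parse_dcmi_power_reading and the SAMPLE string are illustrative assumptions; the sample simply reuses the values from the message above, and the exact spacing of real ipmitool output may differ between versions.

```python
import re


def parse_dcmi_power_reading(text):
    """Parse the text printed by `ipmitool dcmi power reading` into a dict.

    Values of the form "<n> Watts" become integers; everything else is
    kept as a string. (Hypothetical helper, not part of ipmitool or Slurm.)
    """
    readings = {}
    for line in text.splitlines():
        # Split on the first colon only: the timestamp value contains colons.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank line or a line without "key: value" shape
        value = value.strip()
        match = re.fullmatch(r"(\d+)\s+Watts", value)
        readings[key.strip()] = int(match.group(1)) if match else value
    return readings


# Sample text copied from the message above (one line has no space after
# the colon, which the parser tolerates).
SAMPLE = """\
    Instantaneous power reading:                   689 Watts
    Minimum during sampling period:                 19 Watts
    Maximum during sampling period:905 Watts
    Average power reading over sample period:      682 Watts
    IPMI timestamp:                           Wed Aug 30 09:35:28 2023
    Sampling period:                          0001 Seconds.
    Power reading state is:                   activated
"""

reading = parse_dcmi_power_reading(SAMPLE)
print(reading["Instantaneous power reading"])
```

On a node, the same function could be fed the captured output of the command run via subprocess; that part is left out here because it needs root and a working BMC.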
On Wed, 2023-08-30 at 10:38 +0200, Ole Holm Nielsen wrote:
> This is a very useful example! I guess that you have also defined
> EnergyIPMIUsername and EnergyIPMIPassword in acct_gather.conf? How is
> the EnergyIPMIPassword protected from normal users if the
> /etc/slurm/acct_gather.conf file exists?

it talks to the BMC via the OS, so no password/user required.

> An EnergyIPMIFrequency of 10 seconds sounds like it could put a high
> load on the BMC and the server?

that might be my problem - I haven't checked that.

> I have never tested IPMI DCMI_ENHANCED commands. Do you have some
> FreeIPMI commands which can be used to verify the basic IPMI
> DCMI_ENHANCED functionality?

I checked the spec sheet of our BMC which suggested that it should be able to do DCMI_ENHANCED

Regards
magnus

--
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing

Campus Charité Virchow Klinikum
Forum 4 | Ebene 02 | Raum 2.020
Augustenburger Platz 1
13353 Berlin

magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de
Hi Magnus,

On 8/30/23 10:12, Hagdorn, Magnus Karl Moritz wrote:
>> Yes, but can you share the details of which parameters you configure
>> in this plugin so that you can extract node power? This doesn't seem
>> obvious to me.
>
> not much needs configuring. We have
>
> EnergyIPMIFrequency=10
> EnergyIPMICalcAdjustment=yes
> EnergyIPMIPowerSensors=Node=DCMI_ENHANCED
>
> in the acct_gather.conf.

This is a very useful example! I guess that you have also defined EnergyIPMIUsername and EnergyIPMIPassword in acct_gather.conf? How is the EnergyIPMIPassword protected from normal users if the /etc/slurm/acct_gather.conf file exists?

An EnergyIPMIFrequency of 10 seconds sounds like it could put a high load on the BMC and the server?

I have never tested IPMI DCMI_ENHANCED commands. Do you have some FreeIPMI commands which can be used to verify the basic IPMI DCMI_ENHANCED functionality?

Thanks,
Ole
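The question about protecting EnergyIPMIPassword can be checked mechanically. The sketch below reads key=value settings of the kind quoted above and reports whether a password is set in a file whose mode allows other users to read it. The helper names parse_acct_gather_conf and password_exposed are hypothetical, and the parsing rules (comments starting with "#", case-insensitive key matching) are assumptions about the file format rather than a statement of how Slurm itself parses it.

```python
import stat


def parse_acct_gather_conf(text):
    """Return {key: value} for each key=value line.

    Lines are split on the first '=' only, so a value such as
    "Node=DCMI_ENHANCED" is kept intact.
    """
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def password_exposed(options, mode):
    """True if a password is configured and 'other' users may read the file."""
    has_password = any(k.lower() == "energyipmipassword" for k in options)
    return has_password and bool(mode & stat.S_IROTH)


# The three settings quoted in this thread; no username or password is set.
CONF = """\
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
EnergyIPMIPowerSensors=Node=DCMI_ENHANCED
"""

options = parse_acct_gather_conf(CONF)
print(options["EnergyIPMIPowerSensors"])
print(password_exposed(options, 0o644))
```

On a real node the mode would come from os.stat("/etc/slurm/acct_gather.conf").st_mode. With the configuration quoted here the check is moot, since no password is stored in the file.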
Hi Ole,

On Tue, 2023-08-29 at 20:56 +0200, Ole Holm Nielsen wrote:
> Yes, but can you share the details of which parameters you configure
> in this plugin so that you can extract node power? This doesn't seem
> obvious to me.

not much needs configuring. We have

EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
EnergyIPMIPowerSensors=Node=DCMI_ENHANCED

in the acct_gather.conf.

Regards
magnus
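To illustrate why the sampling interval matters, here is a rough sketch of turning power samples taken every EnergyIPMIFrequency seconds into an energy estimate with the trapezoidal rule. This is only a generic illustration: it is not a description of how the acct_gather_energy/ipmi plugin computes energy internally, the function name energy_joules is made up, and the sample values are hypothetical.

```python
def energy_joules(power_watts, interval_s):
    """Trapezoidal estimate of energy (J) from equally spaced power samples (W)."""
    if len(power_watts) < 2:
        return 0.0  # a single sample spans no time, so no energy estimate
    total = 0.0
    for previous, current in zip(power_watts, power_watts[1:]):
        # Average power over the interval times the interval length.
        total += 0.5 * (previous + current) * interval_s
    return total


# Hypothetical readings in watts, taken 10 seconds apart
# (matching EnergyIPMIFrequency=10 from the configuration above).
samples = [680, 689, 700, 682]
print(energy_joules(samples, 10))  # 20700.0 joules over 30 seconds
```

A longer interval reduces the polling load on the BMC but makes an estimate like this coarser, which is the trade-off discussed in this thread.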
Hi Magnus,

On 29-08-2023 13:56, Hagdorn, Magnus Karl Moritz wrote:
>> I'm curious to learn about your energy gathering method: How do you
>> extract node power using IPMI with FreeIPMI (or some other toolset),
>> and how do you configure Slurm for this?
>
> We are using the SLURM plugin which is enabled using
>
> AcctGatherEnergyType=acct_gather_energy/ipmi
>
> https://slurm.schedmd.com/acct_gather.conf.html#SECTION_acct_gather_energy/IPMI

Yes, but can you share the details of which parameters you configure in this plugin so that you can extract node power? This doesn't seem obvious to me.

Thanks,
Ole
Hi Ole,

On Tue, 2023-08-29 at 11:08 +0200, Ole Holm Nielsen wrote:
> I'm curious to learn about your energy gathering method: How do you
> extract node power using IPMI with FreeIPMI (or some other toolset),
> and how do you configure Slurm for this?

We are using the SLURM plugin which is enabled using

AcctGatherEnergyType=acct_gather_energy/ipmi

https://slurm.schedmd.com/acct_gather.conf.html#SECTION_acct_gather_energy/IPMI

Cheers
magnus