I've got a real mixture of nodes, some older than Sandy Bridge, and I find that IPMI does a better job of getting 'whole system' power use. I was hoping to pick up power from my GPU nodes too, but those nodes don't report via IPMI either.
--
*Nathan Harper* // IT Systems Architect
*e:* [email protected] // *t:* 0117 906 1104 // *m:* 07875 510891 // *w:* www.cfms.org.uk <http://www.cfms.org.uk>
<http://uk.linkedin.com/pub/nathan-harper/21/696/b81>
CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR
------------------------------
CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd
CFMS Services Ltd registered office // Victoria House // 51 Victoria Street // Bristol // BS1 6AD

On 10 November 2014 13:13, Antony Cleave <[email protected]> wrote:

>
> It turns out it wasn't configured with --with-freeipmi and it's not in the
> standard path. I now have a new version with this configured properly
>
> However while I was confirming that I saw the options to use the Intel
> RAPL api instead and that looks to be running perfectly now on all the
> nodes and the pdf I have says this method should have a lower overhead.
>
> Is there any advantage to using the IPMI plugin over the RAPL plugin? I
> have just Intel nodes here, no AMD but I do have some GPUs in a subset of
> nodes.
>
> Thanks!
>
> Antony
>
>
> On 07/11/2014 09:54, Thomas Cadeau wrote:
>
>> Hi Antony,
>>
>> Can you check freeipmi-devel is installed where you are compiling?
>> If it's not in a regular path, you have to use the option
>> --with-freeipmi=/path/to/freeipmi-devel when configure.
>>
>> Anyway, if acct_gather_energy/ipmi is not compiled, you should have a
>> message on config.log saying freeipmi-devel is not here.
>>
>> Thomas
>>
>> ________________________________________
>> From: Antony Cleave [[email protected]]
>> Sent: Thursday 6 November 2014 17:00
>> To: slurm-dev
>> Subject: [slurm-dev] Struggling with configuration of acct_gather_energy/ipmi
>>
>> Hi All
>>
>> is there a specific configure option to build the
>> acct_gather_energy/ipmi plugin?
>>
>> I'm using Slurm 14.03.0 and I'm trying to configure
>> acct_gather_energy/ipmi with the following settings in slurm.conf
>> JobAcctGatherType=jobacct_gather/linux
>> JobAcctGatherFrequency=30
>> AcctGatherEnergyType=acct_gather_energy/ipmi
>>
>> and the following in acct_gather.conf
>> EnergyIPMIFrequency=30
>> EnergyIPMICalcAdjustment=yes
>> EnergyIPMIUsername=****
>> EnergyIPMIPassword=****
>>
>> it seems to accept the configuration:
>>
>> # scontrol show config
>> Configuration data as of 2014-11-06T15:34:26
>> . . .
>> AcctGatherEnergyType = acct_gather_energy/ipmi
>> . . .
>> JobAcctGatherFrequency = 30
>> JobAcctGatherType = jobacct_gather/linux
>>
>> However when I run a job I get the following in the logs on the slave
>> nodes:
>>
>> [2014-11-06T15:17:20.727] Launching batch job 1055 for UID 1000
>> [2014-11-06T15:17:20.739] Received cpu frequency information for 16 cpus
>> [2014-11-06T15:17:20.743] Couldn't find the specified plugin name for acct_gather_energy/ipmi looking at all files
>> [2014-11-06T15:17:20.766] cannot find acct_gather_energy plugin for acct_gather_energy/ipmi
>> [2014-11-06T15:17:20.766] cannot create acct_gather_energy context for acct_gather_energy/ipmi
>> [2014-11-06T15:17:20.768] Couldn't find the specified plugin name for acct_gather_energy/ipmi looking at all files
>> [2014-11-06T15:17:20.769] cannot find acct_gather_energy plugin for acct_gather_energy/ipmi
>> [2014-11-06T15:17:20.769] cannot create acct_gather_energy context for acct_gather_energy/ipmi
>> [2014-11-06T15:17:20.770] WARNING: We will use a much slower algorithm with proctrack/pgid, use Proctracktype=proctrack/linuxproc or some other proctrack when using jobacct_gather/linux
>> [2014-11-06T15:17:20.771] [1055] gres/mic unable to set OFFLOAD_DEVICES, no device files configured
>> [2014-11-06T15:17:20.823] [1055] Couldn't find the specified plugin name for acct_gather_energy/ipmi looking at all files
>> [2014-11-06T15:17:20.825] [1055] cannot find acct_gather_energy plugin for acct_gather_energy/ipmi
>> [2014-11-06T15:17:20.826] [1055] cannot create acct_gather_energy context for acct_gather_energy/ipmi
>> [2014-11-06T15:17:21.843] [1055] Couldn't find the specified plugin name for acct_gather_energy/ipmi looking at all files
>> [2014-11-06T15:17:21.845] [1055] cannot find acct_gather_energy plugin for acct_gather_energy/ipmi
>> [2014-11-06T15:17:21.845] [1055] cannot create acct_gather_energy context for acct_gather_energy/ipmi
>> [2014-11-06T15:17:41.771] [1055] *** JOB 1055 CANCELLED AT 2014-11-06T15:17:41 ***
>> [2014-11-06T15:17:41.829] [1055] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 15
>> [2014-11-06T15:17:41.831] [1055] done with job
>>
>> I can get power reporting with both the ipmi-sensors and ipmi-dcmi
>> commands
>>
>> # ipmi-sensors --non-abbreviated-units | grep Watts
>> 95 | Pwr Consumption | Current | 126.00 | Watts | 'OK'
>>
>> # ipmi-dcmi --get-system-power-statistics
>> Current Power : 140 Watts
>> Minimum Power over sampling duration : 72 watts
>> Maximum Power over sampling duration : 361 watts
>> Average Power over sampling duration : 132 watts
>> Time Stamp : 11/06/2014 - 12:39:20
>> Statistics reporting time period : 1270838 milliseconds
>> Power Measurement : Active
>>
>> I'm obviously missing something but I cannot find anything more in the
>> documentation
>>
>> Thanks
>>
>> Antony
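
For anyone wanting to compare the `ipmi-dcmi --get-system-power-statistics` readings quoted above against what Slurm later reports, here is a minimal sketch that pulls the wattage fields out of that text. It is only an illustration: the function name `parse_dcmi_power` and the regular expression are hypothetical, and the sample text is copied from the output in this thread, so the exact spacing on a real node may differ.

```python
import re

# Sample copied from the ipmi-dcmi output quoted in this thread.
# Assumption: real output may pad the labels differently.
SAMPLE = """\
Current Power : 140 Watts
Minimum Power over sampling duration : 72 watts
Maximum Power over sampling duration : 361 watts
Average Power over sampling duration : 132 watts
Time Stamp : 11/06/2014 - 12:39:20
Statistics reporting time period : 1270838 milliseconds
Power Measurement : Active
"""

# Match "<label> : <number> watts", ignoring case on the unit.
WATTS_LINE = re.compile(
    r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>\d+(?:\.\d+)?)\s*watts\s*$",
    re.IGNORECASE,
)


def parse_dcmi_power(text):
    """Return {label: watts} for every line that reports a value in watts."""
    readings = {}
    for line in text.splitlines():
        match = WATTS_LINE.match(line)
        if match:
            readings[match.group("label")] = float(match.group("value"))
    return readings


if __name__ == "__main__":
    for label, watts in parse_dcmi_power(SAMPLE).items():
        print(f"{label}: {watts} W")
```

Lines that do not end in a watts value (the time stamp, the reporting period, the measurement state) are skipped rather than treated as errors, since only the power figures are of interest here.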
