It appears that the filetxt accounting plugin just doesn't record those fields.
(The man page has a rather cryptic "Note: The filetxt plugin records only a
limited subset of accounting information and will prevent some sacct options
from proper operation."
I did not expect something like NCPUs to be outside of a useful but limited 
subset of information...)

I assume this is similar to the situation with jobcomp/filetxt described here:
https://bugs.schedmd.com/show_bug.cgi?id=3229

If anyone else runs into this issue, the fix appears to be to just use
slurmdbd instead. I thought I'd send this out, since all the previous posts I
came across with the same issue never had a resolution.
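
For reference, the switch boils down to pointing Slurm at slurmdbd rather
than the flat file. A minimal sketch of the relevant slurm.conf lines (the
host and port here are examples, not taken from my setup - adjust for your
site, and you'll also need a running slurmdbd with its own slurmdbd.conf):

    # slurm.conf - replace the filetxt plugin with slurmdbd
    # (AccountingStorageHost/Port below are placeholder values)
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=headnode
    AccountingStoragePort=6819
    # AccountingStorageLoc is not used with slurmdbd

After restarting slurmctld and slurmdbd, sacct should start reporting the
full set of fields for new jobs.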

Cheers,


------------------------------------
Eric Coulter         jecou...@iu.edu
XSEDE Capabilities and Resource Integration Engineer
IU Campus Bridging & Research Infrastructure
RT/PTI/UITS
https://www.xsede.org/ecosystem/xcri-mission
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Coulter, 
John Eric <jecou...@iu.edu>
Sent: Tuesday, January 30, 2018 3:13 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Slurm accounting problem - NCPUs=0


Hi All,

I've run into a strange problem with my Slurm configuration. I'm trying to
set up AccountingStorage properly so that I can use OpenXDMoD to produce
usage reports, but the output I'm getting from sacct has only 0's for a huge
number of fields, like NCPUs and CPUTimeRaw (which are rather important for
usage reports).

Has anyone here run into something similar before? It would be great if
someone could point out what I've misconfigured. I've pasted the relevant
bits of my Slurm config and sacct output after my sig.

Thanks!


------------------------------------
Eric Coulter         jecou...@iu.edu
XSEDE Capabilities and Resource Integration Engineer
IU Campus Bridging & Research Infrastructure
RT/PTI/UITS
812-856-3250

[jecoulte@headnode ~]$ scontrol show config | grep Acc
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = none
AccountingStorageHost   = headnode
AccountingStorageLoc    = /var/log/slurmacct.log
AccountingStoragePort   = 0
AccountingStorageTRES   = cpu,mem,energy,node   # Added these in case the default wasn't being respected for some reason...
AccountingStorageType   = accounting_storage/filetxt
AccountingStorageUser   = root
AccountingStoreJobComment = Yes
AcctGatherEnergyType    = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq      = 0 sec
AcctGatherProfileType   = acct_gather_profile/none
JobAcctGatherFrequency  = 30
JobAcctGatherType       = jobacct_gather/linux
JobAcctGatherParams     = (null)

For a job running on 2 nodes, 1 cpu per node, sacct shows:
[jecoulte@headnode ~]$ sudo sacct -j 386 --format JobID,JobName,AllocNodes,TotalCPU,CPUTime,NCPUS,CPUTimeRaw,AllocCPUs
       JobID    JobName AllocNodes   TotalCPU    CPUTime      NCPUS CPUTimeRAW  AllocCPUS
------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
386          fact_job.+          2  00:49.345   00:00:00          0          0          0
386.0          hostname          2  00:00.006   00:00:00          0          0          0
386.1        fact-sum.g          2  00:49.338   00:00:00          0          0          0

For the same job, the record in AccountingStorageLoc is:
[jecoulte@headnode ~]$ grep ^386 /var/log/slurmacct.log
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006538 1000 1000 - - 1 0 1 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00 0 0 0.00 0 0 0.00 0 0 0.00 hostname compute-[0-1] 0 0 0 0 (null) 4294967295
386 low 1517006536 1517006538 1000 1000 - - 1 0 3 0 2 2 0 0 6466 0 5388 0 1078 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269148 1 236380.00 620 1 618.00 0 1 0.00 0 1 0.00 hostname compute-[0-1] 1 1 1 1 (null) 4294967295
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006538 1000 1000 - - 1 1 1 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00 0 0 0.00 0 0 0.00 0 0 0.00 fact-sum.g compute-[0-1] 0 0 0 0 (null) 4294967295
386 low 1517006536 1517006565 1000 1000 - - 1 1 3 0 2 2 27 49 338902 48 94477 1 244425 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269148 1 236380.00 620 1 618.00 0 1 0.00 0 1 0.00 fact-sum.g compute-[0-1] 1 1 1 1 (null) 4294967295
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006565 1000 1000 - - 3 28 3 4294967295 0
