It appears that the filetxt accounting plugin just doesn't record those fields. (The man page has a rather cryptic "Note: The filetxt plugin records only a limited subset of accounting information and will prevent some sacct options from proper operation." I did not expect something like NCPUs to fall outside a useful but limited subset of information...)
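A quick way to check for this symptom on another cluster is to look for columns that are zero in every row of the sacct output. The following is only a minimal sketch: it assumes the output was produced with sacct's pipe-delimited `--parsable2` (`-P`) mode, the helper name `zeroed_columns` is hypothetical, and the sample text is hand-built from the values in the sacct output quoted further down rather than captured from a real run.

```python
def zeroed_columns(parsable_text, columns=("NCPUS", "CPUTimeRAW", "AllocCPUS")):
    """Return the requested columns whose value is "0" in every data row.

    `parsable_text` is expected to be pipe-delimited text with a header row,
    as printed by `sacct --parsable2` (an assumption of this sketch).
    """
    lines = [ln for ln in parsable_text.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return []  # header only, or nothing at all: nothing to report
    header = lines[0].split("|")
    rows = [dict(zip(header, ln.split("|"))) for ln in lines[1:]]
    return [
        col for col in columns
        if col in header and all(row.get(col) == "0" for row in rows)
    ]


# Hand-built sample using the values reported for job 386 (not real output).
SAMPLE = """\
JobID|JobName|AllocNodes|TotalCPU|CPUTime|NCPUS|CPUTimeRAW|AllocCPUS
386|fact_job.job|2|00:49.345|00:00:00|0|0|0
386.0|hostname|2|00:00.006|00:00:00|0|0|0
386.1|fact-sum.g|2|00:49.338|00:00:00|0|0|0
"""

if __name__ == "__main__":
    print(zeroed_columns(SAMPLE))
```

If every column you care about comes back in that list for jobs that clearly used CPUs, the accounting backend is a likely suspect rather than the jobs themselves.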
I assume this is similar to the situation with jobcomp/filetxt described here: https://bugs.schedmd.com/show_bug.cgi?id=3229

If anyone else runs into this issue, the fix appears to be to use slurmdbd instead. I thought I'd send this out, since none of the previous posts I came across with the same issue ever had a resolution.

Cheers,
------------------------------------
Eric Coulter
jecou...@iu.edu
XSEDE Capabilities and Resource Integration Engineer
IU Campus Bridging & Research Infrastructure
RT/PTI/UITS
https://www.xsede.org/ecosystem/xcri-mission
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Coulter, John Eric <jecou...@iu.edu>
Sent: Tuesday, January 30, 2018 3:13 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Slurm accounting problem - NCPUs=0

Hi All,

I've run into a strange problem with my slurm configuration. I'm trying to set up AccountingStorage properly so that I can use OpenXDMoD for producing usage reports, but the output I'm getting from sacct only has 0's for a huge number of fields like NCPUs and CPUTimeRaw (which are rather important for usage reports).

Has anyone here run into something similar before? It would be great if someone could point out what I've misconfigured. I've pasted the relevant bits of my slurm config and sacct output after my sig.

Thanks!
------------------------------------
Eric Coulter
jecou...@iu.edu
XSEDE Capabilities and Resource Integration Engineer
IU Campus Bridging & Research Infrastructure
RT/PTI/UITS
812-856-3250

[jecoulte@headnode ~]$ scontrol show config | grep Acc
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = none
AccountingStorageHost = headnode
AccountingStorageLoc = /var/log/slurmacct.log
AccountingStoragePort = 0
AccountingStorageTRES = cpu,mem,energy,node #Added these in case the default wasn't being respected for some reason...
AccountingStorageType = accounting_storage/filetxt
AccountingStorageUser = root
AccountingStoreJobComment = Yes
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/linux
JobAcctGatherParams = (null)

For a job running on 2 nodes, 1 cpu per node, sacct shows:

[jecoulte@headnode ~]$ sudo sacct -j 386 --format JobID,JobName,AllocNodes,TotalCPU,CPUTime,NCPUS,CPUTimeRaw,AllocCPUs
       JobID    JobName AllocNodes   TotalCPU    CPUTime      NCPUS CPUTimeRAW  AllocCPUS
------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
386          fact_job.+          2  00:49.345   00:00:00          0          0          0
386.0          hostname          2  00:00.006   00:00:00          0          0          0
386.1        fact-sum.g          2  00:49.338   00:00:00          0          0          0

For the same job, the record in AccountingStorageLoc is:

[jecoulte@headnode ~]$ grep ^386 /var/log/slurmacct.log
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006538 1000 1000 - - 1 0 1 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00 0 0 0.00 0 0 0.00 0 0 0.00 hostname compute-[0-1] 0 0 0 0 (null) 4294967295
386 low 1517006536 1517006538 1000 1000 - - 1 0 3 0 2 2 0 0 6466 0 5388 0 1078 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269148 1 236380.00 620 1 618.00 0 1 0.00 0 1 0.00 hostname compute-[0-1] 1 1 1 1 (null) 4294967295
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006538 1000 1000 - - 1 1 1 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00 0 0 0.00 0 0 0.00 0 0 0.00 fact-sum.g compute-[0-1] 0 0 0 0 (null) 4294967295
386 low 1517006536 1517006565 1000 1000 - - 1 1 3 0 2 2 27 49 338902 48 94477 1 244425 0 0 0 0 0 0 0 0 0 0 0 0 0 0 269148 1 236380.00 620 1 618.00 0 1 0.00 0 1 0.00 fact-sum.g compute-[0-1] 1 1 1 1 (null) 4294967295
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006565 1000 1000 - - 3 28 3 4294967295 0
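The records in the accounting file clearly come in several shapes, which is easy to miss when reading them by eye. As a rough sketch, the snippet below groups records by their ninth whitespace-separated field (which takes the values 0, 1 and 3 in the lines above) and reports how many fields each group has. Treating that ninth field as a record-type marker is an assumption made here for illustration, not something taken from the Slurm documentation, and `record_shapes` is a hypothetical helper name.

```python
from collections import defaultdict


def record_shapes(log_text, type_index=8):
    """Map the value found at `type_index` to the set of field counts seen.

    `type_index=8` (the ninth field) is an assumed record-type marker;
    lines too short to have that field are skipped.
    """
    shapes = defaultdict(set)
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) > type_index:
            shapes[fields[type_index]].add(len(fields))
    return dict(shapes)


# Two of the shorter records for job 386, copied from the grep output above.
SAMPLE = """\
386 low 1517006536 1517006537 1000 1000 - - 0 fact_job.job 1 4294901759 2 compute-[0-1] (null)
386 low 1517006536 1517006565 1000 1000 - - 3 28 3 4294967295 0
"""

if __name__ == "__main__":
    for marker, counts in sorted(record_shapes(SAMPLE).items()):
        print(marker, sorted(counts))
```

Run against the full file (for example by passing in the contents of the AccountingStorageLoc path), this gives a compact summary of what the plugin actually wrote per record kind, which can then be compared with the fields sacct is being asked to display.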