On Thu, Nov 29, 2012 at 03:47:03PM -0700, Danny Auble wrote:
>
> It could be reservations. Any unused time in a reservation is spread
> across those associations in the reservation. So sreport would print
> out this info while sacct would not.
The only reservation I can think of during that time was for
maintenance. That time wouldn't get associated with users, right?
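To make the reservation effect concrete, here is a rough sketch of how unused reserved time could end up charged to the associations in a reservation. It assumes an even split among those associations. That is an assumption: the message above only says the time is "spread across" them. The function name, association labels and numbers are all hypothetical.

```shell
# Hypothetical helper, not part of Slurm: charge a reservation's unused
# CPU-seconds to its associations. The EVEN split is an assumption.
# Usage: split_idle_reserved RESERVED_SECS USED_SECS ASSOC...
split_idle_reserved() {
  reserved=$1
  used=$2
  shift 2
  n=$#
  # Nothing to charge if the reservation has no associations.
  [ "$n" -gt 0 ] || return 1
  idle=$((reserved - used))
  # Never charge negative idle time.
  if [ "$idle" -lt 0 ]; then idle=0; fi
  for assoc in "$@"; do
    # Integer division: any remainder is simply dropped in this sketch.
    echo "$assoc $((idle / n))"
  done
}

# Example: 86400 reserved CPU-seconds, 36000 used, two associations.
split_idle_reserved 86400 36000 da:none da:test_acct
```

Under this assumption, each of the two associations would be charged 25200 CPU-seconds that no job of theirs actually used. That time would show up in sreport but never in sacct.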
> Keep in mind also that if the user is in multiple accounts, each association
> gets its own line; this could also lead to what you see.
>
> sreport -v -t seconds user topusage start=11/26
> --------------------------------------------------------------------------------
> Top 10 Users 2012-11-26T00:00:00 - 2012-11-28T23:59:59 (259200 secs)
> Time reported in CPU Seconds
> --------------------------------------------------------------------------------
> Cluster Login Proper Name Account Used
> --------- --------- --------------- --------------- ----------
> snowflake da Danny Auble none 5743
> snowflake da Danny Auble test_acct 2
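To compare sreport with a per-user sacct sum, the per-account lines first have to be added up per login. This is a minimal sketch that assumes sreport's default fixed-width table as shown above, with the login in the second column and Used in the last one. The function name is hypothetical.

```shell
# Hypothetical helper: total the "Used" column per login across all of a
# user's accounts. Header, separator and title lines are skipped because
# their last field is not a plain number.
sum_used_per_login() {
  awk '$NF ~ /^[0-9]+$/ && NF >= 4 { used[$2] += $NF }
       END { for (login in used) print login, used[login] }'
}

# Possible usage:
# sreport -t seconds user topusage start=11/26 | sum_used_per_login
```

For the two lines above this gives `da 5745` (5743 + 2).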
We do have this situation, but I would think that for users in this
situation the sacct usage should always be higher. For the top user in
sreport, usage is 1214426112 CPU-seconds (around 38.5 CPU-years), but sacct
usage is only 1049661568 (around 33.3 CPU-years). I would be satisfied to
know that taking all the data from sacct -X and analyzing it will give me
valid usage results, but the discrepancy between sacct usage and sreport
usage makes me wonder what is going on.
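The year figures above are just CPU-seconds divided by the seconds in a 365-day year (31536000). Here is a small helper to reproduce them; the function name is hypothetical.

```shell
# Hypothetical helper: convert CPU-seconds to CPU-years (365-day years),
# rounded to one decimal place.
cpu_years() {
  awk -v s="$1" 'BEGIN { printf "%.1f\n", s / (365 * 86400) }'
}

cpu_years 1214426112                    # sreport, top user -> 38.5
cpu_years 1049661568                    # sacct -X, same user -> 33.3
cpu_years $((1214426112 - 1049661568))  # gap between them -> 5.2
```

So the gap for the top user alone is roughly 5.2 CPU-years, about 13.6% of the sreport figure.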
>
> Danny
>
> On 11/29/2012 01:45 PM, Andy Wettstein wrote:
> > Hi,
> >
> > I've been working on making usage reports, and I'm trying to understand
> > the difference in cpu time between sreport and sacct -X.
> >
> > The sreport user topusage report gives these numbers for CPU time in seconds:
> >
> > # sreport -v -t seconds user topusage start=11/01 format=used
> > --------------------------------------------------------------------------------
> > Top 10 Users 2012-11-01T00:00:00 - 2012-11-28T23:59:59 (2422800 secs)
> > Time reported in CPU Seconds
> > --------------------------------------------------------------------------------
> > Used
> > ----------
> > 1214426112
> > 882260416
> > 752372859
> > 701376048
> > 669655465
> > 623742128
> > 502642776
> > 410485123
> > 345720864
> > 281312588
> >
> >
> > If I sum the CPUTimeRAW for each of the top users from sacct, I get
> > totally different numbers. I ran something like this for each user, in
> > the same order as the users listed in the sreport topusage output:
> > # sacct -P -u $i -n --starttime 11/01 --allocations --format=cputimeraw |
> > awk '{ sum+=$1} END {print sum}'
> >
> > I get these numbers; some are higher and some are lower:
> >
> > 1049661568
> > 430935200
> > 692175979
> > 925251632
> > 505953929
> > 660265868
> > 630577080
> > 458256508
> > 461598697
> > 291467484
> >
> > I've been trying to figure out what sreport uses to calculate these
> > numbers, but I haven't been able to follow what is happening. So I am
> > wondering what the sreport number represents, and whether using the sacct
> > output for stats is a valid method.
> >
> > Thanks
> > Andy
> >
> >
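Lining the two top-10 lists up makes the per-user gap explicit. This is a minimal sketch that assumes the sreport and sacct numbers above are saved one per line, in the same user order, in two files. The function name and file names are hypothetical. The last column is sacct minus sreport, so a negative value means sreport reports more usage than the sacct -X sum.

```shell
# Hypothetical helper: compare two usage lists line by line.
# Prints: rank, sreport value, sacct value, sacct - sreport.
compare_usage() {
  paste "$1" "$2" | awk '{ printf "%d %d %d %d\n", NR, $1, $2, $2 - $1 }'
}

# Possible usage (file names are placeholders):
# compare_usage sreport_used.txt sacct_used.txt
```

For the first two users this gives gaps of -164764544 and -451325216 CPU-seconds, while the fourth user comes out 223875584 higher in sacct.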
--
andy wettstein
hpc system administrator
research computing center
university of chicago
773.702.1104