On 06.05.2011, at 19:17, Stuart Barkley wrote:

> On Thu, 5 May 2011 at 16:46 -0000, William Deegan wrote:
>
>> I'm pondering writing a Python-based dbwriter replacement which
>> would just parse the accounting file and stuff it in a db, and then
>> have some Python web app framework for reporting.
>>
>> Has anyone already done this?
>>
>> Any suggestions on how to "eat" the accounting log file? (Consume it
>> so it never gets big? Do I rotate it out, then parse and discard?)
>
> Yes, just roll the log file (see the responses to your question on that).
>
> I have done various simple reports with the qacct command. With just
> a little extra work the summary information can be put into .csv files
> that someone else can use to make the necessary pie charts, graphs and
> other pretty things management likes to see.
>
> For simple usage reporting I use something like:
>
> zcat /opt/sge_root/betsy/common/accounting-201* > sge-acct.tmp
> qacct -o -b 201101010000 -e 201104010000 -f sge-acct.tmp > sge-2011Q1.rpt
> qacct -D -g -o -pe -P -b 201101010000 -e 201104010000 -f sge-acct.tmp \
>     > sge-2011Q1-full.rpt
> qacct -D -o -P -e 201104010000 -f Mirror/sge-acct.tmp \
>     | grep -v '============' \
>     | perl -ne 'chomp; s/\s+/\t/g; print $_."\n";' > sge-all-time.csv
>
> With a little over 2M total records these commands take very little
> time to execute. We have about 30M records in the reporting files.
> As our usage ramps up I do expect the number of records per day to
> grow substantially.
>
> At our current usage level, sequential processing of the raw records
> looks perfectly usable.
>
> Is there specific documentation about the -d and -e options of qacct
> (man qacct doesn't specify)? Are these times inclusive or exclusive?
> What does qacct do about jobs which span the start or end time?

In `man qacct` it's stated that the start time of the job is checked -
nothing more.
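If you end up processing the raw records yourself, that check is easy to
reproduce. A minimal sketch in Python (untested; it assumes the
colon-separated accounting(5) format with start_time as the tenth field,
a Unix timestamp, and a begin-inclusive/end-exclusive interval - both
assumptions, not verified against qacct's source):

    import time

    # Keep a record when its start time falls inside [begin, end).
    # Assumed layout: colon-separated accounting(5) fields, with the
    # job start time (epoch seconds) at index 9.
    def records_between(path, begin_ts, end_ts):
        with open(path) as acct:
            for line in acct:
                if line.startswith('#'):      # skip comment lines
                    continue
                fields = line.rstrip('\n').split(':')
                if begin_ts <= int(fields[9]) < end_ts:
                    yield fields

    # Same Q1 window as the qacct calls above (local time):
    begin = int(time.mktime(time.strptime('201101010000', '%Y%m%d%H%M')))
    end = int(time.mktime(time.strptime('201104010000', '%Y%m%d%H%M')))
    print(sum(1 for _ in records_between('sge-acct.tmp', begin, end)))

Comparing its count against qacct's for the same window would also settle
the inclusive/exclusive question empirically.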
-- Reuti

> When you get to proper accounting, these details are very important.
> Being sure these edges don't duplicate or miss jobs, and doing the
> desired processing for jobs that split reporting boundaries, is
> another issue. I may eventually do my own processing of the
> accounting records.
>
> At some point I may look again at the ARCo stuff in SGE. The ARCo
> database may be useful to summarize data (including the extra data in
> the reporting file). To me the Java web interface looks very heavy and
> would require a lot of work to get through a security review (multiple
> web ports opened, new login methods, etc.).
>
> Another issue we have is that we also run Torque/Moab on another
> cluster, and I really want to normalize any formal reporting system to
> include information from both clusters.
>
> I may get a summer statistics student to look at the accounting and
> reporting logs to see what sort of useful information might be gained.
> The accounting and reporting logs look like they can easily be
> converted into a data set suitable for analysis with R. Usage
> summaries by various parameters are probably pretty easy. Focusing on
> a few specific users and jobs might be interesting. Has anyone
> performed any larger statistical analysis of this information?
>
> Stuart
> --
> I've never been lost; I was once bewildered for three days, but never
> lost!
> -- Daniel Boone
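For the Python dbwriter idea quoted at the top, the parse-and-stuff step
itself is small. A rough sketch (untested; the field names are an
assumption based on accounting(5), and the table layout is made up for
illustration - the real dbwriter reads the reporting file and has its
own schema):

    import sqlite3

    # First 14 accounting(5) fields, assumed; 'grp' because 'group'
    # is a reserved word in SQL.
    FIELDS = ('qname', 'hostname', 'grp', 'owner', 'job_name',
              'job_number', 'account', 'priority', 'submission_time',
              'start_time', 'end_time', 'failed', 'exit_status',
              'ru_wallclock')

    def load(acct_path, db_path='sge_acct.db'):
        db = sqlite3.connect(db_path)
        db.execute('CREATE TABLE IF NOT EXISTS jobs (%s)'
                   % ', '.join(FIELDS))
        with open(acct_path) as acct:
            for line in acct:
                if line.startswith('#'):
                    continue
                fields = line.rstrip('\n').split(':')
                if len(fields) >= len(FIELDS):
                    db.execute('INSERT INTO jobs VALUES (%s)'
                               % ', '.join('?' * len(FIELDS)),
                               fields[:len(FIELDS)])
        db.commit()
        db.close()

    # Feed it a rotated copy, so qmaster keeps writing the live file:
    load('accounting.0')

Rolling the file first and loading the rotated copy also answers the
"eat the log" question: the live file stays small and nothing is parsed
twice.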

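And for the R analysis mentioned at the end: the same parsing loop can
emit a tab-separated file that R reads directly. A sketch with an
illustrative column selection (field positions again assumed from
accounting(5)):

    # (column name, accounting field index) pairs - assumptions
    COLUMNS = (('owner', 3), ('job_number', 5), ('start_time', 9),
               ('end_time', 10), ('exit_status', 12),
               ('ru_wallclock', 13))

    with open('sge-acct.tmp') as acct, open('sge-jobs.tsv', 'w') as out:
        out.write('\t'.join(name for name, _ in COLUMNS) + '\n')
        for line in acct:
            if line.startswith('#'):
                continue
            fields = line.rstrip('\n').split(':')
            out.write('\t'.join(fields[i] for _, i in COLUMNS) + '\n')

On the R side, read.delim('sge-jobs.tsv') should then give a data frame
ready for the usual summaries.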