On 06.05.2011, at 19:17, Stuart Barkley wrote:

> On Thu, 5 May 2011 at 16:46 -0000, William Deegan wrote:
>
>> I'm pondering writing a Python-based dbwriter replacement which
>> would just parse the accounting file and stuff it in a db, and then
>> have some Python web app framework for reporting.
>>
>> Has anyone already done this?
>>
>> Any suggestions on how to "eat" the accounting log file? (Consume it
>> so it never gets big? Do I rotate it out, then parse and discard?)
>
> Yes, just roll the log file (see the responses to your question on that).
>
> I have done various simple reports with the qacct command. With just
> a little extra work the summary information can be put into .csv files
> that someone else can use to make the necessary pie charts, graphs and
> other pretty things management likes to see.
>
> For simple usage reporting I use something like:
>
> zcat /opt/sge_root/betsy/common/accounting-201* > sge-acct.tmp
> qacct -o -b 201101010000 -e 201104010000 -f sge-acct.tmp > sge-2011Q1.rpt
> qacct -D -g -o -pe -P -b 201101010000 -e 201104010000 -f sge-acct.tmp \
>     > sge-2011Q1-full.rpt
> qacct -D -o -P -e 201104010000 -f Mirror/sge-acct.tmp \
>     | grep -v '============' \
>     | perl -ne 'chomp; s/\s+/\t/g; print $_."\n";' > sge-all-time.csv
>
> With a little over 2M total records these commands take very little
> time to execute. We have about 30M records in the reporting files.
> As our usage ramps up I do expect the number of records per day to
> grow substantially.
>
> At our current usage level, sequential processing of the raw records
> looks perfectly usable.
>
> Is there specific documentation about the -d and -e options of qacct
> (man qacct doesn't specify)? Are these times inclusive or exclusive?
> What does qacct do about jobs which span the start or end time?

In `man qacct` it's stated that the start time of the job is checked -
nothing more.
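If you end up processing the raw records yourself, that check is easy to
reproduce. A minimal sketch in Python (untested; it assumes the
colon-separated accounting(5) format with start_time as the tenth field,
a Unix timestamp, and a begin-inclusive/end-exclusive interval - both
assumptions, not verified against qacct's source):

    import time

    # Keep a record when its start time falls inside [begin, end).
    # Assumed layout: colon-separated accounting(5) fields, with the
    # job start time (epoch seconds) at index 9.
    def records_between(path, begin_ts, end_ts):
        with open(path) as acct:
            for line in acct:
                if line.startswith('#'):      # skip comment lines
                    continue
                fields = line.rstrip('\n').split(':')
                if begin_ts <= int(fields[9]) < end_ts:
                    yield fields

    # Same Q1 window as the qacct calls above (local time):
    begin = int(time.mktime(time.strptime('201101010000', '%Y%m%d%H%M')))
    end = int(time.mktime(time.strptime('201104010000', '%Y%m%d%H%M')))
    print(sum(1 for _ in records_between('sge-acct.tmp', begin, end)))

Comparing its count against qacct's for the same window would also settle
the inclusive/exclusive question empirically.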
-- Reuti

> When you get to proper accounting, these details are very important.
> Being sure these edges don't duplicate or miss jobs, and doing the
> desired processing for jobs that split reporting boundaries, is
> another issue. I may eventually do my own processing of the
> accounting records.
>
> At some point I may look again at the ARCo stuff in SGE. The ARCo
> database may be useful to summarize data (including the extra data in
> the reporting file). To me the Java web interface looks very heavy and
> would require a lot of work to get through a security review (multiple
> web ports opened, new login methods, etc.).
>
> Another issue we have is that we also run Torque/Moab on another
> cluster, and I really want to normalize any formal reporting system to
> include information from both clusters.
>
> I may get a summer statistics student to look at the accounting and
> reporting logs to see what sort of useful information might be gained.
> The accounting and reporting logs look like they can easily be
> converted into a data set suitable for analysis with R. Usage
> summaries by various parameters are probably pretty easy. Focusing on
> a few specific users and jobs might be interesting. Has anyone
> performed any larger statistical analysis of this information?
>
> Stuart
> --
> I've never been lost; I was once bewildered for three days, but never
> lost!
> -- Daniel Boone
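For the Python dbwriter idea quoted at the top, the parse-and-stuff step
itself is small. A rough sketch (untested; the field names are an
assumption based on accounting(5), and the table layout is made up for
illustration - the real dbwriter reads the reporting file and has its
own schema):

    import sqlite3

    # First 14 accounting(5) fields, assumed; 'grp' because 'group'
    # is a reserved word in SQL.
    FIELDS = ('qname', 'hostname', 'grp', 'owner', 'job_name',
              'job_number', 'account', 'priority', 'submission_time',
              'start_time', 'end_time', 'failed', 'exit_status',
              'ru_wallclock')

    def load(acct_path, db_path='sge_acct.db'):
        db = sqlite3.connect(db_path)
        db.execute('CREATE TABLE IF NOT EXISTS jobs (%s)'
                   % ', '.join(FIELDS))
        with open(acct_path) as acct:
            for line in acct:
                if line.startswith('#'):
                    continue
                fields = line.rstrip('\n').split(':')
                if len(fields) >= len(FIELDS):
                    db.execute('INSERT INTO jobs VALUES (%s)'
                               % ', '.join('?' * len(FIELDS)),
                               fields[:len(FIELDS)])
        db.commit()
        db.close()

    # Feed it a rotated copy, so qmaster keeps writing the live file:
    load('accounting.0')

Rolling the file first and loading the rotated copy also answers the
"eat the log" question: the live file stays small and nothing is parsed
twice.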

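And for the R analysis mentioned at the end: the same parsing loop can
emit a tab-separated file that R reads directly. A sketch with an
illustrative column selection (field positions again assumed from
accounting(5)):

    # (column name, accounting field index) pairs - assumptions
    COLUMNS = (('owner', 3), ('job_number', 5), ('start_time', 9),
               ('end_time', 10), ('exit_status', 12),
               ('ru_wallclock', 13))

    with open('sge-acct.tmp') as acct, open('sge-jobs.tsv', 'w') as out:
        out.write('\t'.join(name for name, _ in COLUMNS) + '\n')
        for line in acct:
            if line.startswith('#'):
                continue
            fields = line.rstrip('\n').split(':')
            out.write('\t'.join(fields[i] for _, i in COLUMNS) + '\n')

On the R side, read.delim('sge-jobs.tsv') should then give a data frame
ready for the usual summaries.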