On Sun, 8 May 2011 at 09:58 -0000, Dave Love wrote:

> > When you get to proper accounting, these details are very
> > important.  Being sure these edge cases don't duplicate or miss
> > jobs, and doing the desired processing for jobs that span
> > reporting boundaries, is another matter.
>
> Why don't the intermediate accounting records cover that?

Didn't know about them.  Are they in the accounting file or just the
reporting file?  Mostly rhetorical; when it's time to do 'real'
accounting, much more study will be needed.

> Is it worth worrying about any real accuracy in accounting?  Trying
> to keep track of everything relevant, like system problems that
> clobber jobs, just doesn't seem worthwhile to me.

This is a very important basic question, and it interacts strongly with
the accuracy of the collected information.  I think that, as a
practical matter, the data collected by SGE is not good enough for
precise use.  It misses any system usage outside of SGE and, depending
heavily on details I have not fully explored, can miss internal usage
as well.

As a metric suitable for basic/casual reporting, I think the available
data is fine.

> > I may eventually do my own processing of the accounting records.
>
> There are assorted scripts floating around for doing that, including
> analyze.rb (?) in the distribution.  Note that they need
> consideration of the configuration in general, such as the
> one-slot-per-node, loosely integrated parallel jobs that were
> originally configured here...

Some basic tools are very useful (even just as maintained examples).
For me qacct is perfectly fine.  Something that can do a few basic
pictures would be useful.

I do assume each organization will need some customization (at a
minimum, someone will want a color changed).

Do you mean applying compensation for overcommitted nodes and
dedicated nodes with only a single SGE job (but maybe using multiple
cores)?

This is where site-specific needs come into play.

So far, I've only been thinking/reporting in wall-clock core-hours
used.  It will get more complicated if we allow dedicated nodes or
when we start working with memory size requests.

It can also get complicated when nodes have different characteristics
(processor speed, software licenses, etc.).
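
For concreteness, here is a rough sketch of the core-hour arithmetic I
mean (Python here, though anything would do), with a per-node weight
thrown in for the different-characteristics case.  The weight factor
is my own invention for illustration, not something SGE supplies.

    # Sketch: wall-clock core-hours, optionally weighted per node class.
    def core_hours(wallclock_seconds, slots, node_weight=1.0):
        """Core-hours = wall-clock time * slots, scaled by an optional
        per-node weight (e.g. faster processors, licensed software)."""
        return wallclock_seconds / 3600.0 * slots * node_weight

    # A 4-slot job that ran 2 hours on a node weighted 1.5x:
    print(core_hours(7200, 4, 1.5))   # -> 12.0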

> > At some point I may look again at the ARCo stuff in SGE.  The ARCo
> > database may be useful to summarize data (including the extra
> > data in the reporting file).  To me the Java web interface looks
> > very heavy and would require a lot of work to get through a
> > security review (multiple web ports opened, new login methods,
> > etc.).
>
> I think everyone agrees it needs replacing, if something like it is
> required.

The problem is that it already exists, and people will find references
to it and say "install it".  The good news is that it was only
prebuilt for the Sun-licensed version (as I understand it).

> > Another issue that we have is that we also run torque/Moab on
> > another cluster and I really want to normalize any formal
> > reporting system to include information from both clusters.
>
> Does Torque/Moab do something similar to GE or not?

Yes, it's similar but different.  Torque generates an accounting file
of ASCII job records with different attributes.  Torque does not have
"parallel environments", job slots are handled differently, and job
IDs are different.  I haven't done a full reconciliation, but I think
most of the important information is similar.

Moab has another set of accounting information.  I haven't looked at
how that compares to the Torque information.

The basic common info should be easy: user, group, department,
wall-clock time, and a parallel multiplier (slots, nodes, cores, what
have you).
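
To make that concrete, here is a minimal sketch of the normalization I
have in mind.  The SGE names follow accounting(5); the Torque side
assumes a dict of attributes parsed from an 'E' (job end) record, with
the exec_host entry count standing in for the parallel multiplier.
Treat the details as assumptions rather than a recipe.

    # Sketch: map SGE and Torque job records onto a common set of fields.
    # Both parsers are assumed to hand us a dict of raw attributes; the
    # exact names, units, and formats should be checked per site.

    def hms_to_seconds(hms):
        """Convert Torque's HH:MM:SS walltime to seconds."""
        h, m, s = (int(x) for x in hms.split(":"))
        return h * 3600 + m * 60 + s

    def from_sge(rec):
        return {
            "user": rec["owner"],
            "group": rec["group"],
            "department": rec.get("department", ""),
            "wallclock": float(rec["ru_wallclock"]),   # already seconds
            "multiplier": int(rec.get("slots", 1)),
        }

    def from_torque(rec):
        return {
            "user": rec["user"],
            "group": rec["group"],
            "department": rec.get("account", ""),
            "wallclock": hms_to_seconds(rec["resources_used.walltime"]),
            "multiplier": len(rec["exec_host"].split("+")),  # one entry per allocated core
        }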

> Probably not relevant to you, but for general info:  That sort of
> thing came up on a UK National Grid Service list.  I tried to
> explain that it wasn't straightforward for SGE, and it didn't seem
> that there could be a general recipe for incorporating arbitrary SGE
> configs.  It wasn't clear to me whether that's because Torque isn't
> as flexible, or whether the same issues are really there too.

Probably they're mostly just different, with a few things that don't
map well between the systems.  Also, how requested resources are
handled is probably fairly different.

> > The accounting and reporting logs look like they can be easily
> > converted into a data set suitable for analysis with R.  Usage
> > summaries by various parameters are probably pretty easy.
>
> Definitely, using a general tool seems the Right Thing.  I
> assume there are generic packages for producing pie charts, and
> other things management might understand, from database records.
> Does anyone have specific suggestions?

I used qacct to generate some basic .csv files.  Someone else used
Excel (or something else) to make graphs for a management report.
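
For what it's worth, the accounting file itself is easy enough to chew
on directly; a rough sketch of the CSV step follows (Python here,
though R would do just as well).  The field positions are from my
reading of accounting(5), so verify them against your GE version
before trusting the numbers.

    # Sketch: turn the colon-delimited SGE accounting file into a small
    # CSV for graphing.  Field positions follow accounting(5) as I read
    # it (owner=4th, ru_wallclock=14th, slots=35th); check your version.
    import csv

    OWNER, WALLCLOCK, SLOTS = 3, 13, 34   # 0-based indexes

    with open("accounting") as acct, open("usage.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["owner", "core_hours"])
        for line in acct:
            if line.startswith("#"):
                continue                  # skip comment/header lines
            f = line.rstrip("\n").split(":")
            writer.writerow([f[OWNER],
                             float(f[WALLCLOCK]) / 3600.0 * int(f[SLOTS])])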

We will probably also need to do a web page showing some sort of
historical reporting.  This will need to integrate into our web
content management system.

Stuart
-- 
I've never been lost; I was once bewildered for three days, but never lost!
                                        --  Daniel Boone