Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-16 Thread Jeff Victor
Peter,

Your statements are exactly the reason(s) I wrote this prototype.
Solaris engineering is researching this topic, and is listening as we
type... :-) They are very interested in feedback generated by the use
of this prototype.

Any specific ideas you have about kstats you think we need would be
welcome on this alias.

To me, the clearest example would be a kstat, per zone, which provides
the total amount of CPU time for all of the processes in each zone,
since the zone booted. This would enable tools like zonestat to
request the datum occasionally, in order to determine CPU time per
quantum of elapsed time.
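
As a minimal sketch of that arithmetic (in Python rather than Perl, purely
for illustration): take two readings of a cumulative per-zone counter and
divide the CPU-time delta by the elapsed-time delta. No such kstat exists
yet, so the `Sample` fields, the `cpu_utilization` helper, and the sample
values below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a hypothetical per-zone cumulative CPU-time counter."""
    wall_nsec: int  # time of the reading, in nanoseconds
    cpu_nsec: int   # CPU time used by the zone since it booted, in nanoseconds


def cpu_utilization(prev: Sample, curr: Sample, ncpus: int = 1) -> float:
    """Return the fraction of available CPU time used between two readings."""
    elapsed = curr.wall_nsec - prev.wall_nsec
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    used = curr.cpu_nsec - prev.cpu_nsec
    if used < 0:
        # A cumulative counter should never decrease; the zone may have rebooted.
        raise ValueError("counter went backwards")
    return used / (elapsed * ncpus)


# Invented readings taken 5 seconds apart on a 2-CPU system.
before = Sample(wall_nsec=10_000_000_000, cpu_nsec=3_000_000_000)
after = Sample(wall_nsec=15_000_000_000, cpu_nsec=5_500_000_000)
print(f"{cpu_utilization(before, after, ncpus=2):.0%}")  # prints 25%
```

Because the counter is cumulative, the tool can sample as rarely as it
likes and still get an exact average for the interval.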

Look for v1.3 of zonestat later this week. It uses the Perl Kstat
module and improves the correctness of zone-to-pool mappings. Each of
these changes also reduces the amount of CPU time zonestat needs to
collect the data it reports.


On Fri, Nov 14, 2008 at 3:21 PM, Peter Tribble [EMAIL PROTECTED] wrote:
> On Mon, Nov 10, 2008 at 1:54 AM, Jeff Victor [EMAIL PROTECTED] wrote:
>> It has become clear that there is a need to monitor resource consumption
>> of workloads in zones, and an easy method to compare consumption to
>> resource controls. In order to understand how a software tool could
>> fulfill this need, I created an OpenSolaris project and a prototype to
>> get started. If this sounds interesting, you can find the project and
>> Perl script at:
>> http://opensolaris.org/os/project/zonestat/ .
>>
>> If you have any comments, or suggestions for improvement, please let me
>> know on this e-mail list or via private e-mail.
>
> That reminds me of a blog entry from a year ago:
>
> http://blogs.sun.com/menno/entry/resource_control_observability_using_kstats
>
> Just looking at zonestat.pl, it perpetrates many of the horrors I'm used to
> seeing. That's not a criticism, just additional evidence that we desperately
> need better interfaces to make getting some of this information easy. There
> are - I think - 11 different binaries you invoke to get the various bits of
> information you need. While some of them could be replaced by inline calls
> to the Kstat module, others clearly can't. Yet some of the information could
> just be stored in kstats, which would make getting at it much easier.
>
> I think what I'm saying is this: what can zonestat tell us about what
> additional kstats should be kept, and what additional APIs would be useful
> to make writing such utilities easier?

-- 
--JeffV
___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-16 Thread Mike Gerdts
On Sun, Nov 16, 2008 at 7:40 PM, Jeff Victor [EMAIL PROTECTED] wrote:
> To me, the clearest example would be a kstat, per zone, which provides
> the total amount of CPU time for all of the processes in each zone,
> since the zone booted. This would enable tools like zonestat to
> request the datum occasionally, in order to determine CPU time per
> quantum of elapsed time.

zonestat shouldn't be needed to give this information.  Per-zone,
per-project, and per-user data should be available so that prstat can
display this information.  When I use prstat -mz or prstat -ma, I
would expect the collected microstate accounting data to be used to
populate the display.  Other fine points about this include:

- Currently prstat shows time-decayed summaries in the bottom panel,
even when microstate data is displayed at the top.  Time-decayed data
is confusing, particularly when trying to correlate application events
that last just a few seconds with CPU consumption.
- It should be able to omit per-process displays.  In this mode, it
would be able to skip the walk of every process in /proc.
- It should be able to display all zones, projects, or users.  Today the
display gives only the top (and optionally bottom) consumers, which
makes it useless for displaying the activity of all users, projects, or
zones.
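
To illustrate the first point, here is a rough Python sketch of how a
decayed average smears a short burst.  It is not prstat's actual
algorithm: a simple exponentially weighted average with an arbitrary
weight (`alpha`) is assumed, and the utilization figures are invented.

```python
def decayed_series(samples, alpha=0.2):
    """Exponentially weighted moving average (assumed model, not prstat's)."""
    avg = 0.0
    out = []
    for s in samples:
        # Each new sample contributes alpha; older history decays by (1 - alpha).
        avg = alpha * s + (1 - alpha) * avg
        out.append(avg)
    return out


# Invented per-second CPU utilization: idle, a 3-second burst at 100%, idle.
raw = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
smoothed = decayed_series(raw)

for second, (r, s) in enumerate(zip(raw, smoothed)):
    print(f"t={second}s  interval={r:4.0%}  decayed={s:4.0%}")
```

In this model the decayed column never reaches 100% during the burst and
is still well above zero after the burst has ended, which is what makes
it hard to line up with application events.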

Whether this information is accessible via /proc or someplace under
/system is a question I don't have a good answer for.

The next things on my list after the items listed above are:

- Give performance data per service.  A while back process contract
decorations (PSARC/2008/046) were added, which would probably be a big
help.
- There's an increasing number of kernel tasks taken care of in task
queues.  My understanding is they don't get charged to any process.
Having a way to observe the impact of these taskq tasks could help
administrators understand the relative impact of things like zfs
crypto and zfs compression.

DTrace can give the answers above, but it shouldn't be that hard for
the end user.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/