Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-21 Thread Jeff Victor
On Mon, Nov 17, 2008 at 10:33 PM, Mike Gerdts [EMAIL PROTECTED] wrote:
 On Mon, Nov 17, 2008 at 7:44 PM, Jeff Victor [EMAIL PROTECTED] wrote:
 Hi Kevin,

 I believe that you cannot patch your way from U1 to U5 - i.e. that the
 system is missing some functionality that would be there if you had
 applied the updates - but your point is still valid. I will look into
 the correctness of using patch levels to detect feature availability.

 Huh?  There are very few features delivered in Solaris updates that
 aren't delivered via patches.  So few that I can only think of one
 time where it has made a difference (postgres version different
 between updates).  When really important features are released as new
 packages genesis patches are delivered to deliver the feature.  This
 is how the U1 + patches system below has zfs on it even though zfs
 didn't come out until U2.

Hoping to summarize this sub-thread:

A patch can only modify an existing package. An update can have new
packages as well as patches to existing packages. In general, you
can't patch your way to, or past, an update which has new packages.
There have been times when an empty package was placed into an update
in an attempt to make it possible to add functionality later simply by
adding a patch.

Proof by blog is hardly sufficient, but
http://blogs.sun.com/patch/entry/solaris_10_5_08_update provides an
example:

The Solaris 10 05/08 Patch Bundle contains the equivalent set of
patches to the Solaris 10 05/08 (Update 5) release. The patch bundle
does not include the new packages contained in the Solaris 10 05/08
(Update 5) release.  Therefore, new features in Update 5 which depend
upon new packages introduced in that release will not be available in
the patch bundle.

Moving forward:

That raises several questions: are new pkgs added often? (400 were
added after S10 3/05 so far.) Do those packages add new features? (I
think that's a safe assumption, but I don't know of a mapping from
feature to pkg.) Are any of those features used by zonestat.pl? (I
don't know of any, so it's likely that you can patch your way from S10
FCS to "all of zonestat works" even though the system wouldn't have
all of the features in U5.)

In any case, it became clear early in this thread that checking
/etc/release was inadequate, and so the ToDo for v1.3 includes fixing
this. Sample code - from this community - to check for each of the
necessary features added during the life of S10 would be greatly
appreciated... Rules and ideas for contributing code can be found at
http://www.opensolaris.org/os/communities/participation/ .
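
As a starting point for such checks, here is a minimal sketch of probing for a feature directly instead of reading /etc/release. The helper name have_feature and the two example probes are illustrative assumptions, not code from zonestat, and the sketch assumes a POSIX shell and that each probe command exits non-zero when the feature is absent.

```shell
#!/bin/sh
# have_feature NAME CMD [ARGS...]
# Runs the probe command with its output discarded and prints
# "NAME=yes" if the command exits 0, otherwise "NAME=no".
have_feature() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "$name=yes"
    else
        echo "$name=no"
    fi
}

# Example probes (assumptions; verify the exit status on your systems):
#   have_feature cpu_caps /bin/prctl -n zone.cpu-cap -i zone global
#   have_feature rcap     /usr/bin/svcs -H rcap
```

A script could run each probe once at startup and branch on the result, so a U1 system patched to a later kernel is treated according to what it can actually do.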


 All of the functionality that this script cares about for this comes
 as part of the recommended patch set.  Consider this system:

 # cat /etc/release
    Solaris 10 1/06 s10s_u1wos_19a SPARC
    Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
    Use is subject to license terms.
    Assembled 07 December 2005

 # uname -rv
 5.10 Generic_127111-09

 That puts it somewhere in between U4 and U5 for kernel patches.
 Because the recommended bundle was used, it puts it somewhere in
 between for other aspects (e.g. libzonecfg, etc.) as well.  Let's take
 a look at the checks that zonestat does for updates:

    # For zones with RAM caps (U4+), get current values for RAM usage and Cap.
    if ($update3) {
        open (RCAP, "/usr/bin/svcs -H rcap|");

 # svcs -H rcap
 disabled   May_03   svc:/system/rcap:default

 Exists but disabled.

    if ($update4) {
        open(PRCTL, "/bin/prctl -Pi zone -n zone.cpu-cap $z|");
        while (<PRCTL>) {

 Not at update 5's kernel and related patch set yet, so I wouldn't
 expect that this would work.  However, let's take a look at another
 system that was installed with update 4 but has update 5+ patches.

 # cat /etc/release
    Solaris 10 8/07 s10s_u4wos_12b SPARC
    Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
    Use is subject to license terms.
    Assembled 16 August 2007

 # uname -rv
 5.10 Generic_137111-08

 # prctl -Pi zone -n zone.cpu-cap 
 zone: 3: 
 zone.cpu-cap system 4294967295 inf deny -

 --


-- 
--JeffV
___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-18 Thread Glenn Brunette

Mike,

Mike Gerdts wrote:
 On Mon, Nov 17, 2008 at 8:05 PM, Glenn Brunette [EMAIL PROTECTED] wrote:
 Jeff,

 This actually hits on a similar request that I have (but for different
 reasons).  I would like a stable interface from which I could tell
 the update revision of a system.
 
 This seems to be another case for feature-based meta packages.
 
 http://mgerdts.blogspot.com/2008/03/solaris-wish-list-feature-based-meta.html
 
 I describe it for the simplicity of installing software, but with a
 bit of thought it could be possible to extend it to this use as well.

Very similar indeed.  While this may work with the big items that
qualify as features, I am not sure if this would be stretching the
metaphor a bit for smaller components, but you and I are definitely
thinking in a similar vein.

 In a past life working on JASS, we were told to not test for patch or
 update levels but rather to test whether a specific feature is present,
 and while I understand the merits of this methodology, it does not
 always provide a complete solution (without making significant
 assumptions about how the system was installed/maintained).  For
 
 As a very heavy user of JASS, this methodology is appreciated.  It has
 made the software continue to be quite useful long after Sun stopped
 providing updates.  (Any news on open sourcing it?)

Thank you!  As far as news on JASS, all that I can say for now is
stay tuned.  There is a lot of discussion happening on this front
these days, and I hope we will have some news to share soon.

g


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-18 Thread James Carlson
Glenn Brunette writes:
 This actually hits on a similar request that I have (but for different
 reasons).  I would like a stable interface from which I could tell
 the update revision of a system.

We have no such thing.  It's not clear to me how such a thing would
work.  Suppose someone installs only the KJP corresponding to U5 on a
U4 system -- is that now U5 or U4 or U4++ or something else entirely?
If that returns U5, then suppose someone installs a U5 patch not
dependent on the KJP onto a U4 system.  Is that still U4?

What determines U5-ness?

If it's dependent on the upgrade process itself, and none of the above
would return the answer "U5", then suppose someone installs all of the
patches for U5 and then installs/removes packages to make the system
equivalent to one that had been upgraded.  Is that now U5 or is it
still something else?

Does it make any sense that you can have arbitrary (and improper)
subsets of bits on the system and yet you're insisting on returning an
effectively scalar result?

 I have a very large government customer who (as part of their security
 configuration hardening and assessment) process have a very real need
 to detect OS version and update levels so that they can determine which
 actions/checks to apply.

You can get the OS version from uname and the list of patches
installed from patchadd.

 assumptions about how the system was installed/maintained).  For
 example, is the feature not present or has it been removed or simply
 not installed?

Is there some difference between those things?  That sounds like the
realm of metaphysics to me ... if bits aren't present, the "why"
question seems much less interesting.

How can the system necessarily know what features _could_ potentially
be installed but aren't there?  Isn't that everything?  If you've
installed something and then removed it, would that be different from
never having installed it in the first place?  (If it is, doesn't that
indicate a bug in the removal process?)

Perhaps most importantly: how can you use that information?  What
would you do differently if something had once been installed that you
wouldn't do if it had never been installed?

  Also, the existence of some features also can not be
 easily tested using automated tools without imposing a great burden
 on the tool developer.

That sounds like a bug that should be fixed.

-- 
James Carlson, Solaris Networking  [EMAIL PROTECTED]
Sun Microsystems / 35 Network Drive71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-18 Thread Jordan Brown
Mike Gerdts wrote:
 When really important features are released as new
 packages genesis patches are delivered to deliver the feature.

Sometimes, but not always.  In fact, I'd have to say usually not.  The 
Genesis technique is not without its problems, and is considered 
controversial.


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-18 Thread Enda O'Connor
Hi
I'd agree with James; the update revision is sometimes a blurry picture.

For example, cat /etc/release will tell me that my system is 1/06 (update 1,
if I remember correctly), but if I have applied the latest jumbo kernel
patch, 137137-09, I essentially have a lot of the u6 functionality (a lot,
but not all). So is the system u1 or u6?

I tend to say it's u1 patched to the u6 kernel, so as to give some idea of
the starting point and the current point, but there is nothing more precise
than:
cat /etc/release ( the starting point )
uname -a to give the current KU level
ls -tr1 /var/sadm/patch to see patches applied besides the current KU
patchadd -p to get every patch, including patches that are part
of the update build.
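
To make the "starting point" and "current point" concrete, here is a minimal sketch that extracts both from text in the formats quoted in this thread. The function names release_update and kernel_patch are hypothetical, the patterns are inferred only from the samples shown here (for example s10s_u1wos_19a and Generic_127111-09), and a POSIX shell such as ksh is assumed.

```shell
#!/bin/sh
# release_update: read /etc/release text on stdin and print the update
# number found in the first line's build string (e.g. "1" for
# s10s_u1wos_19a), or "unknown" if no such token is present.
release_update() {
    u=$(sed -n '1s/.*_u\([0-9][0-9]*\)wos.*/\1/p')
    echo "${u:-unknown}"
}

# kernel_patch: take the output of `uname -v` as $1 and print the kernel
# patch id (e.g. "127111-09"), or "unknown" if the format is unexpected.
kernel_patch() {
    case $1 in
        Generic_*-*) echo "${1#Generic_}" ;;
        *)           echo "unknown" ;;
    esac
}

# Typical use on a live system:
#   echo "installed as: U$(release_update < /etc/release)"
#   echo "kernel patch: $(kernel_patch "$(uname -v)")"
```

Reporting the two values side by side avoids collapsing them into a single update number, which is the scalar answer this thread argues cannot be given reliably.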

So sometimes an update level might be meaningless, e.g.
I can have an x86 FCS system ( from cat /etc/release )
but it has grub, zfs and all the latest zones functionality, just by 
adding 137137-09, plus the nearly 30 patches required to get that on board.

To me they probably need a patch automation tool to tell them what is 
currently available in terms of patching, and they see what they need 
from that.

e.g. pca -l missing
or the like, pca being a Solaris patch automation tool from
http://www.par.univie.ac.at/solaris/pca/

Enda




On 11/18/08 13:58, James Carlson wrote:
 Glenn Brunette writes:
 This actually hits on a similar request that I have (but for different
 reasons).  I would like a stable interface from which I could tell
 the update revision of a system.
 
 We have no such thing.  It's not clear to me how such a thing would
 work.  Suppose someone installs only the KJP corresponding to U5 on a
 U4 system -- is that now U5 or U4 or U4++ or something else entirely?
 If that returns U5, then suppose someone installs a U5 patch not
 dependent on the KJP onto a U4 system.  Is that still U4?
 
 What determines U5-ness?
 
 If it's dependent on the upgrade process itself, and none of the above
 would return the answer U5, then suppose someone installs all of the
 patches for U5 and then installs/removes packages to make the system
 equivalent to one that had been upgraded.  Is that now U5 or is it
 still something else?
 
 Does it make any sense that you can have arbitrary (and improper)
 subsets of bits on the system and yet you're insisting on returning an
 effectively scalar result?
 
 I have a very large government customer who (as part of their security
 configuration hardening and assessment) process have a very real need
 to detect OS version and update levels so that they can determine which
 actions/checks to apply.
 
 You can get the OS version from uname and the list of patches
 installed from patchadd.
 
 assumptions about how the system was installed/maintained).  For
 example, is the feature not present or has it been removed or simply
 not installed?
 
 Is there some difference between those things?  That sounds like the
 realm of metaphysics to me ... if bits aren't present, the why
 question seems much less interesting.
 
 How can the system necessarily know what features _could_ potentially
 be installed but aren't there?  Isn't that everything?  If you've
 installed something and then removed it, would that be different from
 never having installed it in the first place?  (If it is, doesn't that
 indicate a bug in the removal process?)
 
 Perhaps most importantly: how can you use that information?  What
 would you do differently if something had once been installed that you
 wouldn't do if it had never been installed?
 
  Also, the existence of some features also can not be
 easily tested using automated tools without imposing a great burden
 on the tool developer.
 
 That sounds like a bug that should be fixed.
 


-- 
Enda O'Connor x19781  Software Product Engineering
Patch System Test : Ireland : x19781/353-1-8199718


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-18 Thread Mike Gerdts
On Tue, Nov 18, 2008 at 11:30 AM, Enda O'Connor [EMAIL PROTECTED] wrote:
 So sometimes an update might be meaningless, ie
 I can have an x86 FCS system ( from cat /etc/release )
 but it has grub,zfs and all the latest zones functionality, just by
 adding 137137-09, plus the near 30 patches requires to get that on board.

 To me they probably need a patch automation tool to tell them what is
 currently available in terms of patching, and they see what they need
 from that.

Interface changes always have an associated ARC case ID with them,
right?  Why not make it so that the software that delivers an
interface delivers some metadata that says that the interface
specified in the ARC case is on the system.  Any time a patch or
installation delivers, removes, or deprecates a feature (interface)
this metadata would get updated.  A stable interface is then needed to
query and update that metadata.

The important ones (worthy of marketing or release notes attention) could
get a corresponding feature-based meta package, allowing
administrators to easily install the feature.

http://mgerdts.blogspot.com/2008/03/solaris-wish-list-feature-based-meta.html

A nice extension of that would be a means for software to register
as a consumer of the interface.  Perhaps that is just a soft
dependency in the packaging software.  Another thought would be to add
dtrace probes at the entry points to the interfaces so that an
interface watch daemon could track interface users (e.g. by using
process contract decorations).

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-18 Thread James Carlson
Mike Gerdts writes:
 On Tue, Nov 18, 2008 at 11:30 AM, Enda O'Connor [EMAIL PROTECTED] wrote:
  To me they probably need a patch automation tool to tell them what is
  currently available in terms of patching, and they see what they need
  from that.
 
 Interface changes always have an associated ARC case ID with them,
 right?

It depends.

Many changes involving Project Private and Consolidation Private
interfaces have no ARC involvement at all.  For other projects, we
review them even when there are *no* interface changes, because the
projects have architectural features worth reviewing.

ARC review isn't just about interfaces, and in some cases the consumer
may want to know something more like "is the fix for known problem
Foobar installed?"

  Why not make it so that the software that delivers an
 interface delivers some metadata that says that the interface
 specified in the ARC case is on the system.  Any time a patch or
 installation delivers, removes, or deprecates a feature (interface)
 this metadata would get updated.  A stable interface is then needed to
 query and update that metadata.

If I read that correctly, the query could be something like:

if isinstalled PSARC/2008/123; then
 ...
fi

If so, then that's a bit weird, but I guess it could be made to work.
It's not clear to me whether that's as useful as providing domain
specific feature tests.

It'd be better to start with a clear set of requirements and work down
to an implementation, I think.
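
Purely to illustrate the query shape above, here is a minimal sketch of what such a test could look like if the metadata were a directory of marker files. Everything in it is hypothetical: no such interface exists, and the FEATURE_DB location and the marker-file layout are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical layout: each delivered interface is recorded as an empty
# marker file named after its ARC case, e.g. $FEATURE_DB/PSARC/2008/123.
# The default path below is invented for this sketch.
FEATURE_DB=${FEATURE_DB:-/var/sadm/features}

# isinstalled CASE-ID
# Succeeds (exit 0) if a marker file exists for the given ARC case.
isinstalled() {
    [ -f "$FEATURE_DB/$1" ]
}

# Usage, mirroring the query in the message above:
#   if isinstalled PSARC/2008/123; then
#       ...
#   fi
```

A patch or package that delivers, removes, or deprecates an interface would then add or remove the marker, which is the update step the proposal depends on.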

-- 
James Carlson, Solaris Networking  [EMAIL PROTECTED]
Sun Microsystems / 35 Network Drive71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-17 Thread Young, Kevin
Jeff,
I am wondering about the logic in how the script identifies specific
versions. It appears that you are looking at /etc/release to define
this.  This seems to limit some features of your script because I have a
Solaris 10 update 1 system that has been updated to 05/08 (update 5) but
/etc/release still reflects update 1 (updated using 05/08 patch bundle).

I am using CPU caps but your tool doesn't recognize that I have that
feature available. Since these features really come from the kernel
version, would that be a better way to identify the release version in your
script? Just a thought.

In the meantime I tricked the script to think I am on update 5 and I am
getting better results.


-= Kevin =-


-Original Message-
From: Jeff Victor [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 10, 2008 9:01 AM
To: Young, Kevin
Cc: zones-discuss@opensolaris.org
Subject: Re: [zones-discuss] Zone Statistics: monitoring resource use of
zones

On Mon, Nov 10, 2008 at 11:21 AM, Young, Kevin [EMAIL PROTECTED]
wrote:
 I am curious if you have plans to make it Solaris 10 compatible.

I do all development on Solaris 10. The script makes an effort to
distinguish between the different capabilities of the different
Solaris 10 updates.


 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff Victor
 Sent: Sunday, November 09, 2008 5:54 PM
 To: zones-discuss@opensolaris.org
 Subject: [zones-discuss] Zone Statistics: monitoring resource use of
zones

 It has become clear that there is a need to monitor resource
 consumption of workloads in zones, and an easy method to compare
 consumption to resource controls. In order to understand how a
 software tool could fulfill this need, I created an OpenSolaris
 project and a prototype to get started. If this sounds interesting,
 you can find the project and Perl script at:
 http://opensolaris.org/os/project/zonestat/ .

 If you have any comments, or suggestions for improvement, please let
 me know on this e-mail list or via private e-mail.

-- 
--JeffV



Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-17 Thread Jeff Victor
On Sun, Nov 16, 2008 at 10:58 PM, Mike Gerdts [EMAIL PROTECTED] wrote:
 On Sun, Nov 16, 2008 at 7:40 PM, Jeff Victor [EMAIL PROTECTED] wrote:
 To me, the clearest example would be a kstat, per zone, which provides
 the total amount of CPU time for all of the processes in each zone,
 since the zone booted. This would enable tools like zonestat to
 request the datum occasionally, in order to determine CPU time per
 quantum of elapsed time.

 zonestat shouldn't be needed to give this information.

Of course. I guess I wasn't clear. I was trying to say "the clearest
example of a kstat that is needed is a kstat, per zone..." That kstat
could then be used by many *stat tools, including zonestat, prstat,
etc.

 Per zone, project, and user data should be available that allows prstat to
 display this information.  When I use prstat -mz or prstat -ma, I
 would expect the collected microstate accounting data would be used to
 populate the display.  Other fine points about this include:

 - Currently prstat shows time decayed summaries in the bottom panel, even 
 when microstate data is displayed at the top.  Time decayed data
 is confusing, particularly when trying to correlate application events that 
 last just several seconds to CPU consumption.

Not only is it confusing, it can be very wrong, e.g. if there are many
short-lived processes that come and go between the snapshots that
prstat takes. That's why a kstat like the one described above is
needed.


--JeffV


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-17 Thread Glenn Brunette

Jeff,

This actually hits on a similar request that I have (but for different
reasons).  I would like a stable interface from which I could tell
the update revision of a system.

I have a very large government customer who (as part of their security
configuration hardening and assessment process) has a very real need
to detect OS version and update levels so that they can determine which
actions/checks to apply.

In a past life working on JASS, we were told to not test for patch or
update levels but rather to test whether a specific feature is present,
and while I understand the merits of this methodology, it does not
always provide a complete solution (without making significant
assumptions about how the system was installed/maintained).  For
example, is the feature not present, or has it been removed, or simply
not installed?  Also, the existence of some features cannot be
easily tested using automated tools without imposing a great burden
on the tool developer.

It sounds like you may be in a similar boat.  What do you think?
Cross-posting to security-discuss to get their feedback as well.

g


Jeff Victor wrote:
 Hi Kevin,
 
 I believe that you cannot patch your way from U1 to U5 - i.e. that the
 system is missing some functionality that would be there if you had
 applied the updates - but your point is still valid. I will look into
 the correctness of using patch levels to detect feature availability.
 
 On Mon, Nov 17, 2008 at 6:09 PM, Young, Kevin [EMAIL PROTECTED] wrote:
 Jeff,
 I am wondering about the logic in how the script identifies specific
 versions. It appears that you are looking at /etc/release to define
 this.  This seems to limit some features of your script because I have a
 Solaris 10 update 1 system that has been updated to 05/08 (update 5) but
 /etc/release still reflects update 1 (updated using 05/08 patch bundle).

 I am using CPU caps but your tool doesn't recognize that I have that
 feature available. Since these features really come from the kernel
 version, would that be a better way to identify release version in your
 script; Just a thought.

 In the meantime I tricked the script to think I am on update 5 and I am
 getting better results.


 -= Kevin =-


 -Original Message-
 From: Jeff Victor [mailto:[EMAIL PROTECTED]
 Sent: Monday, November 10, 2008 9:01 AM
 To: Young, Kevin
 Cc: zones-discuss@opensolaris.org
 Subject: Re: [zones-discuss] Zone Statistics: monitoring resource use of
 zones

 On Mon, Nov 10, 2008 at 11:21 AM, Young, Kevin [EMAIL PROTECTED]
 wrote:
 I am curious if you have plans to make it Solaris 10 compatible.
 I do all development on Solaris 10. The script makes an effort to
 distinguish between the different capabilities of the different
 Solaris 10 updates.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Victor
 Sent: Sunday, November 09, 2008 5:54 PM
 To: zones-discuss@opensolaris.org
 Subject: [zones-discuss] Zone Statistics: monitoring resource use of
 zones
 It has become clear that there is a need to monitor resource
 consumption of workloads in zones, and an easy method to compare
 consumption to resource controls. In order to understand how a
 software tool could fulfill this need, I created an OpenSolaris
 project and a prototype to get started. If this sounds interesting,
 you can find the project and Perl script at:
 http://opensolaris.org/os/project/zonestat/ .

 If you have any comments, or suggestions for improvement, please let
 me know on this e-mail list or via private e-mail.
 
 

-- 
Glenn Brunette
Distinguished Engineer
Director, GSS Security Office
Sun Microsystems, Inc.


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-17 Thread Mike Gerdts
On Mon, Nov 17, 2008 at 7:44 PM, Jeff Victor [EMAIL PROTECTED] wrote:
 Hi Kevin,

 I believe that you cannot patch your way from U1 to U5 - i.e. that the
 system is missing some functionality that would be there if you had
 applied the updates - but your point is still valid. I will look into
 the correctness of using patch levels to detect feature availability.

Huh?  There are very few features delivered in Solaris updates that
aren't delivered via patches.  So few that I can only think of one
time where it has made a difference (postgres version different
between updates).  When really important features are released as new
packages, genesis patches are delivered to provide the feature.  This
is how the U1 + patches system below has zfs on it even though zfs
didn't come out until U2.

All of the functionality that this script cares about for this comes
as part of the recommended patch set.  Consider this system:

# cat /etc/release
   Solaris 10 1/06 s10s_u1wos_19a SPARC
   Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
   Use is subject to license terms.
   Assembled 07 December 2005

# uname -rv
5.10 Generic_127111-09

That puts it somewhere in between U4 and U5 for kernel patches.
Because the recommended bundle was used, it puts it somewhere in
between for other aspects (e.g. libzonecfg, etc.) as well.  Let's take
a look at the checks that zonestat does for updates:

   # For zones with RAM caps (U4+), get current values for RAM usage and Cap.
   if ($update3) {
       open (RCAP, "/usr/bin/svcs -H rcap|");

# svcs -H rcap
disabled   May_03   svc:/system/rcap:default

Exists but disabled.

   if ($update4) {
       open(PRCTL, "/bin/prctl -Pi zone -n zone.cpu-cap $z|");
       while (<PRCTL>) {

Not at update 5's kernel and related patch set yet, so I wouldn't
expect that this would work.  However, let's take a look at another
system that was installed with update 4 but has update 5+ patches.

# cat /etc/release
   Solaris 10 8/07 s10s_u4wos_12b SPARC
   Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
   Use is subject to license terms.
   Assembled 16 August 2007

# uname -rv
5.10 Generic_137111-08

# prctl -Pi zone -n zone.cpu-cap 
zone: 3: 
zone.cpu-cap system 4294967295 inf deny -
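
Output in this form can also be consumed by a script rather than gated on an update flag. The following is a minimal sketch, assuming that a configured cap appears as a "privileged" entry and that only the "system" entry is present when no cap is set; the function name cpu_cap is hypothetical and is not part of zonestat.

```shell
#!/bin/sh
# cpu_cap: read `prctl -P ... -n zone.cpu-cap` output on stdin and print the
# value of the first "privileged" entry, or "none" if no such entry exists
# (for example when only the system default is listed).
cpu_cap() {
    awk '$1 == "zone.cpu-cap" && $2 == "privileged" { print $3; found = 1; exit }
         END { if (!found) print "none" }'
}

# Typical use (the zone name is a placeholder):
#   /bin/prctl -Pi zone -n zone.cpu-cap somezone 2>/dev/null | cpu_cap
```

Fed the output shown above, which contains only the system entry, the function prints "none".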

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-17 Thread Mike Gerdts
On Mon, Nov 17, 2008 at 8:05 PM, Glenn Brunette [EMAIL PROTECTED] wrote:

 Jeff,

 This actually hits on a similar request that I have (but for different
 reasons).  I would like a stable interface from which I could tell
 the update revision of a system.

This seems to be another case for feature-based meta packages.

http://mgerdts.blogspot.com/2008/03/solaris-wish-list-feature-based-meta.html

I describe it for the simplicity of installing software, but with a
bit of thought it could be possible to extend it to this use as well.

 In a past life working on JASS, we were told to not test for patch or
 update levels but rather to test whether a specific feature is present,
 and while I understand the merits of this methodology, it does not
 always provide a complete solution (without making significant
 assumptions about how the system was installed/maintained).  For

As a very heavy user of JASS, this methodology is appreciated.  It has
made the software continue to be quite useful long after Sun stopped
providing updates.  (Any news on open sourcing it?)

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-16 Thread Jeff Victor
Peter,

Your statements are exactly the reason(s) I wrote this prototype.
Solaris engineering is researching this topic, and are listening as we
type... :-) They are very interested in feedback generated by the use
of this prototype.

Any specific ideas you have regarding kstats you think we need, would
be welcomed on this alias.

To me, the clearest example would be a kstat, per zone, which provides
the total amount of CPU time for all of the processes in each zone,
since the zone booted. This would enable tools like zonestat to
request the datum occasionally, in order to determine CPU time per
quantum of elapsed time.
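
The arithmetic a tool would do with such a counter can be sketched as follows. This assumes a hypothetical per-zone cumulative CPU-time counter in nanoseconds, which is the kstat being asked for and does not exist yet; the function name cpu_pct is also an assumption.

```shell
#!/bin/sh
# cpu_pct PREV_NS CUR_NS ELAPSED_NS
# Given two samples of a cumulative CPU-time counter and the wall-clock
# time between them (all in nanoseconds), print CPU use over the interval
# as a percentage of one CPU. Returns 1 if the interval is not positive.
cpu_pct() {
    awk -v p="$1" -v c="$2" -v e="$3" 'BEGIN {
        if (e <= 0) exit 1
        printf "%.1f\n", (c - p) * 100 / e
    }'
}

# Example: the counter advanced by 0.5 s of CPU time during 1 s of elapsed time.
#   cpu_pct 1000000000 1500000000 1000000000
```

Because the counter is cumulative, CPU time used by processes that start and exit between two samples is still included, and a zone running on several CPUs can legitimately report more than 100.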

Look for v1.3 of zonestat later this week. It uses the Perl kstats
module and improves the correctness of zone-to-pool mappings. Each of
these changes also reduces the amount of CPU time needed to collect the
data it reports.


On Fri, Nov 14, 2008 at 3:21 PM, Peter Tribble [EMAIL PROTECTED] wrote:
 On Mon, Nov 10, 2008 at 1:54 AM, Jeff Victor [EMAIL PROTECTED] wrote:
 It has become clear that there is a need to monitor resource consumption of 
 workloads in zones, and an easy method to compare
 consumption to resource controls. In order to understand how a software tool 
 could fulfill this need, I created an OpenSolaris
 project and a prototype to get started. If this sounds interesting, you can 
 find the project and Perl script at:
 http://opensolaris.org/os/project/zonestat/ .

 If you have any comments, or suggestions for improvement, please let me know 
 on this e-mail list or via private e-mail.

 That reminds me of a blog entry from a year ago:

 http://blogs.sun.com/menno/entry/resource_control_observability_using_kstats

 Just looking at zonestat.pl, it perpetrates many of the horrors I'm used to
 seeing. That's not a criticism, just additional evidence that we desperately
 need better interfaces to make getting some of this information easy. There
 are - I think - 11 different binaries you invoke to get the various
 bits of information
 you need. While some of them could be replaced by inline calls to the Kstat
 module, others clearly can't. Yet some of the information could just be stored
 in kstats, which would make getting at it much easier.

 I think what I'm saying is this: what can zonestat tell us about what 
 additional
 kstats should be kept, and what additional APIs would be useful to make 
 writing
 such utilities easier?

-- 
--JeffV


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-16 Thread Mike Gerdts
On Sun, Nov 16, 2008 at 7:40 PM, Jeff Victor [EMAIL PROTECTED] wrote:
 To me, the clearest example would be a kstat, per zone, which provides
 the total amount of CPU time for all of the processes in each zone,
 since the zone booted. This would enable tools like zonestat to
 request the datum occasionally, in order to determine CPU time per
 quantum of elapsed time.

zonestat shouldn't be needed to give this information.  Per zone,
project, and user data should be available that allows prstat to
display this information.  When I use prstat -mz or prstat -ma, I
would expect the collected microstate accounting data would be used to
populate the display.  Other fine points about this include:

- Currently prstat shows time decayed summaries in the bottom panel,
even when microstate data is displayed at the top.  Time decayed data
is confusing, particularly when trying to correlate application events
that last just several seconds to CPU consumption.
- It should be able to omit per-process displays.  In this mode, it
would be able to skip the walk of every process in /proc.
- It should be able to display all zones, projects, or users.  Today the
display only gives the top (and optionally bottom) consumers, which
makes it useless for displaying the activity of all users, projects, or
zones.

Whether this information is accessible via /proc or someplace under
/system is a question I don't have a good answer for.

The next things on my list after the items listed above are:

- Give performance data per service.  A while back process contract
decorations (PSARC/2008/046) were added, which would probably be a big
help.
- There's an increasing number of kernel tasks taken care of in task
queues.  My understanding is they don't get charged to any process.
Having a way to observe the impact of these taskq tasks could help
administrators understand the relative impact of things like zfs
crypto and zfs compression.

DTrace can give the answers above, but it shouldn't be that hard for
the end user.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-14 Thread Peter Tribble
On Mon, Nov 10, 2008 at 1:54 AM, Jeff Victor [EMAIL PROTECTED] wrote:
 It has become clear that there is a need to monitor resource
 consumption of workloads in zones, and an easy method to compare
 consumption to resource controls. In order to understand how a
 software tool could fulfill this need, I created an OpenSolaris
 project and a prototype to get started. If this sounds interesting,
 you can find the project and Perl script at:
 http://opensolaris.org/os/project/zonestat/ .

 If you have any comments, or suggestions for improvement, please let
 me know on this e-mail list or via private e-mail.

That reminds me of a blog entry from a year ago:

http://blogs.sun.com/menno/entry/resource_control_observability_using_kstats

Just looking at zonestat.pl, it perpetrates many of the horrors I'm used to
seeing. That's not a criticism, just additional evidence that we desperately
need better interfaces to make getting some of this information easy. There
are - I think - 11 different binaries you invoke to get the various bits of
information you need. While some of them could be replaced by inline calls to
the Kstat module, others clearly can't. Yet some of the information could just
be stored in kstats, which would make getting at it much easier.

I think what I'm saying is this: what can zonestat tell us about what additional
kstats should be kept, and what additional APIs would be useful to make writing
such utilities easier?

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-10 Thread Jeff Victor
On Mon, Nov 10, 2008 at 12:30 AM, Mike Gerdts [EMAIL PROTECTED] wrote:
 On Sun, Nov 9, 2008 at 7:54 PM, Jeff Victor [EMAIL PROTECTED] wrote:
 <zonestat intro snipped>

 If you have any comments, or suggestions for improvement, please let
 me know on this e-mail list or via private e-mail.

 I've had such needs for a while and have developed some tools to help my
 organization with that.  Unfortunately, I'm not able to share that code.  I
 am able to share suggestions...

 I am in a habit of:

 #! /usr/bin/perl -w

 use strict;

Yes, those generated warnings when I had used them earlier. I wanted
to get the code out the door and took a couple of shortcuts to do
that. I will address the warnings soon and put those checks back in
place.

 That catches a lot of mistakes that may be masked by:

 close STDERR;

 which I never do. :)

:-) Another of the shortcuts. I hope to remove those shortcuts in
v1.3, which should be done this week.

 Please do not use /etc/release as a test of kernel functionality.
 Those that patch to an equivalent level as the update release have a
 similar level of functionality.  A better mechanism would be to check
 for specific kernel patches.

Great idea, I'll look into it.

 # Get amount and cap of memory locked by processes in each zone.
 # kstat -p caps:*:lockedmem_zone_* conveniently summarizes all zones for us.
 #
 open(KSTAT, "/usr/bin/kstat -p caps:*:lockedmem_zone_* |");
 while (<KSTAT>) {

 You could just use Sun::Solaris::Kstat rather than forking another perl script.

Yup, that was in the ToDo list: convert all uses of /usr/bin/kstat to
uses of the Kstat module. I might sneak that into v1.3 along with
significant improvements in identifying zone-project mappings.

 My feeling on capped memory is that if it becomes an issue and capped
 swap is not really close to capped memory, the over-consumptive zone
 has too high of a chance of causing horrible I/O problems for all
 zones.  That is, the cap is likely to do more harm than good.  This
 may change if swap can go onto solid state disk.  I only mention this
 because I don't see a purpose in capping RSS; rather, I cap swap.

For fast leaks and DoS attacks, I agree. The RAM cap helps with slow
leaks and temporary overconsumption of RAM.

 FWIW, I tend to use the term "reserved memory" instead of "swap"
 because that is less confusing to most people.

That's a useful perspective. If you choose the swap cap - which is
really a VM cap - so that the sum of the swap caps is less than RAM,
you have effectively implemented 'reserved memory.' (I'm ignoring RAM
usage of the global zone, which shouldn't be ignored in practice.)
But you must be careful: nothing prevents you from 'over-reserving'
memory. If you have 'reserved' all of system memory in this way, and
add a new zone with its own 'reserve,' you will have over-subscribed
memory. That might be a good thing, as long as no one is surprised if
the system starts paging.

However, the entire concept of reserved memory limits the scalability
of the system. Imagine 4 zones with swap caps of 4GB, on a system with
16GB of RAM. (Again, I'm ignoring the GZ.) Unless you allow yourself
to over-subscribe RAM, you can't add more zones, even if those 4 zones
are only using 1GB each during normal conditions.
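
A minimal sketch of that arithmetic in Perl, using the figures from the
example (4 zones, 4GB swap caps, 16GB of RAM, global zone ignored). The
zone names and the helper reserve_headroom are hypothetical, chosen only
for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: given RAM size and per-zone swap caps (both in GB),
# return how much RAM is still "unreserved". A negative result means the
# caps over-subscribe RAM, so paging becomes possible.
sub reserve_headroom {
    my ($ram_gb, %swap_cap_gb) = @_;
    my $reserved = 0;
    $reserved += $_ for values %swap_cap_gb;
    return $ram_gb - $reserved;
}

# Figures from the example: 4 zones capped at 4GB each on a 16GB system.
my %caps = (zone1 => 4, zone2 => 4, zone3 => 4, zone4 => 4);
my $headroom = reserve_headroom(16, %caps);
print "headroom with 4 zones: ${headroom}GB\n";

# A fifth zone with the same cap pushes the total past physical RAM.
my $with_fifth = reserve_headroom(16, %caps, zone5 => 4);
print "headroom with 5 zones: ${with_fifth}GB\n";
```

The headroom says nothing about actual usage: the same four zones may be
using only 1GB each, which is exactly the scalability cost described above.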

Balance is needed. When paging must be avoided at all costs,
'reserving' memory by setting proper swap-caps makes a great deal of
sense. When paging is unlikely because the workload is well
understood, and a small amount of paging would not be horrible, and
zone 'density' is important, reserving memory would not make sense.
Many situations would call for memory 'reservations' on some zones,
and RAM caps on others.

 For CPU related stats, take a look at a discussion I started a while back:

 http://mail.opensolaris.org/pipermail/perf-discuss/2005-November/002048.html

Cool. Also, Jim Fiori had a simple idea for counting CPU time per zone
with almost no perf impact: use DTrace to implement a probe which
fires every M microseconds, and increments a per-zone counter. But
that's a short-term solution. We need a per-zone counter in the kernel
that tallies CPU time per zone.
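
Given such a cumulative per-zone counter, a tool would only need two
samples and a subtraction. The sketch below assumes the counter is in
seconds of CPU time since zone boot; the helper name, the zone names and
the sample values are all made up, since no such kstat exists yet.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: turn two samples of a cumulative per-zone CPU-time
# counter (in seconds) into a share of one CPU over the elapsed interval.
# $prev and $curr are hash refs of zone name => cumulative CPU seconds.
sub cpu_share_per_zone {
    my ($prev, $curr, $elapsed_sec) = @_;
    die "elapsed time must be positive\n" unless $elapsed_sec > 0;
    my %share;
    for my $zone (keys %$curr) {
        my $before = exists $prev->{$zone} ? $prev->{$zone} : 0;
        # A zone that booted (or rebooted) between samples restarts its
        # counter, so treat a missing or larger earlier value as zero.
        $before = 0 if $before > $curr->{$zone};
        $share{$zone} = ($curr->{$zone} - $before) / $elapsed_sec;
    }
    return \%share;
}

# Made-up sample data: counters read 10 seconds apart.
my $t0 = { global => 100.0, web => 40.0 };
my $t1 = { global => 101.0, web => 45.0, db => 2.5 };
my $share = cpu_share_per_zone($t0, $t1, 10);
printf "%-8s %5.1f%%\n", $_, 100 * $share->{$_} for sort keys %$share;
```

A share above 1.0 simply means the zone kept more than one CPU busy
during the interval.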

 One project I would like to kick off sometime is doing per user, per
 project, and per zone microstate accounting.

Excellent idea. I'll watch for it! :-)

<snip>

 I didn't have a chance to check logic closely or run it on a test
 system.  I'll offer more feedback if needed when I get a chance to
 test it.  It is a great start and I can't wait to see it progress.

Thanks!


-- 
--JeffV


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-10 Thread Young, Kevin
I am curious if you have plans to make it Solaris 10 compatible.



-= Kevin =-

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Victor
Sent: Sunday, November 09, 2008 5:54 PM
To: zones-discuss@opensolaris.org
Subject: [zones-discuss] Zone Statistics: monitoring resource use of zones

It has become clear that there is a need to monitor resource
consumption of workloads in zones, and an easy method to compare
consumption to resource controls. In order to understand how a
software tool could fulfill this need, I created an OpenSolaris
project and a prototype to get started. If this sounds interesting,
you can find the project and Perl script at:
http://opensolaris.org/os/project/zonestat/ .

If you have any comments, or suggestions for improvement, please let
me know on this e-mail list or via private e-mail.


--JeffV


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-10 Thread Jeff Victor
On Mon, Nov 10, 2008 at 11:21 AM, Young, Kevin [EMAIL PROTECTED] wrote:
 I am curious if you have plans to make it Solaris 10 compatible.

I do all development on Solaris 10. The script makes an effort to
distinguish between the different capabilities of the different
Solaris 10 updates.


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Victor
 Sent: Sunday, November 09, 2008 5:54 PM
 To: zones-discuss@opensolaris.org
 Subject: [zones-discuss] Zone Statistics: monitoring resource use of zones

 It has become clear that there is a need to monitor resource
 consumption of workloads in zones, and an easy method to compare
 consumption to resource controls. In order to understand how a
 software tool could fulfill this need, I created an OpenSolaris
 project and a prototype to get started. If this sounds interesting,
 you can find the project and Perl script at:
 http://opensolaris.org/os/project/zonestat/ .

 If you have any comments, or suggestions for improvement, please let
 me know on this e-mail list or via private e-mail.

-- 
--JeffV


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-09 Thread Mike Gerdts
On Sun, Nov 9, 2008 at 7:54 PM, Jeff Victor [EMAIL PROTECTED] wrote:
 It has become clear that there is a need to monitor resource
 consumption of workloads in zones, and an easy method to compare
 consumption to resource controls. In order to understand how a
 software tool could fulfill this need, I created an OpenSolaris
 project and a prototype to get started. If this sounds interesting,
 you can find the project and Perl script at:
 http://opensolaris.org/os/project/zonestat/ .

 If you have any comments, or suggestions for improvement, please let
 me know on this e-mail list or via private e-mail.

I've had such needs for a while and have developed some tools to help
my organization with that.  Unfortunately, I'm not able to share that
code.  I am able to share suggestions...

I am in a habit of:

#! /usr/bin/perl -w

use strict;

That catches a lot of mistakes that may be masked by:

close STDERR;

which I never do. :)

Please do not use /etc/release as a test of kernel functionality.
Those that patch to an equivalent level as the update release have a
similar level of functionality.  A better mechanism would be to check
for specific kernel patches.  See
http://blogs.sun.com/patch/entry/solaris_10_kernel_patchid_progression
for kernel patch IDs through Update 6.  The blog entry was posted
just before U5 shipped, so the U6 info should be checked for accuracy.
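
A rough sketch of such a check, assuming the script reads the output of
showrev -p, where each installed patch appears on a line starting with
"Patch: NNNNNN-RR". The helper has_patch is hypothetical, and the patch
IDs in the sample text are illustrative input only; the right IDs for a
given feature should be taken from the blog entry above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return true if the given `showrev -p` text lists
# patch $base at revision $min_rev or later.
sub has_patch {
    my ($showrev_text, $base, $min_rev) = @_;
    for my $line (split /\n/, $showrev_text) {
        next unless $line =~ /^Patch:\s+(\d{6})-(\d{2})\b/;
        return 1 if $1 eq $base && $2 >= $min_rev;
    }
    return 0;
}

# Sample input, abbreviated and illustrative only; on a real system the text
# would come from something like: my $text = `/usr/bin/showrev -p`;
my $sample = <<'END';
Patch: 118833-36 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu
Patch: 127127-11 Obsoletes: Requires: Incompatibles: Packages: SUNWcsr
END

print has_patch($sample, '127127', 11) ? "feature patch present\n"
                                       : "feature patch missing\n";
```

This only matches the exact patch ID. A later patch that obsoletes the
one being checked would be missed unless the Obsoletes: field is also
consulted.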

# Get amount and cap of memory locked by processes in each zone.
# kstat -p caps:*:lockedmem_zone_* conveniently summarizes all zones for us.
#
open(KSTAT, "/usr/bin/kstat -p caps:*:lockedmem_zone_* |");
while (<KSTAT>) {

You could just use Sun::Solaris::Kstat rather than forking another perl script.
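
As a sketch of what that could look like: Sun::Solaris::Kstat exposes the
kstat chain as a nested hash, {module}{instance}{name}{statistic}. The
helper lockedmem_by_zone below is hypothetical, and the statistic names
"usage" and "value" are assumed to match what kstat -p prints for the
caps module. On a system without the module the sketch falls back to
made-up sample data.

```perl
#!/usr/bin/perl -w
use strict;

# Walk a kstat-style nested hash ({module}{instance}{name}{statistic}) and
# collect usage and cap for every caps:*:lockedmem_zone_* entry.
sub lockedmem_by_zone {
    my ($ks) = @_;
    my %result;
    my $caps = $ks->{caps} or return \%result;
    for my $instance (keys %$caps) {
        for my $name (keys %{ $caps->{$instance} }) {
            next unless $name =~ /^lockedmem_zone_(\d+)$/;
            my $stat = $caps->{$instance}{$name};
            # Keyed by zone ID; "usage"/"value" are assumed statistic names.
            $result{$1} = { usage => $stat->{usage}, cap => $stat->{value} };
        }
    }
    return \%result;
}

# Use the real module where it exists (Solaris); otherwise fall back to
# made-up sample data so the sketch still runs elsewhere.
my $ks = eval { require Sun::Solaris::Kstat; Sun::Solaris::Kstat->new() }
      || { caps => { 0 => { lockedmem_zone_0 => { usage => 0, value => 1048576 } } } };

my $locked = lockedmem_by_zone($ks);
print "zone $_: $locked->{$_}{usage} of $locked->{$_}{cap} bytes locked\n"
    for sort { $a <=> $b } keys %$locked;
```

Reading the hash directly avoids both the fork and the text parsing that
the open/while loop needs.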

My feeling on capped memory is that if it becomes an issue and capped
swap is not really close to capped memory, the over-consumptive zone
has too high of a chance of causing horrible I/O problems for all
zones.  That is, the cap is likely to do more harm than good.  This
may change if swap can go onto solid state disk.  I only mention this
because I don't see a purpose in capping RSS; rather, I cap swap.
FWIW, I tend to use the term "reserved memory" instead of "swap"
because that is less confusing to most people.

For CPU related stats, take a look at a discussion I started a while back:

http://mail.opensolaris.org/pipermail/perf-discuss/2005-November/002048.html

One project I would like to kick off sometime is doing per user, per
project, and per zone microstate accounting.  Presumably this data
would be available through kstat.  The tricky part here is to not
introduce a big load on the system in the process of doing this.  The
above discussion and/or others in a similar vein have led me to think
that collecting stats as processes exit and periodically through a
kernel thread would be the way to go.  This approach won't be accurate
to subsecond intervals, but generally speaking you don't need better
data than per minute.  Such a thread should have no more impact on the
system than a single user running prstat or top with a similar
interval.  Further, it would be good data for prstat (e.g. -a) to use.

A follow-on to that would be to have a way to track usage of kernel
taskq work.  As more in-kernel functionality comes into existence, it
becomes harder to see where the utilization is.  For example, a kstat
that counted the relative amount of time in zfs crypto versus zfs
gzip9 operations would be helpful to the support person who is trying
to answer the call "why does vmstat say my system is pegged while
prstat shows no processes consuming CPU?"  Sure, DTrace can get that
information - but it is hard for the typical person to write and
pretty expensive to run as regular monitoring.  But, now I'm a bit off
topic.


The use of mdb rules out use by most users.  I dislike tools that make
users (e.g. application owners) ask me for root access.  I haven't
looked closely to see which, if any, of the other commands also
require some elevated privileges.

Most (all?) other commands have the full path set but mdb doesn't.
Perhaps $ENV{PATH} = ... would be a good thing to add.

I didn't have a chance to check logic closely or run it on a test
system.  I'll offer more feedback if needed when I get a chance to
test it.  It is a great start and I can't wait to see it progress.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/