Re: [zones-discuss] improved zones/RM integration

2006-11-02 Thread Jerry Jelinek

Jeff Victor wrote:

Just curious: which process(es) gets "billed" for shared text pages?


Jeff,

We keep track of shared COW segments so they are not double counted.
Off the top of my head, I think we just credit that to the first process
we see using the segment; however, I'll let Steve chime in here since
he did the implementation of this piece.

Jerry


Re: [zones-discuss] improved zones/RM integration

2006-11-02 Thread Jeff Victor

Jerry Jelinek wrote:

Mike,

Mike Gerdts wrote:


In the proposal you say:

As part of this overall project, we will be enhancing the internal
rcapd rss accounting so that rcapd will have a more accurate
measurement of the overall rss for each zone.

Does this spill over to prstat such that there may finally be a fix for:

4754856  prstat -atJTZ should count shared segments only once


Yes, we are addressing this bug as part of this work.  prstat will
be able to report an accurate rss number for processes, users, projects
and tasks, as well as zones.  prstat and rcapd will use the same new
underlying rss counting code we have developed.


Just curious: which process(es) gets "billed" for shared text pages?


--
Jeff VICTOR  Sun Microsystemsjeff.victor @ sun.com
OS AmbassadorSr. Technical Specialist
Solaris 10 Zones FAQ:http://www.opensolaris.org/os/community/zones/faq
--


Re: [zones-discuss] improved zones/RM integration

2006-11-02 Thread Jerry Jelinek

Mike,

Mike Gerdts wrote:

On 6/26/06, Gerald A. Jelinek <[EMAIL PROTECTED]> wrote:

Attached is a description of a project we have been refining for
a while now.  The idea is to improve the integration of zones
with some of the existing resource management features in Solaris.


In the proposal you say:

As part of this overall project, we will be enhancing the internal
rcapd rss accounting so that rcapd will have a more accurate
measurement of the overall rss for each zone.

Does this spill over to prstat such that there may finally be a fix for:

4754856  prstat -atJTZ should count shared segments only once


Yes, we are addressing this bug as part of this work.  prstat will
be able to report an accurate rss number for processes, users, projects
and tasks, as well as zones.  prstat and rcapd will use the same new
underlying rss counting code we have developed.
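For example, the reports affected are those produced by the existing
summary options (standard prstat usage today, shown only to illustrate
which reports change; the rss columns will simply become accurate):

    # prstat -Z 5 5     per-zone summary, 5 second interval, 5 reports
    # prstat -J         per-project summary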

Jerry


Re: [zones-discuss] improved zones/RM integration

2006-11-02 Thread Mike Gerdts

On 6/26/06, Gerald A. Jelinek <[EMAIL PROTECTED]> wrote:

Attached is a description of a project we have been refining for
a while now.  The idea is to improve the integration of zones
with some of the existing resource management features in Solaris.


In the proposal you say:

As part of this overall project, we will be enhancing the internal
rcapd rss accounting so that rcapd will have a more accurate
measurement of the overall rss for each zone.

Does this spill over to prstat such that there may finally be a fix for:

4754856  prstat -atJTZ should count shared segments only once

As I am looking forward to using zfs, I am trying to figure out how I
can tell how much memory is being used aside from the zfs buffer
cache.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zones-discuss] improved zones/RM integration

2006-07-18 Thread David . Comay

Could we somehow work the zone name into this?  It would be nice for
e.g. poolstat(1) observability.  Otherwise the user experience is going
to be all about trying to work out what 'SUNWzone34' maps to, which
seems poor.


We need to have the name begin with SUNW or we could have collisions with
existing pools.  I suppose instead of SUNWzone{zoneid}, it could be
SUNW{zonename}, although then you lose the visibility that the pool is
associated with a zone.  Maybe SUNWzone_{zonename}?


Or perhaps SUNWtemp_{zonename}?

dsc


Re: [zones-discuss] improved zones/RM integration

2006-07-18 Thread Jerry Jelinek

Dan,

Thanks for your detailed comments.  My responses are in-line.

Dan Price wrote:

Very belatedly, I'm just getting around to reviewing this.  Overall
I think it looks good.  Comments in-line.


1) "Hard" vs. "Soft" RM configuration within zonecfg


...

dedicated-cpu
ncpus (a positive integer or range, default value 1)
importance (a positive integer, default value 1)
max-lwps (an integer >= 100)


why >= 100?  I can envision a minimized zone where this is too many.


I picked 100 since I had a hard time getting a zone to boot with much less.
Obviously this will vary somewhat depending on the services enabled.
Is 100 really a problem as a lower limit?  Part of what we are trying to
do here is help the user configure a reasonable RM configuration, especially
if they don't know a lot about RM.  Allowing them to set a limit which
prevents the zone from booting seems bad.  However, we could also just let
them do that if 100 seems too high for some reason.  Unfortunately, it is
hard to know in advance what exact number of threads will be needed to
boot the zone.


capped-cpu
cpu-cap (a positive integer, default value 100 which
 represents 100% of one cpu)


I'm scared of this default.  To put it another way, why did you pick
100?  Should there be a value which represents infinity?  What is
the meaning of specifying 0, or is that an error?


We don't have to have a default here, I guess.  I picked 100 because it
seemed to be symmetrical with the dedicated-cpu case, where the lower
number is 1.  As for the other values (infinity and 0), those should be
covered by the cpu-caps ARC case.  I am not sure if Andrei has finished
that case yet.


max-lwps (an integer >= 100)
cpu-shares (a positive integer)
dedicated-memory
TBD - once msets [12] are completed
capped-memory
cap (a positive decimal number with optional k, m, g,
 or t as a modifier, no modifier defaults to units
 of megabytes(m), must be at least 1m)


I think this set of rules is too complex and too confusing for users--
it's weird to have the default units be larger than the smallest
available units.  Let's mandate that the user *always* specify units.


OK.
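
For illustration, with mandatory units a capped-memory configuration
might then look like this (a hypothetical zonecfg session; the resource
and property names are the ones proposed here and could still change):

    zonecfg:myzone> add capped-memory
    zonecfg:myzone:capped-memory> set cap=512m
    zonecfg:myzone:capped-memory> end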


2) Temporary Pools.


...

If a dedicated-cpu (or eventually a dedicated-memory) resource is
configured for the zone, then when the zone boots zoneadmd will create
a temporary pool dedicated for the zone's use.  Zoneadmd will
dynamically create a pool & pset (or eventually a mset) and assign the
number of cpus specified in zonecfg to that pset.  The temporary pool
& pset will be named 'SUNWzone{zoneid}'.


Could we somehow work the zone name into this?  It would be nice for
e.g. poolstat(1) observability.  Otherwise the user experience is going
to be all about trying to work out what 'SUNWzone34' maps to, which
seems poor.


We need to have the name begin with SUNW or we could have collisions with
existing pools.  I suppose instead of SUNWzone{zoneid}, it could be
SUNW{zonename}, although then you lose the visibility that the pool is
associated with a zone.  Maybe SUNWzone_{zonename}?


Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
well as the 'pool.importance' pool property, based on the values
specified for dedicated-cpu's 'ncpus' and 'importance' properties
in zonecfg.


Is importance mandatory?  Will it have a default value?  What values can
it have?  What does it mean?  Please be a little more specific.


Yes, 1.  I will add more details referring to the pools documentation on
importance.
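
To make that concrete, what zoneadmd sets up through libpool would be
roughly equivalent to the following poolcfg(1M) operations on the dynamic
configuration (a sketch only; zoneadmd calls the library directly, and
the 'SUNWzone12' name is just an example):

    # poolcfg -dc 'create pset SUNWzone12 (uint pset.min = 2; uint pset.max = 4)'
    # poolcfg -dc 'create pool SUNWzone12 (int pool.importance = 1)'
    # poolcfg -dc 'associate pool SUNWzone12 (pset SUNWzone12)'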


If the cpu (or memory) resources needed to create the temporary pool
are unavailable, zoneadmd will issue an error and the zone won't boot.

When the zone is halted, the temporary pool & pset will be destroyed.


What about during a reboot?  It seems like it'd be good to not tear
down the temporary pool during reboot, but maybe that's hard.  It would
seem weird to me if my pool was 2-4 CPUs, and I had 2, then rebooted and
had 4.


We won't destroy the pool on reboot; it is preserved.  Although it is not
a big deal right now, it will become more of an issue when we have memory
sets, so I made sure the pool is preserved across reboot.  I'll clarify that.


We will add a new boolean property ('temporary') that can exist on
pools and any resource set.  The 'temporary' property indicates that
the pool or resource set should never be committed to a static
configuration (e.g. pooladm -s) and that it should never be destroyed
when updating the dynamic configuration from a static configuration
(e.g. pooladm -c).  These temporary pools/resources can only be managed
in the dynamic configuration.  These changes will be implemented within
libpool(3LIB).

Re: [zones-discuss] improved zones/RM integration

2006-07-17 Thread Dan Price

Very belatedly, I'm just getting around to reviewing this.  Overall
I think it looks good.  Comments in-line.

> 1) "Hard" vs. "Soft" RM configuration within zonecfg
>
...
>   dedicated-cpu
>   ncpus (a positive integer or range, default value 1)
>   importance (a positive integer, default value 1)
>   max-lwps (an integer >= 100)

why >= 100?  I can envision a minimized zone where this is too many.

>   capped-cpu
>   cpu-cap (a positive integer, default value 100 which
>            represents 100% of one cpu)

I'm scared of this default.  To put it another way, why did you pick
100?  Should there be a value which represents infinity?  What is
the meaning of specifying 0, or is that an error?

>   max-lwps (an integer >= 100)
>   cpu-shares (a positive integer)
>   dedicated-memory
>   TBD - once msets [12] are completed
>   capped-memory
>   cap (a positive decimal number with optional k, m, g,
>        or t as a modifier, no modifier defaults to units
>        of megabytes(m), must be at least 1m)

I think this set of rules is too complex and too confusing for users--
it's weird to have the default units be larger than the smallest
available units.  Let's mandate that the user *always* specify units.

> 
> 2) Temporary Pools.
> 
...
>   If a dedicated-cpu (or eventually a dedicated-memory) resource is
>   configured for the zone, then when the zone boots zoneadmd will create
>   a temporary pool dedicated for the zone's use.  Zoneadmd will
>   dynamically create a pool & pset (or eventually a mset) and assign the
>   number of cpus specified in zonecfg to that pset.  The temporary pool
>   & pset will be named 'SUNWzone{zoneid}'.

Could we somehow work the zone name into this?  It would be nice for
e.g. poolstat(1) observability.  Otherwise the user experience is going
to be all about trying to work out what 'SUNWzone34' maps to, which
seems poor.

>   Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
>   well as the 'pool.importance' pool property, based on the values
>   specified for dedicated-cpu's 'ncpus' and 'importance' properties
>   in zonecfg.

Is importance mandatory?  Will it have a default value?  What values can
it have?  What does it mean?  Please be a little more specific.

>   If the cpu (or memory) resources needed to create the temporary pool
>   are unavailable, zoneadmd will issue an error and the zone won't boot.
> 
>   When the zone is halted, the temporary pool & pset will be destroyed.

What about during a reboot?  It seems like it'd be good to not tear
down the temporary pool during reboot, but maybe that's hard.  It would
seem weird to me if my pool was 2-4 CPUs, and I had 2, then rebooted and
had 4.

>   We will add a new boolean property ('temporary') that can exist on
>   pools and any resource set.  The 'temporary' property indicates that
>   the pool or resource set should never be committed to a static
>   configuration (e.g. pooladm -s) and that it should never be destroyed
>   when updating the dynamic configuration from a static configuration
>   (e.g. pooladm -c).  These temporary pools/resources can only be managed
>   in the dynamic configuration.  These changes will be implemented within
>   libpool(3LIB).
> 
>   It is our expectation that most users will never need to manage
>   temporary pools through the existing poolcfg(1M) commands.  For users
>   who need more sophisticated pool configuration and management, the
>   existing 'pool' resource within zonecfg should be used and users
>   should manually create a permanent pool using the existing mechanisms.
> 
> 3) Resource controls in zonecfg will be simplified.
...
>   Here are the aliases we will define for the rctls:
>   alias   rctl
>   -   
>   max-lwpszone.max-lwps
>   cpu-shares  zone.cpu-shares
>   cpu-cap zone.cpu-cap (future, once cpu-caps integrate)

You've mentioned that you will substitute in sort of "the right"
defaults for the privileged and action fields.  It seems like you should
spell out what those will be...

alias rctl
--
cpu-shares=X  zone.cpu-shares(privileged, X, none)
 ...

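For comparison, here is the fully spelled-out rctl that such an alias
would stand for, using today's zonecfg syntax (an illustrative session;
the zone name and the value 10 are hypothetical):

    zonecfg:myzone> add rctl
    zonecfg:myzone:rctl> set name=zone.cpu-shares
    zonecfg:myzone:rctl> add value (priv=privileged,limit=10,action=none)
    zonecfg:myzone:rctl> end
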
>   If an rctl was already defined that did not match the expected value
>   (e.g. it had 'action=none' or multiple values), then the 'max-lwps'
>   alias will be disabled.  An attempt to set 'max-lwps' within
>   'dedicated-cpu' would print the following error:
>   "One or more incompatible rctls already exist for thi

Re: [zones-discuss] improved zones/RM integration

2006-06-28 Thread Jerry Jelinek

Amol,

Thanks for your comments.  I have some responses in-line.

Amol A Chiplunkar wrote:


These are very exciting features !!

Some comments.

If a dedicated-cpu (or eventually a dedicated-memory) resource is
configured for the zone, then when the zone boots zoneadmd will create
a temporary pool dedicated for the zone's use.

   The temp pool created is going to show up in pooladm output, right?
   If someone uses such a pool in a zone configuration done in the
   traditional style (set pool=SUNWzone{zoneid}), is zonecfg going to
   reject it?  If not, it won't be "dedicated" anymore.


That is a good point.  We will make sure that zonecfg disallows that, and
I will clarify that in the proposal.  One thing I wanted to point out is
that the name of the pool/pset will actually change when you reboot the
zone, since the zoneid will change at that time.  There is no easy way to
predict what the name will be or whether any particular name will exist,
since the zoneid is fairly dynamic.  We are not going too far out of our
way to try to prevent you from using the temporary pool for something
else, but I will make sure zonecfg doesn't allow that.


   Also, is there something that's going to stop poolbind from moving
   zones between temporary pools and permanent pools?
   What if a temporary pool was created when a zone booted, and poolbind
   then moves the zone to a permanent pool?  Is the temp pool deleted?


That is a good question.  I need to think about what we should do there
but I am inclined to say that we should disallow that.  Once you start
allowing stuff like that, then we are back to the problem of where the
configuration data is stored and managed and that is what we are trying to
avoid with the whole idea of the temporary pool.


Resource controls in zonecfg will be simplified.
   Do you think we need prctl enhancements that will allow setting rctls
   using the new aliases directly?  That would be good for consistency.


That is another good idea.  I'd like to keep that separate for now but
we'll keep it on our plate.
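
For reference, setting a zone rctl on a running zone with today's
non-aliased prctl syntax looks like this (the zone name is hypothetical):

    # prctl -n zone.cpu-shares -v 10 -r -i zone myzone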


   Also, you mention that backward compatibility ensures the existing
   tools that parse zonecfg info/export output are unaffected.  That's
   true to some extent, but some of them treat "unknown" resources as a
   no-op and display them as-is, which means some tools will show the new
   resources as unknown.  That is harmless but could be confusing sometimes.


We are continuing to add new resources.  'limitpriv' and 'bootargs' are
two recent ones.  We won't stop adding new resources; that would be
too constraining, but we will continue to try to make sure we don't
break scripts that depend on resources that they know about.

   Maybe having a command-line option such as 'zonecfg info -legacy' that
   would suppress the new resources could help.


I think it would be better to plan for new resources.  Otherwise, what would
legacy be?  Just the resources that were in the original S10 release?


We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
and FSS is not already the default scheduling class, zoneadmd will set
the scheduling class to be FSS for processes in the zone.

   On the fly?  I.e., if a zone didn't have zone.cpu-shares when it was
   booted and someone did prctl to set it, zoneadmd will change the sched
   class of all processes in the zone to FSS?  That's cool.


That is not what we plan on doing.  What we are planning is to set
FSS when the zone boots, if it has cpu-shares.  These enhancements are
really targeted at people who don't know a lot about the existing RM
features.  If you know enough to use prctl to do this kind of thing, then
we will expect you to be able to fully manage your system using all of
the existing features.
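
For context, what zoneadmd would arrange at boot corresponds to what an
administrator can already do by hand today (illustrative commands; zoneadmd
would do the equivalent internally, and the zoneid 12 is hypothetical):

    # priocntl -s -c FSS -i zoneid 12    move zone 12's processes into FSS
    # dispadmin -d FSS                   make FSS the system default class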

Thanks again for your comments,
Jerry


Re: [zones-discuss] improved zones/RM integration

2006-06-28 Thread Jerry Jelinek

Mike,

I think most of your comments were addressed in my response to
Jeff but I did want to make sure one thing was clear.

Mike Gerdts wrote:

On 6/27/06, Jeff Victor <[EMAIL PROTECTED]> wrote:
4) Aliases: The notion of aliases also creates redundant output which could
be confusing.  I like the simplification of aliases as described, but I wish
I had a good solution that wouldn't break existing tools that parse "zonecfg
info" output - if such tools exist.


Because key features are missing from zones, I have been writing
scripts that sometimes parse the output of "zonecfg info".  Changing
this format stands a good chance of breaking my scripts.  It sounds
like if I set pool and rctl resources, they would still be displayed
as rctl and not translated to the syntax associated with temporary
pools.  Is this correct?


Yes, we considered lots of alternatives but we wanted to make sure
that any scripts would continue to work.  So, if your scripts are setting
or looking at the rctl entries, then they should continue to work,
even if you also start to use the new resources.

Thanks,
Jerry


Re: [zones-discuss] improved zones/RM integration

2006-06-28 Thread Jerry Jelinek

Jeff,

Thanks for your comments.  I have a few responses in-line.

Jeff Victor wrote:
1) General comment: I agree that this will provide needed clarity to the 
seemingly unorganized RM features that we have scattered through Solaris 
during the last decade. The automation of certain activities (e.g. 
starting rcapd in the GZ when needed) will also be extremely beneficial.


2) Terminology: Although your use of the phrase "hard partition" should 
be clear to most people, through experience with other partitioning 
technologies, the use of "soft partition" is less clear. To many people, 
a "soft" limit is one that can be exceeded occasionally or in certain 
situations, or is merely an advisory limit. It also conflicts with SVM 
"soft partitions."


We started using "hard" and "soft" to describe the general idea amongst
ourselves when we first started talking about this but we were never
very satisfied with those terms either.  These two terms will not necessarily
be used in the final documentation and are not used in the resource names
themselves.  Naming always seems to be a difficult area.  The key technical
issue to focus on is the resource names and properties being proposed as
opposed to the overall words we are using to describe the general ideas.
I am guessing we were able to communicate the general idea using "hard" and
"soft" so I think we are ok there.  We will have to figure out the best way
to document this when the time comes.  It is hard to find good terms that
are not already used by other parts of the system.  The word "resource" is
a good example and is probably more confusing than "soft partition".

3) Lwps context: Why is the lwps alias defined in the context of 
dedicated-cpu? Lwps seem to be unrelated to hard partitions.  Further, 
ncpus represents a subset of a finite resource. Lwps are not finite.  
The two should be separate.


Being able to set max-lwps is a useful limit for both processor sets
and cpu-caps, which is why it is available in both resources.  The global
zone still manages all processes, even when you are using a processor
set, so a fork bomb can still affect the responsiveness of the system
as a whole.  However, max-lwps is optional in both the dedicated-cpu and
capped-cpu resources.  I should have made that clearer.  I will
update the document to clarify that.

4) Aliases: The notion of aliases also creates redundant output which 
could be confusing.  I like the simplification of aliases as described, 
but I wish I had a good solution that wouldn't break existing tools that 
parse "zonecfg info" output - if such tools exist.


This is part of the proposal which we struggled with a lot.  In the
end, we decided that we needed to maintain compatibility so that
we did not break scripts that talked to the CLI directly.  This was the
compromise we came up with that allows us to do that.

5) Another RM setting: While we're integrating RM settings, I think we 
should consider adding project.max-address-space using the same 
semantics as {project,zone}.max-lwps.  A zone-specific setting, 
zone.max-address-space, could be added along with a zonecfg alias, 
max-address-space. This would allow the GZ admin to cap the virtual 
memory space available to that zone.  This would not take the place of 
swap sets, but would be valuable if this RM integration work might be 
complete before swap sets.


We are looking at additional rctls, they are just not part of this project.

Thanks again for your comments,
Jerry


Re: [zones-discuss] improved zones/RM integration

2006-06-28 Thread Amol A Chiplunkar


These are very exciting features !!

Some comments.

If a dedicated-cpu (or eventually a dedicated-memory) resource is
configured for the zone, then when the zone boots zoneadmd will create
a temporary pool dedicated for the zone's use.

   The temp pool created is going to show up in pooladm output, right?
   If someone uses such a pool in a zone configuration done in the
   traditional style (set pool=SUNWzone{zoneid}), is zonecfg going to
   reject it?  If not, it won't be "dedicated" anymore.

   Also, is there something that's going to stop poolbind from moving
   zones between temporary pools and permanent pools?
   What if a temporary pool was created when a zone booted, and poolbind
   then moves the zone to a permanent pool?  Is the temp pool deleted?

Resource controls in zonecfg will be simplified.
   Do you think we need prctl enhancements that will allow setting rctls
   using the new aliases directly?  That would be good for consistency.

   Also, you mention that backward compatibility ensures the existing
   tools that parse zonecfg info/export output are unaffected.  That's
   true to some extent, but some of them treat "unknown" resources as a
   no-op and display them as-is, which means some tools will show the new
   resources as unknown.  That is harmless but could be confusing sometimes.

   Maybe having a command-line option such as 'zonecfg info -legacy' that
   would suppress the new resources could help.

We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
and FSS is not already the default scheduling class, zoneadmd will set
the scheduling class to be FSS for processes in the zone.

   On the fly?  I.e., if a zone didn't have zone.cpu-shares when it was
   booted and someone did prctl to set it, zoneadmd will change the sched
   class of all processes in the zone to FSS?  That's cool.

   On the other hand,
   how about when this zone is sharing a pool with another zone without
   cpu-shares, and the pool's sched class is also not FSS?
   We don't recommend that processes running in different scheduling
   classes share CPUs, right?  This feature may take us to that situation.

thanks
- Amol


Gerald A. Jelinek wrote:

Attached is a description of a project we have been refining for
a while now.  The idea is to improve the integration of zones
with some of the existing resource management features in Solaris.
I would appreciate hearing any suggestions or questions.  I'd
like to submit this proposal to our internal architectural review
process by mid-July.  I have also posted a few slides that give an 
overview of the project.  Those are available on the zones files

page (http://www.opensolaris.org/os/community/zones/files/).

Thanks,
Jerry
 
 
This message posted from opensolaris.org






Re: [zones-discuss] improved zones/RM integration

2006-06-28 Thread Mike Gerdts

On 6/27/06, Jeff Victor <[EMAIL PROTECTED]> wrote:

1) General comment: I agree that this will provide needed clarity to the 
seemingly
unorganized RM features that we have scattered through Solaris during the last
decade. The automation of certain activities (e.g. starting rcapd in the GZ when
needed) will also be extremely beneficial.


Most definitely.


2) Terminology: Although your use of the phrase "hard partition" should be
clear to most people, through experience with other partitioning
technologies, the use of "soft partition" is less clear. To many people, a
"soft" limit is one that can be exceeded occasionally or in certain
situations, or is merely an advisory limit.  It also conflicts with SVM
"soft partitions."

I suggest the phrases "dedicated partition" (or "private partition") and "shared
partition" instead to clarify the intent.  Choosing "dedicated partition" might
then require re-naming "dedicated-cpu" to "guaranteed-cpu" or "private-cpu" and
re-naming "dedicated-memory" to "guaranteed-memory" or "private memory".

Or we could leave "hard partition" alone and simply change "soft partition" to
"shared partition."


I think that "soft partition" is more clear.  "Shared partition"
suggest to me that you are sharing a single hard or soft partition for
multiple workloads.


3) Lwps context: Why is the lwps alias defined in the context of dedicated-cpu?
Lwps seem to be unrelated to hard partitions.  Further, ncpus represents
a subset of a finite resource. Lwps are not finite.  The two should be
separate.


Something seemed odd with that to me too.  I didn't see it as terribly
harmful there, but it is an excellent point.  This is analogous to
maxuprc, typically set in /etc/system.  What are the chances that each
zone in the future gets file descriptor limits or other similar limits
that should be in the same section as the lwp limit?


4) Aliases: The notion of aliases also creates redundant output which could be
confusing.  I like the simplification of aliases as described, but I wish I
had a good solution that wouldn't break existing tools that parse "zonecfg
info" output - if such tools exist.


Because key features are missing from zones, I have been writing
scripts that sometimes parse the output of "zonecfg info".  Changing
this format stands a good chance of breaking my scripts.  It sounds
like if I set pool and rctl resources, they would still be displayed
as rctl and not translated to the syntax associated with temporary
pools.  Is this correct?

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zones-discuss] improved zones/RM integration

2006-06-27 Thread Jeff Victor
1) General comment: I agree that this will provide needed clarity to the seemingly 
unorganized RM features that we have scattered through Solaris during the last 
decade. The automation of certain activities (e.g. starting rcapd in the GZ when 
needed) will also be extremely beneficial.


2) Terminology: Although your use of the phrase "hard partition" should be clear 
to most people, through experience with other partitioning technologies, the use 
of "soft partition" is less clear. To many people, a "soft" limit is one that can 
be exceeded occasionally or in certain situations, or is merely an advisory limit. 
It also conflicts with SVM "soft partitions."


I suggest the phrases "dedicated partition" (or "private partition") and "shared 
partition" instead to clarify the intent.  Choosing "dedicated partition" might 
then require re-naming "dedicated-cpu" to "guaranteed-cpu" or "private-cpu" and 
re-naming "dedicated-memory" to "guaranteed-memory" or "private memory".


Or we could leave "hard partition" alone and simply change "soft partition" to 
"shared partition."


3) Lwps context: Why is the lwps alias defined in the context of dedicated-cpu? 
Lwps seem to be unrelated to hard partitions.  Further, ncpus represents a subset 
of a finite resource. Lwps are not finite.  The two should be separate.


4) Aliases: The notion of aliases also creates redundant output which could be 
confusing.  I like the simplification of aliases as described, but I wish I had a 
good solution that wouldn't break existing tools that parse "zonecfg info" output 
- if such tools exist.


5) Another RM setting: While we're integrating RM settings, I think we should 
consider adding project.max-address-space using the same semantics as 
{project,zone}.max-lwps.  A zone-specific setting, zone.max-address-space, could 
be added along with a zonecfg alias, max-address-space. This would allow the GZ 
admin to cap the virtual memory space available to that zone.  This would not take 
the place of swap sets, but would be valuable if this RM integration work might be 
complete before swap sets.


6) From your "Project Alternatives" slide:
Should we have yet another standalone GUI?  Absolutely not.  Unless it was a 
wizard - IMO Solaris adoption would accelerate greatly if it had wizards to aid 
potential adopters.


However this ends up looking, I look forward to seeing this integration!


Gerald A. Jelinek wrote:

Attached is a description of a project we have been refining for
a while now.  The idea is to improve the integration of zones
with some of the existing resource management features in Solaris.
I would appreciate hearing any suggestions or questions.  I'd
like to submit this proposal to our internal architectural review
process by mid-July.  I have also posted a few slides that give an 
overview of the project.  Those are available on the zones files

page (http://www.opensolaris.org/os/community/zones/files/).

Thanks,
Jerry
 
 
This message posted from opensolaris.org






[zones-discuss] improved zones/RM integration

2006-06-26 Thread Gerald A. Jelinek
Attached is a description of a project we have been refining for
a while now.  The idea is to improve the integration of zones
with some of the existing resource management features in Solaris.
I would appreciate hearing any suggestions or questions.  I'd
like to submit this proposal to our internal architectural review
process by mid-July.  I have also posted a few slides that give an 
overview of the project.  Those are available on the zones files
page (http://www.opensolaris.org/os/community/zones/files/).

Thanks,
Jerry
 
 
This message posted from opensolaris.org

SUMMARY:

This project enhances Solaris zones[1], pools[2-4] and resource
caps[5,6] to improve the integration of zones with resource
management (RM).  It addresses existing RFEs[7-10] in this area and
lays the groundwork for simplified, coherent management of the various
RM features exposed through zones.

We will integrate some basic pool configuration with zones, implement
the concept of "temporary pools" that are dynamically created/destroyed
when a zone boots/halts and we will simplify the setting of resource
controls within zonecfg.  We will enhance rcapd so that it can cap
a zone's memory while rcapd is running in the global zone.  We will
also make a few other changes to provide a better overall experience
when using zones with RM.

Patch binding is requested for these new interfaces and the stability
of most of these interfaces is "evolving" (see interface table for
complete list).

PROBLEM:

Although zones are fairly easy to configure and install, it appears
that many customers have difficulty setting up a good RM configuration
to accompany their zone configuration.  Understanding RM involves many
new terms and concepts along with lots of documentation to understand.
This leads to the problem that many customers either do not configure
RM with their zones, or configure it incorrectly, leading them to be
disappointed when zones, by themselves, do not provide all of the
containment that they expect.

This problem will just get worse in the near future with the
additional RM features that are coming, such as cpu-caps[11], memory
sets[12] and swap sets[13].

PROPOSAL:

There are 7 different enhancements outlined below.

1) "Hard" vs. "Soft" RM configuration within zonecfg

We will enhance zonecfg(1M) so that the user can configure basic RM
capabilities in a structured way.

The various existing and upcoming RM features can be broken down
into "hard" vs. "soft" partitioning of the system's resources.
With "hard" partitioning, resources are dedicated to the zone using
processor sets (psets) and memory sets (msets).  With "soft"
partitioning, resources are shared, but capped, with an upper limit
on their use by the zone.

           |  Hard    |  Soft
    -------+----------+----------
    cpu    |  psets   |  cpu-caps
    memory |  msets   |  rcapd

There are also some existing rctls (zone.cpu-shares, zone.max-lwps)
which will be integrated into this overall concept.

Within zonecfg we will organize the various RM features into four
basic zonecfg resources so that it is simple for a user to understand
and configure the RM features that are to be used with their zone.
Note that zonecfg "resources" are not the same as "resource
management".  Within zonecfg, a "resource" is the name of a top-level
property of the zone (see zonecfg(1M) for more information).

The four new zonecfg resources are:
dedicated-cpu
capped-cpu (future, once cpu-caps are integrated)
dedicated-memory (future, once memory sets are integrated)
capped-memory

Each of these zonecfg resources will have properties that are
appropriate to the RM capabilities associated with that resource.
Zonecfg will only allow one instance of each of these resources to be
configured, and it will not allow conflicting resources to be added
(e.g. dedicated-cpu and capped-cpu are mutually exclusive).
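
As an illustration of the intended user experience, configuring a hard
cpu partition might look like this (a hypothetical zonecfg session using
the proposed resource and property names, which may still change):

    zonecfg:myzone> add dedicated-cpu
    zonecfg:myzone:dedicated-cpu> set ncpus=2-4
    zonecfg:myzone:dedicated-cpu> set importance=2
    zonecfg:myzone:dedicated-cpu> end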

The mapping of these new zonecfg resources to the primary underlying RM
feature is:
dedicated-cpu -> temporary pset
dedicated-memory -> temporary mset
capped-cpu -> cpu-cap rctl [11]
capped-memory -> rcapd running in GZ

Temporary psets and msets are described below, in section 2.
Rcapd enhancements for running in the global zone are described below,
in section 4.

The valid properties for each of these new zonecfg resources will be:

dedicated-cpu
    ncpus (a positive integer or range, default value 1)
    importance (a positive integer, default value 1)
    max-lwps (an integer >= 100)