Dan,
Thanks for your detailed comments. My responses are in-line.
Dan Price wrote:
Very belatedly, I'm just getting around to reviewing this. Overall
I think it looks good. Comments in-line.
1) "Hard" vs. "Soft" RM configuration within zonecfg
...
dedicated-cpu
ncpus (a positive integer or range, default value 1)
importance (a positive integer, default value 1)
max-lwps (an integer >= 100)
why >= 100? I can envision a minimized zone where this is too many.
I picked 100 since I had a hard time getting a zone to boot with much less.
Obviously this will vary somewhat depending on the services enabled.
Is 100 really a problem as a lower limit? Part of what we are trying to
do here is help the user configure a reasonable RM configuration, especially
if they don't know a lot about RM. Allowing them to set a limit which
prevents the zone from booting seems bad. However, we could also just let
them do that if 100 seems too high for some reason. Unfortunately, it is
hard to know in advance what exact number of threads will be needed to
boot the zone.
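For example, configuring the proposed dedicated-cpu resource would look
something like this (the zone name and values are purely illustrative,
and the exact syntax could still change):

    # zonecfg -z myzone
    zonecfg:myzone> add dedicated-cpu
    zonecfg:myzone:dedicated-cpu> set ncpus=2-4
    zonecfg:myzone:dedicated-cpu> set importance=2
    zonecfg:myzone:dedicated-cpu> set max-lwps=500
    zonecfg:myzone:dedicated-cpu> end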
capped-cpu
cpu-cap (a positive integer, default value 100, which
represents 100% of one cpu)
I'm scared of this default. To put it another way, why did you pick
100? Should there be a value which represents infinity? What is
the meaning of specifying 0, or is that an error?
We don't have to have a default here, I guess. I picked 100 because it seemed
symmetrical with the dedicated-cpu case, where the lower number is 1.
As far as the other values (infinity and 0) that should be covered by the
cpu-caps ARC case. I am not sure if Andrei has finished that case yet.
max-lwps (an integer >= 100)
cpu-shares (a positive integer)
dedicated-memory
TBD - once msets [12] are completed
capped-memory
cap (a positive decimal number with optional k, m, g,
or t as a modifier; no modifier defaults to units
of megabytes (m); must be at least 1m)
I think this set of rules is too complex and too confusing for users--
it's weird to have the default units be larger than the smallest
available units. Let's mandate that the user *always* specify units.
OK.
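To make the capped configurations concrete, here is a sketch of the
proposed syntax with units now mandatory on the memory cap, as agreed
above (zone name and values are illustrative):

    zonecfg:myzone> add capped-cpu
    zonecfg:myzone:capped-cpu> set cpu-cap=150
    zonecfg:myzone:capped-cpu> end
    zonecfg:myzone> add capped-memory
    zonecfg:myzone:capped-memory> set cap=2g
    zonecfg:myzone:capped-memory> end

Here cpu-cap=150 means 150% of one cpu, i.e. 1.5 cpus.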
2) Temporary Pools.
...
If a dedicated-cpu (or eventually a dedicated-memory) resource is
configured for the zone, then when the zone boots zoneadmd will create
a temporary pool dedicated for the zone's use. Zoneadmd will
dynamically create a pool & pset (or eventually a mset) and assign the
number of cpus specified in zonecfg to that pset. The temporary pool
& pset will be named 'SUNWzone{zoneid}'.
Could we somehow work the zone name into this? It would be nice for
e.g. poolstat(1M) observability. Otherwise the user experience is going
to be all about trying to work out what 'SUNWzone34' maps to, which
seems poor.
We need to have the name begin with SUNW or we could have collisions with
existing pools. I suppose instead of zone{id} it could be SUNW{zonename},
although you lose the visibility that the pool is associated with a zone.
Maybe SUNWzone_{zonename}?
Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
well as the 'pool.importance' pool property, based on the values
specified for dedicated-cpu's 'ncpus' and 'importance' properties
in zonecfg.
Is importance mandatory? Will it have a default value? What values can
it have? What does it mean? Please be a little more specific.
Yes, it is mandatory, and the default value is 1. I will add more details,
referring to the pools documentation, on what importance means.
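For reference, what zoneadmd does at zone boot would be roughly
equivalent to the following poolcfg(1M) operations on the dynamic
configuration (assuming the SUNWzone_{zonename} naming discussed above
and an ncpus range of 2-4; details may differ in the implementation):

    # poolcfg -dc 'create pset SUNWzone_myzone (uint pset.min = 2; uint pset.max = 4)'
    # poolcfg -dc 'create pool SUNWzone_myzone (int pool.importance = 2)'
    # poolcfg -dc 'associate pool SUNWzone_myzone (pset SUNWzone_myzone)'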
If the cpu (or memory) resources needed to create the temporary pool
are unavailable, zoneadmd will issue an error and the zone won't boot.
When the zone is halted, the temporary pool & pset will be destroyed.
What about during a reboot? It seems like it'd be good to not tear
down the temporary pool during reboot, but maybe that's hard. It would
seem weird to me if my pool was 2-4 CPUs and I had 2, then rebooted and
had 4.
We won't destroy the pool on reboot; it is preserved. Although it is not
a big deal right now, it will become more of an issue when we have memory
sets, so I made sure the pool is preserved across reboot. I'll clarify that.
We will add a new boolean property ('temporary') that can exist on
pools and any resource set. The 'temporary' property indicates that
the pool or resource set should never be committed to a static
configuration (e.g. pooladm -s) and that it should never be destroyed
when updating the dynamic configuration from a static configuration
(e.g. pooladm -c). These temporary pools/resources can only be managed
in the dynamic configuration. These changes will be implemented within
libpool(3LIB).
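For example, assuming a temporary pool named SUNWzone_myzone exists in
the dynamic configuration:

    # poolcfg -dc 'info pool SUNWzone_myzone'
    # pooladm -s
    # pooladm -c

The 'info' succeeds because the pool is visible in the dynamic
configuration, 'pooladm -s' skips the pool when writing out the static
configuration, and 'pooladm -c' leaves it intact when re-applying the
static configuration.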
It is our expectation that most users will never need to manage
temporary pools through the existing poolcfg(1M) commands. For users
who need more sophisticated pool configuration and management, the
existing 'pool' resource within zonecfg should be used and users
should manually create a permanent pool using the existing mechanisms.
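Such a user would do something like the following, assuming a static
configuration file already exists (all names illustrative):

    # poolcfg -c 'create pset mypset (uint pset.min = 1; uint pset.max = 2)'
    # poolcfg -c 'create pool mypool'
    # poolcfg -c 'associate pool mypool (pset mypset)'
    # pooladm -c
    # zonecfg -z myzone 'set pool=mypool'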
3) Resource controls in zonecfg will be simplified.
...
Here are the aliases we will define for the rctls:
alias rctl
----- ----
max-lwps zone.max-lwps
cpu-shares zone.cpu-shares
cpu-cap zone.cpu-cap (future, once cpu-caps integrate)
You've mentioned that you will substitute in sort of "the right"
defaults for the privileged and action fields. It seems like you should
spell out what those will be...
I'll add that.
alias rctl
--------------------------------------------------------------
cpu-shares=X zone.cpu-shares(privileged, X, none)
...
If an rctl was already defined that did not match the expected value
(e.g. it had 'action=none' or multiple values), then the 'max-lwps'
alias will be disabled. An attempt to set 'max-lwps' within
'dedicated-cpu' would print the following error:
"One or more incompatible rctls already exist for this
property"
This rctl alias enhancement is fully backward compatible with the
existing rctl syntax. That is, zonecfg output will continue to display
rctl settings in the current format (in addition to the new aliased
format) and zonecfg will continue to accept the existing input syntax
for setting rctls. This ensures full backward compatibility for any
existing tools/scripts that parse zonecfg output or configure zones.
Maybe I missed it-- but what is the behavior of 'zonecfg export' going to be?
No different than it is now. That is, we still export with the traditional
rctl syntax. I'll clarify that.
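To make the equivalence concrete, under the proposal this (values
illustrative):

    zonecfg:myzone> set cpu-shares=10

is shorthand for the existing syntax, which continues to be accepted and
continues to be what export emits:

    zonecfg:myzone> add rctl
    zonecfg:myzone:rctl> set name=zone.cpu-shares
    zonecfg:myzone:rctl> add value (priv=privileged,limit=10,action=none)
    zonecfg:myzone:rctl> end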
4) Enable rcapd to limit zone memory while running in the global zone
Currently, to use rcapd(1M) to limit zone memory consumption, the
rcapd process must be run within the zone. This exposes a loophole
since the zone administrator, who might be untrusted, can change the
rcapd limit.
Suggest rewording: "While useful in some configurations, in situations
where the zone administrator is untrusted, this is ineffective, since
the zone administrator could simply change the rcapd limit."
I'll add that.
We will enhance rcapd so that, while running in the global zone, it can
limit a zone's memory consumption. This closes the rcapd
loophole and allows the global zone administrator to set memory
caps that can be enforced by a single, trusted process.
Ditto on the rewording (basically, I think "loophole" is too vague).
OK.
The rcapd limit for a zone will be configured using the new
Here you say "a zone"-- can you be precise? Does that include the
global zone?
I'll clarify. It is not currently for the GZ, but that could be a future
enhancement (i.e. make zonecfg manage some of the GZ too).
'capped-memory' resource and 'cap' property within zonecfg.
When a zone with 'capped-memory' boots, zoneadmd will automatically
start rcapd in the global zone, if necessary. The interfaces to
Would it be better to say "enable the rcap service"?
I'll change that.
communicate memory cap information between zoneadmd and rcapd
are project private.
At an architectural level, it'd be nice to summarize them; for example,
does one need to reboot the zone to get the new setting? Is there
any way to do online tuning of the value? Should this just be done
with SMF properties?
I'll clarify this.
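As a sketch of the intended flow, assuming rcapd is delivered as an SMF
service (the FMRI here is illustrative):

    # zoneadm -z myzone boot
    # svcs rcap        (the rcap service should now be online)

Booting a zone with capped-memory configured would leave the global
zone's rcap service enabled.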
As part of this overall project, we will be enhancing the internal
rcapd rss accounting so that rcapd will have a more accurate
measurement of the overall rss for each zone.
More detail would be appreciated.
OK.
5) Use FSS when zone.cpu-shares is set
Although the zone.cpu-shares rctl can be set on a zone, the Fair Share
Scheduler (FSS) is not the default scheduling class, so this rctl
frequently has no effect unless the user also sets FSS as the
default scheduler or changes the zone's processes to use FSS with the
priocntl(1) command. This means that users can easily think
they have configured their zone for a behavior that they are not
actually getting.
We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
and FSS is not already the default scheduling class, zoneadmd will set
the scheduling class to be FSS for processes in the zone.
Just for that zone? This still seems confusing to users-- you
could have three zones with FSS on and two without. How about *also*
issuing a warning at zone boot if FSS is not the machine-wide default?
Yes, just for that zone. We will print the warning too. That did not
seem architectural, but I'll add that just to be clear.
Apropos my earlier (today) comment about dispadmin, should we have
some sort of 'dispadmin -d -do-it-now' option?
We could propose that or we could just enhance the docs to describe the
procedure. Why don't we chat about this later today.
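For context, today a user has to do roughly the following by hand:

    # dispadmin -d FSS                        (set the default class; takes effect at next boot)
    # priocntl -s -c FSS -i zoneid <zoneid>   (move a running zone's processes to FSS now)

zoneadmd would effectively perform the priocntl step for the zone's
processes automatically.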
7) Pools system objective defaults to weighted-load (wt-load)[4]
Currently pools are delivered with no objective set. This means that
if you enable the poold(1M) service, nothing will actually happen on
your system.
As part of this project, we will set weighted load
(system.poold.objectives=wt-load) to be the default objective.
Delivering this objective as the default does not impact systems out
of the box since poold is disabled by default.
What happens if you boot a zone which uses temporary pools, but pools
are not enabled? Should booting zones enable poold?
Yes, pools will be enabled. I'll clarify that since it is one of the important
points about the whole proposal. That is, the right things will just happen
when you are using temporary pools, so that you don't have to know and run
all of the extra RM commands.
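For reference, the manual steps this saves are roughly the following
(the system element name is the machine's nodename; 'myhost' is
illustrative):

    # svcadm enable svc:/system/pools:default
    # poolcfg -dc 'modify system myhost (string system.poold.objectives = "wt-load")'
    # svcadm enable svc:/system/pools/dynamic:default

With this proposal, booting a zone that uses temporary pools takes care
of enabling pools, and wt-load is already the default objective.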
Thanks again for all of your comments. I'll roll them into the proposal
along with the other comments I received.
Jerry
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org