Dan,

Thanks for your detailed comments.  My responses are in-line.

Dan Price wrote:
Very belatedly, I'm just getting around to reviewing this.  Overall
I think it looks good.  Comments in-line.

1) "Hard" vs. "Soft" RM configuration within zonecfg

...
                dedicated-cpu
                        ncpus (a positive integer or range, default value 1)
                        importance (a positive integer, default value 1)
                        max-lwps (an integer >= 100)

why >= 100?  I can envision a minimized zone where this is too many.

I picked 100 since I had a hard time getting a zone to boot with much less.
Obviously this will vary somewhat depending on the services enabled.
Is 100 really a problem as a lower limit?  Part of what we are trying to
do here is help the user configure a reasonable RM configuration, especially
if they don't know a lot about RM.  Allowing them to set a limit which
prevents the zone from booting seems bad.  However, we could also just let
them do that if 100 seems too high for some reason.  Unfortunately, it is
hard to know in advance what exact number of threads will be needed to
boot the zone.
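
For example, with the proposed syntax, a dedicated-cpu configuration would
look something like this (zone name hypothetical, property names as proposed
above, so subject to change):

        # zonecfg -z myzone
        zonecfg:myzone> add dedicated-cpu
        zonecfg:myzone:dedicated-cpu> set ncpus=1-4
        zonecfg:myzone:dedicated-cpu> set importance=2
        zonecfg:myzone:dedicated-cpu> end
        zonecfg:myzone> exit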

                capped-cpu
                        cpu-cap (a positive integer, default value 100 which
                                 represents 100% of one cpu)

I'm scared of this default.  To put it another way, why did you pick
100?  Should there be a value which represents infinity?  What is
the meaning of specifying 0, or is that an error?

We don't have to have a default here I guess.  I picked 100 because it seemed
to be symmetrical with the dedicated cpu case where the lower number is 1.
As far as the other values (infinity and 0) that should be covered by the
cpu-caps ARC case.  I am not sure if Andrei has finished that case yet.
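
To illustrate the units, a cap of one and a half CPUs would look something
like this under the proposed syntax (zone name hypothetical, and subject to
whatever the cpu-caps case finally specifies):

        zonecfg:myzone> add capped-cpu
        zonecfg:myzone:capped-cpu> set cpu-cap=150
        zonecfg:myzone:capped-cpu> end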

                        max-lwps (an integer >= 100)
                        cpu-shares (a positive integer)
                dedicated-memory
                        TBD - once msets [12] are completed
                capped-memory
                        cap (a positive decimal number with optional k, m, g,
                             or t as a modifier, no modifier defaults to units
                             of megabytes (m), must be at least 1m)

I think this set of rules is too complex and too confusing for users--
it's weird to have the default units be larger than the smallest
available units.  Let's mandate that the user *always* specify units.

OK.
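
With mandatory units, a memory cap would then always be explicit, e.g.
(zone name hypothetical, proposed syntax):

        zonecfg:myzone> add capped-memory
        zonecfg:myzone:capped-memory> set cap=512m
        zonecfg:myzone:capped-memory> end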

2) Temporary Pools.

...
        If a dedicated-cpu (or eventually a dedicated-memory) resource is
        configured for the zone, then when the zone boots zoneadmd will create
        a temporary pool dedicated for the zone's use.  Zoneadmd will
        dynamically create a pool & pset (or eventually a mset) and assign the
        number of cpus specified in zonecfg to that pset.  The temporary pool
        & pset will be named 'SUNWzone{zoneid}'.

Could we somehow work the zone name into this?  It would be nice for
e.g. poolstat(1) observability.  Otherwise the user experience is going
to be all about trying to work out what 'SUNWzone34' maps to, which
seems poor.

We need to have the name begin with SUNW or we could have collisions with
existing pools.  I suppose instead of zone{id}, it could be SUNW{zonename}
although you lose the visibility that the pool is associated with a zone.
Maybe SUNWzone_{zonename}?
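
With that scheme, poolstat output for a zone named 'web' might look roughly
like this (hypothetical output):

        # poolstat
                                      pset
         id pool                 size used load
          0 pool_default            6 0.00 0.12
          1 SUNWzone_web            2 0.00 0.05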

        Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
        well as the 'pool.importance' pool property, based on the values
        specified for dedicated-cpu's 'ncpus' and 'importance' properties
        in zonecfg.

Is importance mandatory?  Will it have a default value?  What values can
it have?  What does it mean?  Please be a little more specific.

Yes, 1.  I will add more details referring to the pools documentation on
importance.

        If the cpu (or memory) resources needed to create the temporary pool
        are unavailable, zoneadmd will issue an error and the zone won't boot.

        When the zone is halted, the temporary pool & pset will be destroyed.

What about during a reboot?  It seems like it'd be good not to tear
down the temporary pool during reboot, but maybe that's hard.  It would
seem weird to me if my pool was 2-4 CPUs, and I had 2, then rebooted and
had 4.

We won't destroy the pool on reboot, it is preserved.  Although it is not
a big deal right now, it will become more of an issue when we have memory
sets, so I made sure the pool is preserved across reboot.  I'll clarify that.

        We will add a new boolean property ('temporary') that can exist on
        pools and any resource set.  The 'temporary' property indicates that
        the pool or resource set should never be committed to a static
        configuration (e.g. pooladm -s) and that it should never be destroyed
        when updating the dynamic configuration from a static configuration
        (e.g. pooladm -c).  These temporary pools/resources can only be managed
        in the dynamic configuration.  These changes will be implemented within
        libpool(3LIB).

        It is our expectation that most users will never need to manage
        temporary pools through the existing poolcfg(1M) commands.  For users
        who need more sophisticated pool configuration and management, the
        existing 'pool' resource within zonecfg should be used and users
        should manually create a permanent pool using the existing mechanisms.
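
For reference, manually creating a permanent pool with the existing
mechanisms looks roughly like this (pool, pset, and zone names hypothetical):

        # pooladm -e                    (enable the pools facility)
        # pooladm -s                    (save the dynamic config to a file)
        # poolcfg -c 'create pset web-pset (uint pset.min = 2; uint pset.max = 4)'
        # poolcfg -c 'create pool web-pool'
        # poolcfg -c 'associate pool web-pool (pset web-pset)'
        # pooladm -c                    (instantiate the saved configuration)
        # zonecfg -z web 'set pool=web-pool'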

3) Resource controls in zonecfg will be simplified.
...
        Here are the aliases we will define for the rctls:
                alias           rctl
                -----           ----
                max-lwps        zone.max-lwps
                cpu-shares      zone.cpu-shares
                cpu-cap         zone.cpu-cap (future, once cpu-caps integrate)

You've mentioned that you will substitute in sort of "the right"
defaults for the privileged and action fields.  It seems like you should
spell out what those will be...

I'll add that.

    alias         rctl
    --------------------------------------------------------------
    cpu-shares=X  zone.cpu-shares(privileged, X, none)
     ...

        If an rctl was already defined that did not match the expected value
        (e.g. it had 'action=none' or multiple values), then the 'max-lwps'
        alias will be disabled.  An attempt to set 'max-lwps' within
        'dedicated-cpu' would print the following error:
                "One or more incompatible rctls already exist for this
                 property"

        This rctl alias enhancement is fully backward compatible with the
        existing rctl syntax.  That is, zonecfg output will continue to display
        rctl settings in the current format (in addition to the new aliased
        format) and zonecfg will continue to accept the existing input syntax
        for setting rctls.  This ensures full backward compatibility for any
        existing tools/scripts that parse zonecfg output or configure zones.
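
For example, both of the following set the same rctl; the first uses the
proposed alias, the second uses the existing syntax, which will continue to
work (zone name hypothetical):

        zonecfg:myzone> set cpu-shares=10

        zonecfg:myzone> add rctl
        zonecfg:myzone:rctl> set name=zone.cpu-shares
        zonecfg:myzone:rctl> add value (priv=privileged,limit=10,action=none)
        zonecfg:myzone:rctl> end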

Maybe I missed it-- but what is the behavior of 'zonecfg export' going to be?

No different than it is now.  That is, we still export with the traditional
rctl syntax.  I'll clarify that.

4) Enable rcapd to limit zone memory while running in the global zone

        Currently, to use rcapd(1M) to limit zone memory consumption, the
        rcapd process must be run within the zone.  This exposes a loophole
        since the zone administrator, who might be untrusted, can change the
        rcapd limit.

Suggest rewording: "While useful in some configurations, in situations
where the zone administrator is untrusted, this is ineffective, since
the zone administrator could simply change the rcapd limit."

I'll add that.

        We will enhance rcapd so that, while running in the global zone, it
        can limit a zone's memory consumption.  This closes the rcapd
        loophole and allows the global zone administrator to set memory
        caps that can be enforced by a single, trusted process.

Ditto on the rewording (basically, I think "loophole" is too vague).

OK.

        The rcapd limit for a zone will be configured using the new

Here you say "a zone"-- can you be precise?  Does that include the
global zone?

I'll clarify.  It is not currently for the GZ, but that could be a future
enhancement (i.e. make zonecfg manage some of the GZ too).

        'capped-memory' resource and 'cap' property within zonecfg.
        When a zone with 'capped-memory' boots, zoneadmd will automatically
        start rcapd in the global zone, if necessary.  The interfaces to

Would it be better to say "enable the rcap service"?

I'll change that.

        communicate memory cap information between zoneadmd and rcapd
        are project private.

At an architectural level, it'd be nice to summarize them; for example,
does one need to reboot the zone to get the new setting?  Is there
any way to do online tuning of the value?  Should this just be done
with SMF properties?

I'll clarify this.

        As part of this overall project, we will be enhancing the internal
        rcapd rss accounting so that rcapd will have a more accurate
        measurement of the overall rss for each zone.

More detail would be appreciated.

OK.

5) Use FSS when zone.cpu-shares is set
        Although the zone.cpu-shares rctl can be set on a zone, the Fair Share
        Scheduler (FSS) is not the default scheduling class so this rctl
        frequently has no effect, unless the user also sets FSS as the
        default scheduler or changes the zone's processes to use FSS with the
        priocntl(1M) command.  This means that users can easily think
        they have configured their zone for a behavior that they are not
        actually getting.

        We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
        and FSS is not already the default scheduling class, zoneadmd will set
        the scheduling class to be FSS for processes in the zone.

Just for that zone?  This still seems confusing to users-- you
could have 3 zones with FSS on, and two without.  How about *also* issuing a
warning at zone boot if FSS is not the machine-wide default?

Yes, just for that zone.  We will print the warning too.  That did not
seem architectural, but I'll add that just to be clear.

Apropos my earlier (today) comment about dispadmin, should we have
some sort of 'dispadmin -d -do-it-now' option?

We could propose that or we could just enhance the docs to describe the
procedure.  Why don't we chat about this later today.
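
For reference, the two manual steps that exist today are:

        # dispadmin -d FSS                 (set FSS as the default scheduling
                                            class for subsequent boots)
        # priocntl -s -c FSS -i zoneid 3   (move the processes of zone id 3
                                            to FSS immediately)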

7) Pools system objective defaults to weighted-load (wt-load)[4]

        Currently pools are delivered with no objective set.  This means that
        if you enable the poold(1M) service, nothing will actually happen on
        your system.

        As part of this project, we will set weighted load
        (system.poold.objectives=wt-load) to be the default objective.
        Delivering this objective as the default does not impact systems out
        of the box since poold is disabled by default.

What happens if you boot a zone which uses temporary pools, but pools
are not enabled?  Should booting zones enable poold?

Yes, pools will be enabled.  I'll clarify that since it is one of the important
points about the whole proposal.  That is, the right things will just happen
when you are using temporary pools, so that you don't have to know and run
all of the extra RM commands.
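
For comparison, the manual steps to get this behavior today are roughly the
following (system element name 'myhost' hypothetical, poolcfg syntax per the
libpool docs):

        # pooladm -e
        # poolcfg -dc 'modify system myhost (string system.poold.objectives = "wt-load")'
        # svcadm enable svc:/system/pools/dynamic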

Thanks again for all of your comments.  I'll roll them into the proposal
along with the other comments I received.

Jerry
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
