Steffen,

Thanks for your comments.  Responses in-line.

Steffen Weiberle wrote:
Hi Jerry, this is great.

I have a few comments below.

Thanks
Steffen

1) "Hard" vs. "Soft" RM configuration within zonecfg

    We will enhance zonecfg(1M) so that the user can configure basic RM
    capabilities in a structured way.

    Various existing and upcoming RM features can be broken down
    into "hard" vs. "soft" partitioning of the system's resources.
    With "hard" partitioning, resources are dedicated to the zone using
    processor sets (psets) and memory sets (msets).  With "soft"
    partitioning, resources are shared, but capped, with an upper limit
    on their use by the zone.

                         Hard    |    Soft
               ---------------------------------
               cpu    |  psets   |  cpu-caps
               memory |  msets   |  rcapd

    Within zonecfg we will organize these various RM features into four
    basic zonecfg resources so that it is simple for a user to understand
    and configure the RM features that are to be used with their zone.
    Note that zonecfg "resources" are not the same as the system's
    cpu & memory resources or "resource management".  Within zonecfg, a
    "resource" is the name of a top-level property group for the zone (see
    zonecfg(1M) for more information).

Are you saying just the names are different, or are there other differences as well?

Unfortunately the word "resource" is overloaded here.  zonecfg(1M) uses
it to mean a group of properties which has nothing to do with
system resources (e.g. cpu or memory) or how the word "resource" is
used under the umbrella of Solaris Resource Management.

    The four new zonecfg resources are:
        dedicated-cpu
        capped-cpu       (future, after cpu-caps are integrated)
        dedicated-memory (future, after memory sets are integrated)
        capped-memory

    Each of these zonecfg resources will have properties that are
    appropriate to the RM capabilities associated with that resource.
    Zonecfg will only allow one instance of each of these resources to be
    configured and it will not allow conflicting resources to be added
    (e.g. dedicated-cpu and capped-cpu are mutually exclusive).

    The mapping of these new zonecfg resources to the underlying RM feature
    is:
        dedicated-cpu -> temporary pset
        dedicated-memory -> temporary mset
        capped-cpu -> cpu-cap rctl [14]
        capped-memory -> rcapd running in global zone

    Temporary psets and msets are described below, in section 2.
    Rcapd enhancements for running in the global zone are described below
    in section 4.

    The valid properties for each of these new zonecfg resources will be:

        dedicated-cpu
            ncpus
            importance
        capped-cpu
            ncpus
        dedicated-memory
            physical
            virtual
            importance
        capped-memory
            physical
            virtual

    The meaning of each of these properties is as follows:

    dedicated-cpu
        ncpus:    This can be a positive integer or range.  A value of
            '2' means two cpus, a value of '2-4' means a range of
            two to four cpus.  This sets the 'pset.min' and
            'pset.max' properties on the temporary pset.
        importance: This property is optional.  It can be a positive
            integer.  It sets the 'pool.importance' property on
            the temporary pool.

    capped-cpu
        This resource group and its property will not be delivered as
        part of this project since cpu-caps are still under
        development.  However, our thinking on this is described here
        for completeness.

        ncpus:    This is a positive decimal.  The 'ncpus' property
            actually maps to the zone.cpu-cap rctl.  This property
            will be implemented as a special case of the new zones
            rctl aliases which are described below in section 3.
            The special case handling of this property will
            normalize the value so that it corresponds to units of
            cpus and is similar to the 'ncpus' property under the
            dedicated-cpu resource group.  However, it won't accept
            a range and it will accept a decimal number.  For
            example, when using 'ncpus' in the dedicated-cpu
            resource group, a value of 1 means one dedicated cpu.
            When using 'ncpus' in the capped-cpu resource group,
            a value of 1 means 100% of a cpu is the cap setting.  A
            value of 1.25 means 125%, since 100% corresponds to one
            full cpu on the system when using cpu caps.  The idea
            here is to align the 'ncpus' units as closely as
            possible in these two cases (dedicated-cpu vs.
            capped-cpu), given the limitations and capabilities of
            the two underlying mechanisms (pset vs. rctl).  The
            'ncpus' rctl alias is described further in section 3
            below.

Just want to confirm that there are two places to the right of the decimal point, so that the smallest unit is 1/100th of a CPU. This was what the original cpu-caps prototypes had. Or is the value rounded or truncated by the underlying implementation?

Yes.  You can specify down to 1% which is the granularity of cpu-caps.
I'll clarify that.
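
As a rough illustration of the normalization described above, here is a
minimal Python sketch.  The function name ncpus_to_cpu_cap is hypothetical
and is not part of zonecfg.  Rejecting values finer than 1% (instead of
rounding or truncating them) is an assumption; the proposal only states
that the granularity is 1%.

```python
from decimal import Decimal, InvalidOperation


def ncpus_to_cpu_cap(ncpus: str) -> int:
    """Map a capped-cpu 'ncpus' string to a zone.cpu-cap limit.

    The limit is a percentage of one cpu, so '1.25' becomes 125.
    Hypothetical helper for illustration only.
    """
    if "-" in ncpus:
        # Unlike dedicated-cpu, capped-cpu does not accept a range.
        raise ValueError("capped-cpu ncpus does not accept a range")
    try:
        value = Decimal(ncpus)
    except InvalidOperation:
        raise ValueError(f"not a decimal number: {ncpus!r}")
    if not value.is_finite() or value <= 0:
        raise ValueError("ncpus must be a positive decimal")
    percent = value * 100
    # Assumption: values finer than 1% are rejected, not rounded.
    if percent != percent.to_integral_value():
        raise ValueError("ncpus granularity is 1/100th of a cpu")
    return int(percent)
```

With this sketch, ncpus_to_cpu_cap("1") gives 100 and
ncpus_to_cpu_cap("1.25") gives 125, matching the examples in the text.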


    dedicated-memory
        These properties are tentative at this point since msets are
        still under development.  The properties will be finalized once
        msets [15] and swap sets [16] are completed.  This resource
        group and its properties will not be delivered as part of this
        project.  However, our thinking on this is described here for
        completeness.

        physical: A positive decimal number or a range with a required
            k, m, g, or t modifier.  This will set the 'mset.min'
            and 'mset.max' properties on the temporary mset.
            A value of '10m' means ten megabytes.  A value of
            '.5g-1.5g' means a range of 500 megabytes up to
            1.5 gigabytes.

        virtual: This accepts the same numbers as the 'physical'
            property.  This will set the 'mset.minswap' and
            'mset.maxswap' properties on the temporary mset.

        Either 'physical' or 'virtual' may be omitted, but at least one
        must be specified.

        importance: This property is optional.  It can be a positive
            integer.  It sets the 'pool.importance' property on
            the temporary pool.  The underlying code in zonecfg
            will refer to the same piece of data for importance in
            both the dedicated-cpu and dedicated-memory case.
            Thus, you can have a temporary pool with either a
            temporary pset, a temporary mset or both.  There is
            only one value for the importance of the temporary
            pool.

    capped-memory
        physical: A positive decimal number with a required k, m, g,
            or t modifier.  A value of '10m' means ten megabytes.
            This will be used by rcapd as the max-rss for the
            zone.  The rcapd enhancement for capping zones is
            described below in section 4.

        virtual: This property is tentative at this point and will not
            be delivered as part of this project.  However, our
            thinking on this is described here for completeness.
            In the future we would like to deliver a new rctl
            which would cap the virtual memory consumption of
            the zone.
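
To make the size syntax for 'physical' and 'virtual' concrete, here is a
small Python sketch of how such values could be parsed.  The names
parse_size and parse_size_range are hypothetical.  The sketch assumes
binary multiples (1k = 1024 bytes), which the proposal does not specify,
and it assumes that only the dedicated-memory properties accept a range.

```python
from decimal import Decimal, InvalidOperation

# Assumption: binary multiples; the proposal does not say which base applies.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(text: str) -> int:
    """Parse a value such as '10m' or '.5g' into a byte count."""
    if len(text) < 2 or text[-1] not in UNITS:
        raise ValueError("a k, m, g, or t modifier is required")
    try:
        number = Decimal(text[:-1])
    except InvalidOperation:
        raise ValueError(f"not a decimal number: {text[:-1]!r}")
    if not number.is_finite() or number <= 0:
        raise ValueError("size must be a positive decimal")
    return int(number * UNITS[text[-1]])


def parse_size_range(text: str) -> tuple:
    """Parse '10m' or '.5g-1.5g' into a (min, max) pair of byte counts."""
    low, sep, high = text.partition("-")
    minimum = parse_size(low)
    # A single value means min and max are the same.
    maximum = parse_size(high) if sep else minimum
    if minimum > maximum:
        raise ValueError("range minimum exceeds maximum")
    return minimum, maximum
```

For dedicated-memory, the pair returned by parse_size_range would
correspond to 'mset.min' and 'mset.max' (or 'mset.minswap' and
'mset.maxswap' for 'virtual').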

    Zonecfg will be enhanced to check for invalid combinations.  This
    means it will disallow a dedicated-cpu resource and the
    zone.cpu-shares rctl
    being defined at the same time.  It also means that explicitly
    specifying a pool name via the 'pool' resource, along with either a
    'dedicated-cpu' or 'dedicated-memory' resource is an invalid
    combination.
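
The combination checks named in this section could be sketched roughly as
follows in Python.  The function check_combinations and the way the zone
configuration is passed in (a set of resource names, a set of rctl names
and an optional pool name) are hypothetical; zonecfg's real data model
differs.

```python
def check_combinations(resources, rctls, pool=None):
    """Return error strings for the invalid combinations described above.

    resources: set of zonecfg resource names, e.g. {"dedicated-cpu"}
    rctls:     set of rctl names, e.g. {"zone.cpu-shares"}
    pool:      explicit pool name, or None if none is set
    """
    errors = []
    # Conflicting "hard" and "soft" cpu resources are mutually exclusive.
    if {"dedicated-cpu", "capped-cpu"} <= resources:
        errors.append("dedicated-cpu and capped-cpu are mutually exclusive")
    # A dedicated-cpu resource cannot coexist with zone.cpu-shares.
    if "dedicated-cpu" in resources and "zone.cpu-shares" in rctls:
        errors.append("dedicated-cpu conflicts with zone.cpu-shares")
    # An explicit pool cannot be combined with a temporary pool.
    if pool and resources & {"dedicated-cpu", "dedicated-memory"}:
        errors.append("an explicit pool conflicts with dedicated resources")
    return errors
```

An empty list means none of the combinations listed in this section were
found.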

    These new zonecfg resource names (dedicated-cpu, capped-cpu,
    dedicated-memory & capped-memory) are chosen so as to be reasonably
    clear what the objective is, even though they do not exactly align
    with our existing underlying (and inconsistent) RM naming schemes.

2) Temporary Pools.

    We will implement the concept of "temporary pools" within the pools
    framework.

    To improve the integration of zones and pools we are allowing the
    configuration of some basic pool attributes within zonecfg, as
    described above in section 1.  However, we do not want to extend
    zonecfg to completely and directly manage standard pool
    configurations.  That would lead to confusion and inconsistency
    regarding which tool to use and where configuration data is stored.
    Temporary pools sidestep this problem and allow zones to dynamically
    create a simple pool/pset configuration for the basic case where a
    sysadmin just wants a specified number of processors dedicated to the
    zone (and eventually a dedicated amount of memory).

    We believe that the ability to simply specify a fixed number of cpus
    (and eventually a mset size) meets the needs of a large percentage of
    zones users who need "hard" partitioning (e.g. to meet licensing
    restrictions).

    If a dedicated-cpu (and/or eventually a dedicated-memory) resource is
    configured for the zone, then when the zone boots zoneadmd will enable
    pools if necessary and create a temporary pool dedicated to the zone's
    use.  Zoneadmd will dynamically create a pool & pset (and/or eventually
    a mset) and assign the number of cpus specified in zonecfg to that
    pset.  The temporary pool & pset will be named 'SUNWtmp_{zonename}'.
    Zonecfg validation will disallow an explicit 'pool' property name
    beginning with 'SUNWtmp'.
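
The naming rule and the matching validation can be illustrated with a
short Python sketch.  Both function names are hypothetical and only mirror
the two rules stated above.

```python
TMP_PREFIX = "SUNWtmp"


def temporary_pool_name(zonename: str) -> str:
    """Name used for the temporary pool and pset of a zone."""
    return f"{TMP_PREFIX}_{zonename}"


def validate_pool_property(pool: str) -> None:
    """Reject explicit 'pool' names that begin with the reserved prefix."""
    if pool.startswith(TMP_PREFIX):
        raise ValueError(f"pool names beginning with {TMP_PREFIX!r} are reserved")
```

Reserving the prefix keeps an explicitly configured pool from colliding
with one that zoneadmd creates on the zone's behalf.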

    Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
    well as the 'pool.importance' pool property, based on the values
    specified for dedicated-cpu's 'ncpus' and 'importance' properties
    in zonecfg, as described above in section 1.
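
As an illustration of that mapping, here is a minimal Python sketch that
turns the dedicated-cpu properties into the pool and pset properties
named above.  The function pset_properties and the dictionary it returns
are hypothetical; they are not libpool(3LIB) interfaces.

```python
def pset_properties(ncpus: str, importance=None) -> dict:
    """Map dedicated-cpu 'ncpus' and 'importance' to pset/pool properties.

    ncpus is a positive integer ('2') or a range ('2-4').
    """
    low, sep, high = ncpus.partition("-")
    pset_min = int(low)  # raises ValueError if not an integer
    # A single value means the pset has a fixed size.
    pset_max = int(high) if sep else pset_min
    if pset_min < 1 or pset_max < pset_min:
        raise ValueError("ncpus must be a positive integer or ascending range")
    props = {"pset.min": pset_min, "pset.max": pset_max}
    if importance is not None:
        # 'importance' is optional and must be a positive integer.
        if importance < 1:
            raise ValueError("importance must be a positive integer")
        props["pool.importance"] = importance
    return props
```

For example, pset_properties("2-4", importance=10) yields pset.min=2,
pset.max=4 and pool.importance=10.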

    If the cpu (or memory) resources needed to create the temporary pool
    are unavailable, zoneadmd will issue an error and the zone won't boot.

    When the zone is halted, the temporary pool & pset will be destroyed.

    We will add a new boolean libpool(3LIB) property ('temporary') that can
    exist on pools and any pool resource set.  The 'temporary' property
    indicates that the pool or resource set should never be committed to a
    static configuration (e.g. pooladm -s) and that it should never be
    destroyed when updating the dynamic configuration from a static
    configuration (e.g. pooladm -c).  These temporary pools/resources can
    only be managed in the dynamic configuration.  Support for temporary
    pools will be implemented within libpool(3LIB) using the two new
    consolidation private functions listed in the interface table below.

    It is our expectation that most users will never need to manage
    temporary pools through the existing poolcfg(1M) commands.  For users
    who need more sophisticated pool configuration and management, the
    existing 'pool' resource within zonecfg should be used and users
    should manually create a permanent pool using the existing mechanisms.

Will the existing pool commands show the results as if they were created using those commands? It would be a useful learning and templating tool to apply the resulting configuration(s) to scripts using the existing commands for the future.

No, that is not part of this proposal.  I am actually not quite sure what
you are asking here.  Maybe we could take that offline?


3) Resource controls in zonecfg will be simplified [8].

    Within zonecfg rctls take a 3-tuple value where only a single
    component is usually of interest (the 'limit').  The other two
    components of the value (the 'priv' and 'action') are not normally
    changed but users can be confused if they don't understand what the
    other components mean or what values should be specified.

    Here is a zonecfg example:
        > add rctl
        rctl> set name=zone.cpu-shares
        rctl> add value (priv=privileged,limit=5,action=none)
        rctl> end

    Within zonecfg we will introduce the idea of rctl aliases.  The alias
    is a simplified name and template for the existing rctls.  Behind the
    scenes we continue to store the data using the existing rctl entries
    in the XML file.  Thus, the alias always refers to the same underlying
    piece of data as the full rctl.

    The purpose of the rctl alias is to provide a simplified name and
    mechanism to set the rctl 'limit'.  For each rctl/alias pair we will
    "know" the expected values for the 'priv' and 'action' components of
    the rctl value.  If an rctl is already defined that does not match this
    "knowledge" (e.g. it has a non-standard 'action' or there are multiple
    values defined for the rctl), then the user will not be allowed to use
    an alias for that rctl.

This should help a lot!


    Here are the aliases we will define for the rctls:
    alias         rctl               priv          action
    -----         ----               ----          ------
    max-lwps      zone.max-lwps      privileged    deny
    cpu-shares    zone.cpu-shares    privileged    none

    Coming in the near future, once the associated projects
    integrate [14, 17, 18]
    alias                rctl                      priv          action
    -----                ----                      ----          ------
    cpu-cap              zone.cpu-cap              privileged    deny
    max-locked-memory    zone.max-locked-memory    privileged    deny
    max-shm-memory       zone.max-shm-memory       privileged    deny
    max-shm-ids          zone.max-shm-ids          privileged    deny
    max-msg-ids          zone.max-msg-ids          privileged    deny
    max-sem-ids          zone.max-sem-ids          privileged    deny

What is the purpose of some of these zone.* controls? Is it to limit what a privileged user can set the values to for projects, etc. in that zone, or does it set the defaults for the zone as well?

These set the upper limits for the zone as a whole.  Thus, the
non-global zone admin cannot exceed these since they are controlled
by the global zone admin.

I can see it being easier to configure different DB zones from the global zone via zonecfg than having to enter, delegate, and educate the zone users how to set them. I'm leaning towards making these the defaults for the zone, not just the limit.

You won't be able to override these in the non-global zone.


    Here is an example of the max-lwps alias usage within zonecfg:

        > set max-lwps=500
        > info
        ...
        [max-lwps: 500]
        ...
        rctl:
            name: zone.max-lwps
            value: (priv=privileged,limit=500,action=deny)

    In the example, you can see the use of the alias when setting the
    value and you can also see the full rctl output within the 'info'
    command.  The alias is "flagged" in the output with brackets as
    a visual indicator that the property corresponds to the full
    rctl definition printed later in the output.

    If you update the rctl value through the 'rctl' resource then the
    corresponding value in the aliased property would also be updated since
    both the rctl and its alias refer to the same piece of data.

    If an rctl was already defined that did not match the expected value
    (e.g. it had 'action=none' or multiple values), then the alias will be
    disabled.  An attempt to set the limit via the alias would print the
    following error:
        "An incompatible rctl already exists for this property"
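
The alias behavior described in this section can be sketched in Python as
follows.  This is only an illustration under assumptions: the function
set_alias and the dictionary layout (rctl name mapped to a list of value
entries) are hypothetical and do not reflect how zonecfg stores its XML
data.

```python
# Expected 'priv' and 'action' for each alias, taken from the table above.
ALIASES = {
    "max-lwps": ("zone.max-lwps", "privileged", "deny"),
    "cpu-shares": ("zone.cpu-shares", "privileged", "none"),
}


def set_alias(config: dict, alias: str, limit: int) -> None:
    """Set an rctl limit through its alias.

    config maps an rctl name to a list of value dicts with the keys
    'priv', 'limit' and 'action' (hypothetical layout).
    """
    rctl, priv, action = ALIASES[alias]
    existing = config.get(rctl, [])
    # The alias is disabled if the rctl has multiple values or a
    # non-standard 'priv'/'action' component.
    mismatch = any((v["priv"], v["action"]) != (priv, action) for v in existing)
    if len(existing) > 1 or mismatch:
        raise ValueError("An incompatible rctl already exists for this property")
    # The alias and the full rctl refer to the same piece of data.
    config[rctl] = [{"priv": priv, "limit": limit, "action": action}]
```

Because the alias writes the ordinary rctl entry, reading the rctl back
shows the same limit that was set through the alias.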

    This rctl alias enhancement is fully backward compatible with the
    existing rctl syntax.  That is, zonecfg output will continue to display
    rctl settings in the current format (in addition to the new aliased
    format) and zonecfg will continue to accept the existing input syntax
    for setting rctls.  This ensures full backward compatibility for any
    existing tools/scripts that parse zonecfg output or configure zones.
    Also, the rctl data will continue to be printed in the output from
    the 'export' subcommand using the existing syntax.

    Future rctls added to zonecfg will also provide aliases following the
    pattern described here (e.g. [17, 18]).

    In section 1 we described the special case 'ncpus' rctl alias as a
    property under the capped-cpu resource group.  This property is really
    just another rctl alias for the zone.cpu-cap rctl, with one
    exception; the limit value is scaled up by 100 so that the value can
    be specified in cpu units and aligned with the 'ncpus' property
    under the dedicated-cpu resource group.  Thus, a value of 2
    will really set the zone.cpu-cap rctl limit to 200, which means the
    cpu cap is 200%.  This alias is being described here but will not
    actually be delivered in the first phase of this project since
    cpu-caps [14] are not yet completed.

I can see this getting confusing. Here it is in integer percentages (essentially); before it was in full CPUs.

I'll try to clarify this a bit more.

Thanks again for your input,
Jerry
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
