Hi Jerry, this is great.

I have a few comments below.


1) "Hard" vs. "Soft" RM configuration within zonecfg

        We will enhance zonecfg(1M) so that the user can configure basic RM
        capabilities in a structured way.

        Various existing and upcoming RM features can be broken down
        into "hard" vs. "soft" partitioning of the system's resources.
        With "hard" partitioning, resources are dedicated to the zone using
        processor sets (psets) and memory sets (msets).  With "soft"
        partitioning, resources are shared, but capped, with an upper limit
        on their use by the zone.

                       |  Hard    |  Soft
               cpu     |  psets   |  cpu-caps
               memory  |  msets   |  rcapd

        Within zonecfg we will organize these various RM features into four
        basic zonecfg resources so that it is simple for a user to understand
        and configure the RM features that are to be used with their zone.
        Note that zonecfg "resources" are not the same as the system's
        cpu & memory resources or "resource management".  Within zonecfg, a
        "resource" is the name of a top-level property group for the zone (see
        zonecfg(1M) for more information).

Are you saying just the names are different, or are there other differences as well?

        The four new zonecfg resources are:
                dedicated-cpu
                capped-cpu       (future, after cpu-caps are integrated)
                dedicated-memory (future, after memory sets are integrated)
                capped-memory

        Each of these zonecfg resources will have properties that are
        appropriate to the RM capabilities associated with that resource.
        Zonecfg will allow only one instance of each of these resources to be
        configured, and it will not allow conflicting resources to be added
        (e.g. dedicated-cpu and capped-cpu are mutually exclusive).
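        A minimal Python sketch of these two rules follows. It assumes a
        zone's RM configuration can be reduced to a list of resource-type
        names. The function name is hypothetical, and only the
        dedicated-cpu/capped-cpu conflict named above is encoded; zonecfg's
        own validation code is not part of this proposal.

```python
from collections import Counter

# Resource types that may appear at most once per zone.
SINGLE_INSTANCE = {"dedicated-cpu", "capped-cpu", "dedicated-memory", "capped-memory"}

# Mutually exclusive pairs; only this one is named in the proposal.
CONFLICTS = [("dedicated-cpu", "capped-cpu")]


def check_rm_resources(resources):
    """Return error strings for a list of configured resource types (hypothetical)."""
    errors = []
    counts = Counter(resources)
    for rtype in sorted(SINGLE_INSTANCE):
        if counts[rtype] > 1:
            errors.append(f"only one '{rtype}' resource is allowed")
    for a, b in CONFLICTS:
        if counts[a] and counts[b]:
            errors.append(f"'{a}' and '{b}' are mutually exclusive")
    return errors


print(check_rm_resources(["dedicated-cpu", "capped-memory"]))  # []
```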

        The mapping of these new zonecfg resources to the underlying RM
        features is:
                dedicated-cpu -> temporary pset
                dedicated-memory -> temporary mset
                capped-cpu -> cpu-cap rctl [14]
                capped-memory -> rcapd running in global zone

        Temporary psets and msets are described below, in section 2.
        Rcapd enhancements for running in the global zone are described below
        in section 4.

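        The valid properties for each of these new zonecfg resources will be:

                dedicated-cpu:    ncpus, importance
                capped-cpu:       ncpus
                dedicated-memory: physical, virtual, importance
                capped-memory:    physical, virtual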

        The meaning of each of these properties is as follows:

        dedicated-cpu

                ncpus:  This can be a positive integer or a range.  A value of
                        '2' means two cpus; a value of '2-4' means a range of
                        two to four cpus.  This sets the 'pset.min' and
                        'pset.max' properties on the temporary pset.
                importance: This property is optional.  It can be a positive
                        integer.  It sets the 'pool.importance' property on
                        the temporary pool.
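        To make the dedicated-cpu mapping concrete, here is a hedged Python
        sketch that turns 'ncpus' and 'importance' into the properties that
        would be set on the temporary pset and pool.  It assumes that a single
        value such as '2' sets 'pset.min' and 'pset.max' to the same number;
        both function names are hypothetical.

```python
def parse_dedicated_ncpus(value):
    """Map a dedicated-cpu 'ncpus' string to (pset.min, pset.max).

    '2' -> (2, 2) and '2-4' -> (2, 4).  Anything other than a positive
    integer or an ascending range of positive integers raises ValueError.
    """
    parts = value.split("-")
    if len(parts) not in (1, 2) or not all(p.isdigit() for p in parts):
        raise ValueError(f"invalid ncpus value: {value!r}")
    low, high = int(parts[0]), int(parts[-1])
    if low < 1 or high < low:
        raise ValueError(f"invalid ncpus range: {value!r}")
    return low, high


def temporary_pset_properties(ncpus, importance=None):
    """Build the pset/pool properties for the temporary pool (hypothetical)."""
    pset_min, pset_max = parse_dedicated_ncpus(ncpus)
    props = {"pset.min": pset_min, "pset.max": pset_max}
    if importance is not None:
        # 'importance' is optional but must be a positive integer when given.
        if not isinstance(importance, int) or importance < 1:
            raise ValueError("importance must be a positive integer")
        props["pool.importance"] = importance
    return props


print(temporary_pset_properties("2-4", importance=5))
# {'pset.min': 2, 'pset.max': 4, 'pool.importance': 5}
```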

        capped-cpu

                This resource group and its property will not be delivered as
                part of this project since cpu-caps are still under
                development.  However, our thinking on this is described here
                for completeness.

                ncpus:  This is a positive decimal.  The 'ncpus' property
                        actually maps to the zone.cpu-cap rctl.  This property
                        will be implemented as a special case of the new zones
                        rctl aliases which are described below in section 3.
                        The special case handling of this property will
                        normalize the value so that it corresponds to units of
                        cpus and is similar to the 'ncpus' property under the
                        dedicated-cpu resource group.  However, it won't accept
                        a range and it will accept a decimal number.  For
                        example, when using 'ncpus' in the dedicated-cpu
                        resource group, a value of 1 means one dedicated cpu.
                        When using 'ncpus' in the capped-cpu resource group,
                        a value of 1 means 100% of a cpu is the cap setting.  A
                        value of 1.25 means 125%, since 100% corresponds to one
                        full cpu on the system when using cpu caps.  The idea
                        here is to align the 'ncpus' units as closely as
                        possible in these two cases (dedicated-cpu vs.
                        capped-cpu), given the limitations and capabilities of
                        the two underlying mechanisms (pset vs. rctl).  The
                        'ncpus' rctl alias is described further in section 3.
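        The following Python sketch shows the normalization described above:
        the capped-cpu 'ncpus' value is scaled by 100 to obtain a
        zone.cpu-cap limit, so '1.25' becomes 125 (percent).  How values finer
        than 1/100 of a cpu are handled is still an open question; as an
        assumption, this sketch rejects them rather than rounding or
        truncating.  The function name is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def capped_ncpus_to_cap(value):
    """Convert a capped-cpu 'ncpus' string to a zone.cpu-cap limit.

    '1' -> 100, '1.25' -> 125, '2' -> 200.  Ranges are not accepted.
    ASSUMPTION: more than two decimal places is an error.
    """
    try:
        ncpus = Decimal(value)
    except InvalidOperation:
        raise ValueError(f"invalid ncpus value: {value!r}") from None
    if not ncpus.is_finite() or ncpus <= 0:
        raise ValueError(f"ncpus must be a positive decimal: {value!r}")
    limit = ncpus * 100
    if limit != limit.to_integral_value():
        raise ValueError(f"ncpus has more than two decimal places: {value!r}")
    return int(limit)
```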

Just want to confirm that there are two places to the right of the decimal point, so that the smallest unit is 1/100th of a CPU. This was what the original cpu-caps prototypes had. Or is the value rounded or truncated by the underlying implementation?

        dedicated-memory

                These properties are tentative at this point since msets are
                still under development.  The properties will be finalized once
                msets [15] and swap sets [16] are completed.  This resource
                group and its properties will not be delivered as part of this
                project.  However, our thinking on this is described here for
                completeness.

                physical: A positive decimal number or a range with a required
                        k, m, g, or t modifier.  This will set the 'mset.min'
                        and 'mset.max' properties on the temporary mset.
                        A value of '10m' means ten megabytes.  A value of
                        '.5g-1.5g' means a range of 500 megabytes up to
                        1.5 gigabytes.

                virtual: This accepts the same numbers as the 'physical'
                        property.  This will set the 'mset.minswap' and
                        'mset.maxswap' properties on the temporary mset.

                Either 'physical' or 'virtual' may be omitted, but at least one
                of the two must be specified.

                importance: This property is optional.  It can be a positive
                        integer.  It sets the 'pool.importance' property on
                        the temporary pool.  The underlying code in zonecfg
                        will refer to the same piece of data for importance in
                        both the dedicated-cpu and dedicated-memory case.
                        Thus, you can have a temporary pool with either a
                        temporary pset, a temporary mset or both.  There is
                        only one value for the importance of the temporary pool.
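        The size syntax shared by 'physical' and 'virtual' can be sketched in
        Python as follows.  The proposal does not say whether the modifiers
        are powers of 1024 or of 1000; this sketch assumes 1024, under which
        '.5g' is 512 megabytes.  A single value is treated as a range whose
        minimum and maximum are equal.  The function names are hypothetical.

```python
from decimal import Decimal, InvalidOperation

# ASSUMPTION: binary multipliers; the proposal does not specify.
MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(text):
    """Parse one size such as '10m' or '.5g' into a number of bytes."""
    text = text.strip()
    if len(text) < 2 or text[-1] not in MULTIPLIERS:
        raise ValueError(f"size needs a k, m, g or t modifier: {text!r}")
    try:
        number = Decimal(text[:-1])
    except InvalidOperation:
        raise ValueError(f"invalid size: {text!r}") from None
    if not number.is_finite() or number <= 0:
        raise ValueError(f"size must be positive: {text!r}")
    return int(number * MULTIPLIERS[text[-1]])


def parse_size_range(text):
    """Parse '10m' or '.5g-1.5g' into (min_bytes, max_bytes).

    For dedicated-memory these would become mset.min/mset.max ('physical')
    or mset.minswap/mset.maxswap ('virtual').
    """
    parts = text.split("-")
    if len(parts) not in (1, 2):
        raise ValueError(f"invalid size range: {text!r}")
    low, high = parse_size(parts[0]), parse_size(parts[-1])
    if high < low:
        raise ValueError(f"range is descending: {text!r}")
    return low, high
```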

        capped-memory

                physical: A positive decimal number with a required k, m, g,
                        or t modifier.  A value of '10m' means ten megabytes.
                        This will be used by rcapd as the max-rss for the
                        zone.  The rcapd enhancement for capping zones is
                        described below in section 4.

                virtual: This property is tentative at this point and will not
                        be delivered as part of this project.  However, our
                        thinking on this is described here for completeness.
                        In the future we would like to deliver a new rctl
                        which would cap the virtual memory consumption of
                        the zone.

        Zonecfg will be enhanced to check for invalid combinations.  This
        means that it will not allow a dedicated-cpu resource and the
        zone.cpu-shares rctl to be defined at the same time.  It also means
        that explicitly specifying a pool name via the 'pool' resource, along
        with either a 'dedicated-cpu' or 'dedicated-memory' resource, is an
        invalid configuration.
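        As a sketch only, the two invalid combinations above could be checked
        like this in Python.  The function name and the set-based data model
        are hypothetical.

```python
def check_rm_combinations(resources, rctls, pool=None):
    """Return error strings for invalid RM combinations (hypothetical).

    resources: set of configured resource types, e.g. {"dedicated-cpu"}
    rctls:     set of configured rctl names, e.g. {"zone.cpu-shares"}
    pool:      value of the zone's 'pool' setting, or None if unset
    """
    errors = []
    if "dedicated-cpu" in resources and "zone.cpu-shares" in rctls:
        errors.append("dedicated-cpu cannot be combined with zone.cpu-shares")
    dedicated = sorted(resources & {"dedicated-cpu", "dedicated-memory"})
    if pool and dedicated:
        errors.append(f"pool '{pool}' cannot be combined with {', '.join(dedicated)}")
    return errors
```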

        These new zonecfg resource names (dedicated-cpu, capped-cpu,
        dedicated-memory & capped-memory) are chosen so as to be reasonably
        clear what the objective is, even though they do not exactly align
        with our existing underlying (and inconsistent) RM naming schemes.

2) Temporary Pools.

        We will implement the concept of "temporary pools" within the pools
        framework.

        To improve the integration of zones and pools we are allowing the
        configuration of some basic pool attributes within zonecfg, as
        described above in section 1.  However, we do not want to extend
        zonecfg to completely and directly manage standard pool configurations.
        That would lead to confusion and inconsistency regarding which tool to
        use and where configuration data is stored.  Temporary pools sidestep
        this problem and allows zones to dynamically create a simple pool/pset
        configuration for the basic case where a sysadmin just wants a
        specified number of processors dedicated to the zone (and eventually a
        dedicated amount of memory).

        We believe that the ability to simply specify a fixed number of cpus
        (and eventually a mset size) meets the needs of a large percentage of
        zones users who need "hard" partitioning (e.g. to meet licensing
        requirements).

        If a dedicated-cpu (and/or eventually a dedicated-memory) resource is
        configured for the zone, then when the zone boots zoneadmd will enable
        pools if necessary and create a temporary pool dedicated to the zone's
        use.  Zoneadmd will dynamically create a pool & pset (and/or eventually
        a mset) and assign the number of cpus specified in zonecfg to that
        pset.  The temporary pool & pset will be named 'SUNWtmp_{zonename}'.
        Zonecfg validation will disallow an explicit 'pool' property name
        beginning with 'SUNWtmp'.

        Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
        well as the 'pool.importance' pool property, based on the values
        specified for dedicated-cpu's 'ncpus' and 'importance' properties
        in zonecfg, as described above in section 1.

        If the cpu (or memory) resources needed to create the temporary pool
        are unavailable, zoneadmd will issue an error and the zone won't boot.

        When the zone is halted, the temporary pool & pset will be destroyed.
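        The temporary pool lifecycle described above (naming, the reserved
        'SUNWtmp' prefix, refusing to boot when cpus are short, and destroying
        the pool at halt) is modeled below in Python.  This is a toy: ToyHost
        and the helper functions are hypothetical and do not use
        libpool(3LIB), and taking only 'pset.min' cpus at boot is an
        assumption.

```python
TMP_PREFIX = "SUNWtmp"


def temporary_pool_name(zonename):
    """Name shared by the temporary pool and its pset."""
    return f"{TMP_PREFIX}_{zonename}"


def validate_pool_property(pool):
    """zonecfg-side check: explicit 'pool' names may not use the prefix."""
    if pool.startswith(TMP_PREFIX):
        raise ValueError(f"pool names beginning with '{TMP_PREFIX}' are reserved")
    return pool


class ToyHost:
    """Toy model of a host's processors; not libpool or zoneadmd."""

    def __init__(self, ncpus):
        self.free_cpus = ncpus
        self.pools = {}  # temporary pool name -> properties

    def boot_zone(self, zonename, pset_min, pset_max, importance=None):
        """Create the temporary pool, or refuse to boot if cpus are short."""
        if pset_min > self.free_cpus:
            raise RuntimeError(
                f"zone '{zonename}' needs {pset_min} cpus but only "
                f"{self.free_cpus} are free; the zone will not boot")
        name = temporary_pool_name(zonename)
        # ASSUMPTION: only pset.min cpus are taken up front; growth toward
        # pset.max is not modeled here.
        self.free_cpus -= pset_min
        props = {"pset.min": pset_min, "pset.max": pset_max,
                 "temporary": True, "cpus": pset_min}
        if importance is not None:
            props["pool.importance"] = importance
        self.pools[name] = props
        return name

    def halt_zone(self, zonename):
        """Destroy the temporary pool and return its cpus to the host."""
        props = self.pools.pop(temporary_pool_name(zonename))
        self.free_cpus += props["cpus"]
```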

        We will add a new boolean libpool(3LIB) property ('temporary') that can
        exist on pools and any pool resource set.  The 'temporary' property
        indicates that the pool or resource set should never be committed to a
        static configuration (e.g. pooladm -s) and that it should never be
        destroyed when updating the dynamic configuration from a static
        configuration (e.g. pooladm -c).  These temporary pools/resources can
        only be managed in the dynamic configuration.  Support for temporary
        pools will be implemented within libpool(3LIB) using the two new
        consolidation private functions listed in the interface table below.
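        The effect of the 'temporary' property on 'pooladm -s' and
        'pooladm -c' can be illustrated with a small Python model in which a
        configuration is a dict from pool or resource set name to its
        properties.  Both functions are hypothetical stand-ins; the real
        commands do far more than this.

```python
def commit_to_static(dynamic):
    """Like 'pooladm -s': save the dynamic config, skipping temporary entries."""
    return {name: props for name, props in dynamic.items()
            if not props.get("temporary", False)}


def apply_static(static, dynamic):
    """Like 'pooladm -c': replace the dynamic config with the static one,
    but never destroy temporary entries that already exist."""
    kept = {name: props for name, props in dynamic.items()
            if props.get("temporary", False)}
    return {**static, **kept}


dynamic = {"pool_default": {}, "SUNWtmp_web": {"temporary": True}}
print(commit_to_static(dynamic))  # {'pool_default': {}}
```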

        It is our expectation that most users will never need to manage
        temporary pools through the existing poolcfg(1M) commands.  For users
        who need more sophisticated pool configuration and management, the
        existing 'pool' resource within zonecfg should be used and users
        should manually create a permanent pool using the existing mechanisms.

Will the existing pool commands show the results as if they were created using those commands? It would be a useful learning and templating tool to apply the resulting configuration(s) to scripts using the existing commands for the future.

3) Resource controls in zonecfg will be simplified [8].

        Within zonecfg, rctls take a 3-tuple value where only a single
        component is usually of interest (the 'limit').  The other two
        components of the value (the 'priv' and 'action') are not normally
        changed, but users can be confused if they don't understand what the
        other components mean or what values should be specified.

        Here is a zonecfg example:
                > add rctl
                rctl> set name=zone.cpu-shares
                rctl> add value (priv=privileged,limit=5,action=none)
                rctl> end

        Within zonecfg we will introduce the idea of rctl aliases.  The alias
        is a simplified name and template for the existing rctls.  Behind the
        scenes we continue to store the data using the existing rctl entries
        in the XML file.  Thus, the alias always refers to the same underlying
        piece of data as the full rctl.

        The purpose of the rctl alias is to provide a simplified name and
        mechanism to set the rctl 'limit'.  For each rctl/alias pair we will
        "know" the expected values for the 'priv' and 'action' components of
        the rctl value.  If an rctl is already defined that does not match this
        "knowledge" (e.g. it has a non-standard 'action' or there are multiple
        values defined for the rctl), then the user will not be allowed to use
        an alias for that rctl.

This should help a lot!

        Here are the aliases we will define for the rctls:
        alias              rctl                     priv         action
        -----              ----                     ----         ------
        max-lwps           zone.max-lwps            privileged   deny
        cpu-shares         zone.cpu-shares          privileged   none

        Coming in the near future, once the associated projects
        integrate [14, 17, 18]:
        alias              rctl                     priv         action
        -----              ----                     ----         ------
        cpu-cap            zone.cpu-cap             privileged   deny
        max-locked-memory  zone.max-locked-memory   privileged   deny
        max-shm-memory     zone.max-shm-memory      privileged   deny
        max-shm-ids        zone.max-shm-ids         privileged   deny
        max-msg-ids        zone.max-msg-ids         privileged   deny
        max-sem-ids        zone.max-sem-ids         privileged   deny

What is the purpose of some of these zone.* controls? Is it to limit what a privileged user can set the values to for projects, etc. in that zone, or does it set the defaults for the zone as well?

I can see it being easier to configure different DB zones from the global zone via zonecfg than having to enter, delegate, and educate the zone users on how to set them. I'm leaning towards making these the defaults for the zone, not just the limit.

        Here is an example of the max-lwps alias usage within zonecfg:

                > set max-lwps=500
                > info
                [max-lwps: 500]
                        name: zone.max-lwps
                        value: (priv=privileged,limit=500,action=deny)

        In the example, you can see the use of the alias when setting the
        value and you can also see the full rctl output within the 'info'
        command.  The alias is "flagged" in the output with brackets as
        a visual indicator that the property corresponds to the full
        rctl definition printed later in the output.

        If you update the rctl value through the 'rctl' resource then the
        corresponding value in the aliased property would also be updated since
        both the rctl and its alias refer to the same piece of data.

        If an rctl was already defined that did not match the expected value
        (e.g. it had 'action=none' or multiple values), then the alias will be
        disabled.  An attempt to set the limit via the alias would print the
        following error:
            "An incompatible rctl already exists for this property"

        This rctl alias enhancement is fully backward compatible with the
        existing rctl syntax.  That is, zonecfg output will continue to display
        rctl settings in the current format (in addition to the new aliased
        format) and zonecfg will continue to accept the existing input syntax
        for setting rctls.  This ensures full backward compatibility for any
        existing tools/scripts that parse zonecfg output or configure zones.
        Also, the rctl data will continue to be printed in the output from
        the 'export' subcommand using the existing syntax.
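        A hedged Python model of the alias mechanism follows.  It encodes the
        two aliases from the first table, stores each rctl as a list of
        (priv, limit, action) values so that the alias and the rctl share one
        piece of data, and refuses to use an alias when the stored rctl has a
        non-standard 'priv' or 'action' or has several values.  The function
        names and the dict layout are hypothetical; the error text is the one
        quoted above.

```python
# Aliases from the first table above: alias -> (rctl, priv, action).
RCTL_ALIASES = {
    "max-lwps": ("zone.max-lwps", "privileged", "deny"),
    "cpu-shares": ("zone.cpu-shares", "privileged", "none"),
}

INCOMPATIBLE = "An incompatible rctl already exists for this property"


def _compatible(values, priv, action):
    """An alias is usable if the rctl is unset or has one matching value."""
    if not values:
        return True
    return (len(values) == 1 and values[0]["priv"] == priv
            and values[0]["action"] == action)


def set_alias(rctls, alias, limit):
    """Set an rctl 'limit' through its alias, filling in priv and action."""
    name, priv, action = RCTL_ALIASES[alias]
    if not _compatible(rctls.get(name), priv, action):
        raise ValueError(INCOMPATIBLE)
    rctls[name] = [{"priv": priv, "limit": limit, "action": action}]


def get_alias(rctls, alias):
    """Read the limit back; alias and rctl refer to the same stored data."""
    values = rctls.get(RCTL_ALIASES[alias][0])
    return values[0]["limit"] if values else None


def info_lines(rctls, alias):
    """Render an alias and its rctl in the style of the 'info' example."""
    name = RCTL_ALIASES[alias][0]
    v = rctls[name][0]
    return [f"[{alias}: {v['limit']}]",
            f"        name: {name}",
            f"        value: (priv={v['priv']},limit={v['limit']},"
            f"action={v['action']})"]


config = {}
set_alias(config, "max-lwps", 500)
print("\n".join(info_lines(config, "max-lwps")))
```

        Updating the 'limit' directly through the rctl entry changes what
        get_alias() returns, because both read the same list.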

        Future rctls added to zonecfg will also provide aliases following the
        pattern described here (e.g. [17, 18]).

        In section 1 we described the special case 'ncpus' rctl alias as a
        property under the capped-cpu resource group.  This property is really
        just another rctl alias for the zone.cpu-cap rctl, with one
        exception: the limit value is scaled up by 100 so that the value can
        be specified in cpu units and aligned with the 'ncpus' property
        under the dedicated-cpu resource group.  Thus, a value of 2
        will really set the zone.cpu-cap rctl limit to 200, which means the
        cpu cap is 200%.  This alias is being described here but will not
        actually be delivered in the first phase of this project since
        cpu-caps [14] are not yet completed.

I can see this getting confusing. Here it is in integer percentages (essentially); before, it was in full CPUs.

        As part of this rctl syntax simplification we also need to simplify
        the syntax for clearing the value of an rctl.  In fact, this is
        actually a general problem in zonecfg [12].  The 'remove' syntax in
        zonecfg is currently defined as:

            Global Scope
                remove resource-type property-name=property-value [,...]
            Resource Scope
                remove property-name property-value

        That is, from the top-level in zonecfg, there is currently no way to
        clear a simple, top-level property and, to clear a resource, it
        must be qualified with one or more property name/value pairs.

        To address this problem, we will add a new 'clear' command so that you
        can clear a top-level property.  For example, 'clear pool' will clear
        the value for the pool property.  You could clear a 'max-lwps' rctl
        alias using 'clear max-lwps'.  We will also eliminate the requirement
        to qualify resources on the 'remove' command.  So, instead of saying
        'remove net physical=bge0', you could just say 'remove net'.  If there
        is only a single 'net' resource defined, it will be removed.  If there
        are multiple 'net' resources, you will be prompted to confirm that all
        of them should be removed:
                Are you sure you want to remove ALL 'net' resources (y/[n])?

        We will add a '-F' option to the 'remove' command so that you can
        force the removal of resources when running on the CLI.  For example,
        '# zonecfg -z foo remove -F net'.

        The existing syntax is still fully supported so you can continue to
        qualify removal of a single instance of a resource.
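        The proposed 'remove' behaviour can be modeled roughly as follows.
        This Python sketch is an illustration only: the function, its
        arguments and the list-of-pairs data model are hypothetical, and
        prompting whenever more than one resource matches (even when
        qualifiers are given) is an assumption, since the text above only
        describes the unqualified case.

```python
def remove_resources(resources, rtype, qualifiers=None, force=False,
                     confirm=input):
    """Model of the proposed 'remove' semantics (hypothetical helper).

    resources:  list of (resource type, properties dict) pairs
    qualifiers: optional {property: value} filter (the existing syntax)
    force:      like 'remove -F', skip the confirmation prompt
    confirm:    callable used to ask the user; defaults to input()
    Returns a new list; the input list is not modified.
    """
    qualifiers = qualifiers or {}
    targets = [r for r in resources
               if r[0] == rtype
               and all(r[1].get(k) == v for k, v in qualifiers.items())]
    if not targets:
        raise ValueError(f"no matching '{rtype}' resource")
    # Removing several resources at once needs confirmation unless forced.
    if len(targets) > 1 and not force:
        answer = confirm(f"Are you sure you want to remove ALL '{rtype}' "
                         "resources (y/[n])? ")
        if answer.strip().lower() != "y":
            return list(resources)
    return [r for r in resources if r not in targets]
```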

zones-discuss mailing list
