Overall, these two pieces will be very welcome additions to the arsenal of zone-related resource controls.

I have inserted some requests for clarification in-line.

Dan Price wrote:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Swap resource control; locked memory RM improvements
Steve Lawrence <[EMAIL PROTECTED]>

SUMMARY:
  This case enhances Solaris Zones[1] and builds upon recent work to
  improve the integration between Zones and Solaris Resource
  Management[2].  The case addresses an existing RFE[6], which requests
  a mechanism to limit system swap reserved by a zone.  The case also
  proposes extensions to [2], which will make swap reservation and
  locked memory resource controls easy to configure on a zone via
  zonecfg(1M).

  1. This case proposes adding the following resource control:

        INTERFACE                               COMMITMENT      BINDING
        "zone.max-swap"                         Committed       Patch

     This control will limit the swap reserved by processes and tmpfs
     mounts within the global zone and non-global zones.  This resource
     control serves to address the referenced RFE[6].

  2. To simplify the configuration of memory-related resource controls
     on zones, this case proposes adding the following properties to
     zonecfg(1M):

        INTERFACE                               COMMITMENT      BINDING
        "swap" zonecfg property                 Committed       Patch
        "locked" zonecfg property               Committed       Patch

     These properties will be added to the zonecfg "capped-memory"
     zonecfg resource introduced by [2].

  3. For observability of zone resource utilization and limits, this
     case proposes the addition of the following kstats:

        INTERFACE                               COMMITMENT      BINDING
        zone:{zoneid}:vm:zonename               Uncommitted       Patch
        zone:{zoneid}:vm:swap_reserved          Uncommitted       Patch
        zone:{zoneid}:vm:max_swap_reserved      Uncommitted       Patch
        zone:{zoneid}:vm:locked_memory          Uncommitted       Patch
        zone:{zoneid}:vm:max_locked_memory      Uncommitted       Patch

     These kstats will be of class "misc".  The global zone will see
     kstats for all zones, while non-global zones will only see kstats
     with matching zoneid.

  4. prstat(1m) output changes to report swap reserved.

        INTERFACE                               COMMITMENT      BINDING
        prstat(1m) output                       Uncommitted       Patch

     This case proposes changing the "SIZE" column of "prstat -Z" zone
     output lines to "SWAP".  The swap reported will be the total swap
     consumed by the zone's processes and tmpfs mounts.  This value will
     assist administrators in monitoring the swap reserved by each zone,
     allowing them to choose a reasonable "zone.max-swap" setting.

     The "SIZE" column will also be changed to "SWAP" for prstat
     options a, T, and J, for users, tasks, and projects.

It would be helpful to state the reason for not changing this column in the default output as well.

DETAIL:

  1. "zone.max-swap" resource control.

     Limits swap consumed by user process address space mappings and
     tmpfs mounts within a zone.

     Currently a global or non-global zone can consume all swap
     resources available on the system, limiting the usefulness of zones
     as an application container.  zone.max-swap provides a mechanism to

I would rephrase that as "the container of an application" to avoid confusion with the Solaris feature set called "Containers." I assume the former was meant more so than the latter, even though Containers are Solaris' implementation of "an application container."

     limit swap consumption per zone.  This will protect other zones
     from runaway memory leakers/consumers and/or tmpfs writers in a
     zone with zone.max-swap configured.

     Another solution to this problem would be a "swap set" [5] feature,
     which would allow the reservation of swap devices into sets to
     which zones could be bound.  While "swap sets" would be useful,
     zone.max-swap provides a simple solution which is easier to
     administer, as it does not require the configuration of pools and
     swap devices/files.

     "zone.max-swap" is not incompatible with swap sets.  In fact, a
     future addition of swap sets could be used in combination with
     zone.max-swap.  For instance, several zones could be bound to the
     same set of swap devices, each with its own individual
     zone.max-swap configured as a cap within that set.  The
     implementation of "zone.max-swap" is also much less risky to make
     available via patch.

     zone.max-swap will be configurable on both the global zone, and
     non-global zones.  The effect on processes in a zone reaching its
     zone.max-swap limit is the same as if all system swap is reserved.
     Callers of mmap(2) and sbrk(2) will receive EAGAIN.  Writes to
     tmpfs will return ENOSPC, which is the same errno returned when
     a tmpfs mount reaches its "size" mount option.  The "size" mount
     option limits the quantity of swap that a tmpfs mount can reserve.
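
As an illustration of the error behavior described above, here is a minimal Python sketch of how an application might tell the two failure modes apart. The function names are hypothetical and not part of the proposal; the sketch only assumes that a failed anonymous mapping surfaces as an OSError carrying EAGAIN and that a failed tmpfs write carries ENOSPC, as the text states.

```python
import errno
import mmap


def describe_reservation_failure(exc):
    """Map an OSError from mmap or a tmpfs write to a short diagnosis.

    Hypothetical helper: EAGAIN and ENOSPC are the errno values named in
    the proposal; the wording of the messages is illustrative only.
    """
    if exc.errno == errno.EAGAIN:
        return "swap reservation failed (system swap or zone.max-swap exhausted)"
    if exc.errno == errno.ENOSPC:
        return "tmpfs write failed (size option or zone.max-swap reached)"
    return "unrelated error: " + errno.errorcode.get(exc.errno, str(exc.errno))


def reserve_anonymous(nbytes):
    """Try to map nbytes of anonymous memory.

    Returns (mapping, None) on success or (None, reason) on failure.
    """
    try:
        return mmap.mmap(-1, nbytes), None
    except OSError as exc:
        return None, describe_reservation_failure(exc)
```

Because the errno values are the same whether the whole system or only the zone's cap is exhausted, a caller cannot distinguish the two cases from the error alone; that matches the statement that the effect is the same as if all system swap were reserved.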

With S10 11/06, some zone limitations are now configurable, e.g. setting the system time clock. Similarly, the ability to modify a zone's swap limit could be given to the zone's root user, which might be valuable in some situations. This would be analogous to the 'basic' privilege level. It would allow an advisory limit to be placed on a zone - a limit that the zone admin could modify in unusual circumstances.

I realize that this opens a can of worms in that most rctls are protected by the sys_res_config priv, which is not allowed in a zone even with 11/06. Further, it makes sense to consistently allow or forbid rctl-modification in zones.

I just wanted to mention this idea so that it is not unintentionally overlooked.

     While a low zone.max-swap setting for the global zone can lead to
     a difficult-to-administer global zone, the same problem exists
     today when configuring the zone.max-lwps resource control on the
     global zone, or when all system swap is reserved.

  2. "swap" and "locked" properties for zonecfg(1M) "capped-memory"
     resource.

     [2] added a new 'capped-memory' resource to zonecfg.  This resource
     groups the properties used when capping memory for the zone.   It
     currently has the 'physical' property which specifies the physical
     memory cap for the zone.  We will add two new properties, 'swap'
     and 'locked' to the "capped-memory" resource.  These properties
     will be added by using the rctl alias mechanism which is also
     described in [2].

        swap:   An unsigned decimal number with a required k, m, g, or t
                modifier.  A value of '10m' means ten megabytes.
                This will be used to configure the zone.max-swap
                resource control, which limits swap consumed by
                processes and tmpfs mounts within a zone.

        locked: An unsigned decimal number with a required k, m, g, or t
                modifier.  A value of '10m' means ten megabytes.
                This will be used to configure the
                zone.max-locked-memory[3,4] resource control, which
                limits locked physical memory (made non-pageable) by
                processes within a zone.
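
The value format described for "swap" and "locked" can be sketched as a small Python parser. This is only an illustration, not zonecfg's implementation. It assumes whole numbers only, binary multiples (k = 1024 bytes), and case-insensitive modifiers; the proposal itself states only that the modifier is required and that '10m' means ten megabytes. The function name is hypothetical.

```python
import re

# Assumption: modifiers are binary multiples. The proposal does not say
# whether "megabyte" means 10^6 or 2^20 bytes.
_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}

# An unsigned whole number followed by exactly one required modifier.
_SIZE_RE = re.compile(r"([0-9]+)([kmgt])")


def parse_capped_memory_value(text):
    """Convert a value such as '10m' to a byte count.

    Raises ValueError if the number is missing, signed, or lacks a
    k, m, g, or t modifier.
    """
    match = _SIZE_RE.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError("expected an unsigned number with k, m, g, or t: %r" % text)
    number, modifier = match.groups()
    return int(number) * _MULTIPLIERS[modifier]
```

Under these assumptions, parse_capped_memory_value('10m') yields 10485760, while a bare '10' is rejected because the modifier is required.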

  3. Swap and locked memory kstats for zones.

     There is currently no way to observe how much locked memory or swap
     a zone is consuming.  This makes capacity planning and monitoring
     difficult.  To solve this problem, the following kstats are
     proposed:

        INTERFACE                               COMMITMENT      BINDING
        zone:{zoneid}:vm:zonename               Uncommitted       Patch
        zone:{zoneid}:vm:swap_reserved          Uncommitted       Patch
        zone:{zoneid}:vm:max_swap_reserved      Uncommitted       Patch
        zone:{zoneid}:vm:locked_memory          Uncommitted       Patch
        zone:{zoneid}:vm:max_locked_memory      Uncommitted       Patch

     Higher level monitoring scripts/tools can be developed in the
     future to consume these kstats, or a future version of these
     kstats.

        STATISTIC               DESCRIPTION
        zonename:               The name of the zone with {zoneid}.
        swap_reserved:          swap reserved by zone in bytes.

Does swap_reserved include pages shared with other zones, e.g. text pages?

        max_swap_reserved:      current zone.max-swap limit in bytes.
        locked_memory:          physical memory locked by zone in bytes.

Does locked_memory include pages shared with other zones, e.g. text pages?

        max_locked_memory:      current zone.max-locked-memory limit in
                                bytes.

     These kstats can be consumed by higher level tools/scripts to
     provide information about zone memory usage.  Each kstat's instance
     number matches the zoneid of the zone it represents.  Non-global
     zones will only be able to read the kstat with matching zoneid.  The
     global zone will be able to read all kstats.
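
A higher-level tool of the kind mentioned above might look like the following Python sketch. It assumes the kstats are published under the names proposed in this case and are read as "module:instance:name:statistic" lines followed by a value, the parseable form that kstat(1M) prints with -p. The sample values are invented for illustration, and the function names are hypothetical.

```python
def parse_zone_vm_kstats(lines):
    """Return {zoneid: {statistic: value}} for zone:{zoneid}:vm:* lines."""
    zones = {}
    for line in lines:
        fields = line.split(None, 1)          # key, then value
        if len(fields) != 2:
            continue
        parts = fields[0].split(":")
        if len(parts) != 4:
            continue
        module, instance, name, statistic = parts
        if module != "zone" or name != "vm":
            continue
        value = fields[1].strip()
        stats = zones.setdefault(int(instance), {})
        # zonename is a string; every other proposed statistic is in bytes.
        stats[statistic] = value if statistic == "zonename" else int(value)
    return zones


def swap_utilization(stats):
    """Fraction of the zone.max-swap limit currently reserved, or None."""
    limit = stats["max_swap_reserved"]
    return stats["swap_reserved"] / limit if limit else None


# Invented sample data in the assumed format (not real measurements).
SAMPLE = """\
zone:0:vm:zonename\tglobal
zone:0:vm:swap_reserved\t536870912
zone:0:vm:max_swap_reserved\t2147483648
zone:3:vm:zonename\twebzone
zone:3:vm:swap_reserved\t104857600
zone:3:vm:max_swap_reserved\t1073741824
"""

if __name__ == "__main__":
    for zoneid, stats in sorted(parse_zone_vm_kstats(SAMPLE.splitlines()).items()):
        print(zoneid, stats["zonename"], swap_utilization(stats))
```

Run from a non-global zone, such a script would see only the one instance matching its own zoneid, per the visibility rule above.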

  4. prstat(1m) output changes to report swap reserved.

        INTERFACE                               COMMITMENT      BINDING
        prstat(1m) output                       Uncommitted       Patch

     This case proposes changing the "SIZE" column of "prstat -Z" zone
     output lines to "SWAP".  The swap reported will be the total swap
     consumed by the zone's processes and tmpfs mounts.  This value
     will assist administrators in monitoring the swap reserved by each
     zone, allowing them to choose a reasonable "zone.max-swap"
     setting.

     The "SIZE" column will also be changed to "SWAP" for prstat
     options a, T, and J, for users, tasks, and projects.

     The current "SIZE" column arbitrarily sums the address spaces of
     the processes in each zone.  This sum includes device mappings,
     but does not include NORESERVE segments.  This sum does not map
     to any real system resource, and therefore provides no meaningful
     information.

REFERENCES:

[1] PSARC/2002/174 Virtualization and Namespace Isolation in Solaris
    http://sac.sfbay.sun.com/PSARC/2002/174
    http://www.opensolaris.org/os/community/arc/caselog/2002/174/

[2] PSARC/2006/496 Improved Zones/RM Integration
    http://sac.sfbay.sun.com/PSARC/2006/496/
    http://www.opensolaris.org/os/community/arc/caselog/2006/496/

[3] PSARC/2006/463 Amendment to zone/project.max-locked-memory Resource
    Controls
    http://sac.sfbay.sun.com/PSARC/2006/463/
    http://www.opensolaris.org/os/community/arc/caselog/2006/463/

[4] PSARC/2004/580 zone/project.max-locked-memory Resource Controls
    http://sac.sfbay.sun.com/PSARC/2004/580/
    http://www.opensolaris.org/os/community/arc/caselog/2004/580/

[5] PSARC/2002/181 Swap Sets
    http://sac.sfbay.sun.com/PSARC/2002/181/
    http://www.opensolaris.org/os/community/arc/caselog/2002/181/

[6] 5103071 RFE: local zones can run the global zone out of swap
    http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5103071



--------------------------------------------------------------------------
Jeff VICTOR              Sun Microsystems            jeff.victor @ sun.com
OS Ambassador            Sr. Technical Specialist
Solaris 10 Zones FAQ:    http://www.opensolaris.org/os/community/zones/faq
--------------------------------------------------------------------------
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
