At the request of the project team, I'm restarting this case with the new
spec below.  The new timer is set for 20 Nov, 2006.  The original spec
is in the case directory as spec.orig.  This spec is in the case directory
as spec.txt.

The project team didn't supply a summary of the changes, so I'll be
asking for one in a follow-on.

Gary..
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Swap resource control; locked memory RM improvements
Steve Lawrence <[EMAIL PROTECTED]>

SUMMARY:
  This case enhances Solaris Zones[1] and builds upon recent work to
  improve the integration between Zones and Solaris Resource
  Management[2].  The case addresses an existing RFE[6], which requests
  a mechanism to limit system swap reserved by a zone.  The case also
  proposes extensions to [2], which will make swap reservation and
  locked memory resource controls easy to configure on a zone via
  zonecfg(1M).

  1. This case proposes adding the following resource control:

        INTERFACE                               COMMITMENT      BINDING
        "zone.max-swap"                          Committed        Patch

     This control will limit the swap reserved by processes and tmpfs
     mounts within the global zone and non-global zones.  This resource
     control serves to address the referenced RFE[6].

  2. To simplify the configuration of memory-related resource controls
     on zones, this case proposes adding the following properties to
     zonecfg(1M):

        INTERFACE                               COMMITMENT      BINDING
        "swap" zonecfg property                  Committed        Patch
        "locked" zonecfg property                Committed        Patch

     These properties will be added to the "capped-memory" zonecfg
     resource introduced by [2].

  3. For observability of zone resource utilization and limits, this
     case proposes the addition of the following kstats:
        
        INTERFACE                               COMMITMENT      BINDING
        caps:{zoneid}:swapresv_zone_{zoneid}    Uncommitted     Patch
        caps:{zoneid}:lockedmem_zone_{zoneid}   Uncommitted     Patch

     To observe project resource utilization, this case also proposes
     the following kstat:

        INTERFACE                                COMMITMENT     BINDING
        caps:{zoneid}:lockedmem_project_{projid} Uncommitted    Patch

     The projid cannot be used as the instance number, as each zone
     has a unique project namespace.  This means project 0 in the
     global zone is different from project 0 in each non-global zone.

     The global zone will see kstats for all zones, while non-global
     zones will only see kstats with matching zoneid.

  4. prstat(1M) output changes to report swap reserved.

        INTERFACE                               COMMITMENT      BINDING
        prstat(1M) output                       Uncommitted       Patch

     This case proposes changing the "SIZE" column of "prstat -Z" zone
     output lines to "SWAP".  The swap reported will be the total swap
     consumed by the zone's processes and tmpfs mounts.  This value will
     assist administrators in monitoring the swap reserved by each zone,
     allowing them to choose a reasonable "zone.max-swap" setting.


DETAIL:

  1. "zone.max-swap" resource control.

     Limits swap consumed by user process address space mappings and
     tmpfs mounts within a zone.

     Currently a global or non-global zone can consume all swap
     resources available on the system, limiting the usefulness of zones
     as an application container.  zone.max-swap provides a mechanism to
     limit swap consumption per zone.  This will protect other zones
     from runaway memory leakers/consumers and/or tmpfs writers in a
     zone with zone.max-swap configured.

     Another solution to this problem would be a "swap set" [5] feature,
     which would allow the reservation of swap devices into sets to
     which zones could be bound.  While "swap sets" would be useful,
     zone.max-swap provides a simple solution which is easier to
     administer, as it does not require the configuration of pools and
     swap devices/files.

     "zone.max-swap" is not incompatible with swap sets.  In fact, a
     future addition of swap sets could be used in combination with
     zone.max-swap.  For instance, several zones could be bound to the
     same set of swap devices, each with its own individual
     zone.max-swap configured as a cap within that set.  The
     implementation of "zone.max-swap" is also much less risky to make
     available via patch.

     zone.max-swap will be configurable on both the global zone and
     non-global zones.  The effect on processes in a zone reaching its
     zone.max-swap limit is the same as if all system swap were reserved.
     Callers of mmap(2) and sbrk(2) will receive EAGAIN.  Writes to
     tmpfs will return ENOSPC, which is the same errno returned when
     a tmpfs mount reaches its "size" mount option.  The "size" mount
     option limits the quantity of swap that a tmpfs mount can reserve.
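
     As a rough illustration of how an application might tell these two
     failures apart, the sketch below maps the errno values named above
     to short descriptions.  The function name and the message text are
     hypothetical and are not part of this case.

```python
import errno


def describe_swap_failure(exc):
    """Describe an OSError in terms of swap exhaustion (hypothetical helper)."""
    if exc.errno == errno.EAGAIN:
        # mmap(2) and sbrk(2) report EAGAIN when swap cannot be reserved.
        return "swap reservation failed: zone cap or system swap exhausted"
    if exc.errno == errno.ENOSPC:
        # tmpfs writes report ENOSPC, the same errno used for the "size" option.
        return "tmpfs write failed: zone cap, system swap, or mount size reached"
    return "unrelated error"
```

     The caller cannot distinguish a zone.max-swap limit from system-wide
     swap exhaustion by errno alone, which matches the behaviour
     described above.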

     While a low zone.max-swap setting for the global zone can lead to
     a difficult-to-administer global zone, the same problem exists
     today when configuring the zone.max-lwps resource control on the
     global zone, or when all system swap is reserved.  The zonecfg(1M)
     enhancements detailed below will help administrators configure
     zone.max-swap safely.


  2. "swap" and "locked" properties for zonecfg(1M) "capped-memory"
     resource.

     [2] added a new "capped-memory" resource to zonecfg.  This resource
     groups the properties used when capping memory for the zone.  It
     currently has the "physical" property, which specifies the physical
     memory cap for the zone.  We will add two new properties, "swap"
     and "locked", to the "capped-memory" resource.  These properties
     will be added by using the rctl alias mechanism, which is also
     described in [2].

        swap:   An unsigned decimal number with a required k, m, g, or t
                modifier.  A value of '10m' means ten megabytes.
                This will be used to configure the zone.max-swap
                resource control, which limits swap consumed by
                processes and tmpfs mounts within a zone.

        locked: An unsigned decimal number with a required k, m, g, or t
                modifier.  A value of '10m' means ten megabytes.
                This will be used to configure the
                zone.max-locked-memory[3,4] resource control, which
                limits locked physical memory (made non-pageable) by
                processes within a zone.
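
     The value format shared by both properties can be sketched as a
     small parser.  This is only an illustration: the function name is
     hypothetical, and two details are assumptions rather than
     statements from this case: that the modifiers are binary multiples
     (k = 1024 bytes) and that only whole numbers are accepted.
     Upper-case modifiers are accepted because the zonecfg example in
     this section uses "200M".

```python
import re

# Assumption: binary multiples. The case text only says '10m' is ten megabytes.
_SCALE = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_cap(text):
    """Return the cap in bytes, or raise ValueError (hypothetical helper)."""
    # Assumption: whole numbers only; the modifier is required.
    match = re.fullmatch(r"([0-9]+)([kmgtKMGT])", text)
    if match is None:
        raise ValueError(
            "expected an unsigned decimal number with a k, m, g, or t "
            "modifier, got %r" % (text,)
        )
    return int(match.group(1)) * _SCALE[match.group(2).lower()]
```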

     To prevent administrators from configuring a low swap limit that
     will prevent a system from booting, zonecfg will not allow a
     swap limit to be configured to less than:

        Global zone:     100M
        Non-global zone: 50M

     These numbers are based on the swap needed to boot a zone after a
     default installation.
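
     The minimum check described above could look roughly like the
     following.  The names are hypothetical, and treating "M" as a
     binary megabyte is an assumption; the case only gives the values
     100M and 50M.

```python
MIB = 1024 * 1024  # assumption: "M" in the minimums means 1024 * 1024 bytes

# Minimum swap caps from this case, keyed by zone kind.
MIN_SWAP_BYTES = {"global": 100 * MIB, "non-global": 50 * MIB}


def check_swap_cap(cap_bytes, is_global_zone):
    """Return cap_bytes if it meets the minimum, else raise ValueError."""
    kind = "global" if is_global_zone else "non-global"
    minimum = MIN_SWAP_BYTES[kind]
    if cap_bytes < minimum:
        raise ValueError(
            "swap cap of %d bytes is below the %s zone minimum of %d bytes"
            % (cap_bytes, kind, minimum)
        )
    return cap_bytes
```

     A cap exactly equal to the minimum is accepted here, since the case
     only rejects values "less than" the listed limits.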
 
     Also, if zone.max-swap is configured (via zonecfg(1M)) on the
     global zone, a warning will be printed:

        global:capped-memory> set swap=200M
        Warning: Setting capped swap on the global zone can impact
        system availability.

     Similar warnings will be printed for setting other rctls on the
     global zone which can affect availability, such as zone.max-lwps.

  3. For observability of zone resource utilization and limits, this
     case proposes the addition of the following kstats:
        
        INTERFACE                               COMMITMENT      BINDING
        caps:{zoneid}:swapresv_zone_{zoneid}    Uncommitted     Patch
        caps:{zoneid}:lockedmem_zone_{zoneid}   Uncommitted     Patch

     To observe project resource utilization, this case also proposes
     the following kstat:

        INTERFACE                                COMMITMENT     BINDING
        caps:{zoneid}:lockedmem_project_{projid} Uncommitted    Patch

     The projid cannot be used as the instance number, as each zone
     has a unique project namespace.  This means project 0 in the
     global zone is different from project 0 in each non-global zone.

     The global zone will see kstats for all zones, while non-global
     zones will only see kstats with matching zoneid.

     Each kstat will have the statistics:

        usage:          The current quantity of resource consumed.
        value:          The current enforced cap.
        zonename:       The name of the zone.  A zone may change zoneid
                        each time it boots, so this statistic helps to
                        match the kstat to the zone.

     These kstats can be consumed by higher-level tools/scripts to
     provide information about zone memory usage.  Each kstat's instance
     number matches the zoneid of the zone it represents.  Non-global
     zones will only be able to read the kstat with matching zoneid.
     The global zone will be able to read all kstats.
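
     As a sketch of how a script might consume these kstats, the
     following parses text in the module:instance:name:statistic layout
     printed by "kstat -p" and computes how much of a cap is in use.
     The sample text, zone name, and numbers are invented for
     illustration, and the function names are hypothetical.

```python
# Invented sample in "kstat -p" layout: key, whitespace, value.
SAMPLE = (
    "caps:1:swapresv_zone_1:usage\t104857600\n"
    "caps:1:swapresv_zone_1:value\t209715200\n"
    "caps:1:swapresv_zone_1:zonename\tweb\n"
)


def parse_caps_kstats(text):
    """Group caps statistics by (zoneid, kstat name); values stay strings."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split(None, 1)
        module, instance, name, statistic = key.split(":", 3)
        if module != "caps":
            continue
        stats.setdefault((int(instance), name), {})[statistic] = value.strip()
    return stats


def usage_ratio(entry):
    """Fraction of the enforced cap ("value") currently consumed ("usage")."""
    return int(entry["usage"]) / int(entry["value"])


if __name__ == "__main__":
    entry = parse_caps_kstats(SAMPLE)[(1, "swapresv_zone_1")]
    print(entry["zonename"], usage_ratio(entry))
```

     Keying on the zoneid and reporting the zonename statistic together
     reflects the point above that a zone's zoneid may change each time
     it boots.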

     Additional kstats will be added in the future to report usage and
     cap for other rctls.  Addressing existing rctls is outside the
     scope of this case.

  4. prstat(1M) output changes to report swap reserved.

        INTERFACE                               COMMITMENT      BINDING
        prstat(1M) output                       Uncommitted       Patch

     This case proposes changing the "SIZE" column of "prstat -Z" zone
     output lines to "SWAP".  The swap reported will be the total swap
     consumed by the zone's processes and tmpfs mounts.  This value
     will assist administrators in monitoring the swap reserved by each
     zone, allowing them to choose a reasonable "zone.max-swap"
     setting.

     The "SIZE" column will also be changed to "SWAP" for the prstat
     options -a, -T, and -J, which report users, tasks, and projects.

     The current "SIZE" column arbitrarily sums the address spaces of
     the processes in each zone.  This sum includes device mappings,
     but does not include NORESERVE segments.  This sum does not map
     to real system resources, and therefore provides no meaningful
     information when summed across all processes belonging to a zone,
     project, task, or user.

     For the default prstat process listing, "SIZE" will not be changed
     to "SWAP", as the virtual address space size for each process is a
     useful number.  Detailed per-process memory consumption reporting
     is outside the scope of this case, and would be better addressed
     by a case proposing a solution for 6487372[7]:

        RFE: prstat -x: Providing VSZ/RSS/ANON/LOCK Memory & CPU Usage

     This RFE requests displaying detailed memory usage per process.
     "SWAP" reservation certainly falls into this category.

REFERENCES:

[1] PSARC/2002/174 Virtualization and Namespace Isolation in Solaris
    http://sac.sfbay.sun.com/PSARC/2002/174
    http://www.opensolaris.org/os/community/arc/caselog/2002/174/

[2] PSARC/2006/496 Improved Zones/RM Integration
    http://sac.sfbay.sun.com/PSARC/2006/496/
    http://www.opensolaris.org/os/community/arc/caselog/2006/496/

[3] PSARC/2006/463 Amendment to zone/project.max-locked-memory Resource
    Controls
    http://sac.sfbay.sun.com/PSARC/2006/463/
    http://www.opensolaris.org/os/community/arc/caselog/2006/463/

[4] PSARC/2004/580 zone/project.max-locked-memory Resource Controls
    http://sac.sfbay.sun.com/PSARC/2004/580/
    http://www.opensolaris.org/os/community/arc/caselog/2004/580/

[5] PSARC/2002/181 Swap Sets
    http://sac.sfbay.sun.com/PSARC/2002/181/
    http://www.opensolaris.org/os/community/arc/caselog/2002/181/

[6] 5103071 RFE: local zones can run the global zone out of swap
    http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5103071

[7] 6487372 RFE: prstat -x: Providing VSZ/RSS/ANON/LOCK Memory & CPU Usage
    http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6487372

From [EMAIL PROTECTED] Fri Nov 10 13:25:19 2006
Date: Fri, 10 Nov 2006 13:21:17 -0800
From: Steve Lawrence <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: new spec for PSARC/2006/598
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i
Status: RO
X-Lines: 259
Content-Type: text/plain; charset="us-ascii"
Content-Length: 10884

hey Gary,

Here is the new spec.

-Steve.


_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
