At the request of the project team, I'm restarting this case with the new
spec below. The new timer is set for 20 Nov, 2006. The original spec
is in the case directory as spec.orig. This spec is in the case directory
as spec.txt.
The project team didn't supply a summary of the changes, so I'll be
asking for one in a follow-on.
Gary..
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Swap resource control; locked memory RM improvements
Steve Lawrence <Stephen.Lawrence at sun.com>
SUMMARY:
This case enhances Solaris Zones[1] and builds upon recent work to
improve the integration between Zones and Solaris Resource
Management[2]. The case addresses an existing RFE[6], which requests
a mechanism to limit system swap reserved by a zone. The case also
proposes extensions to [2], which will make swap reservation and
locked memory resource controls easy to configure on a zone via
zonecfg(1M).
1. This case proposes adding the following resource control:
INTERFACE                                   COMMITMENT      BINDING
"zone.max-swap"                             Committed       Patch
This control will limit the swap reserved by processes and tmpfs
mounts within the global zone and non-global zones. This resource
control serves to address the referenced RFE[6].
2. To simplify the configuration of memory-related resource controls
on zones, this case proposes adding the following properties to
zonecfg(1M):
INTERFACE                                   COMMITMENT      BINDING
"swap" zonecfg property                     Committed       Patch
"locked" zonecfg property                   Committed       Patch
These properties will be added to the zonecfg "capped-memory"
resource introduced by [2].
3. For observability of zone resource utilization and limits, this
case proposes the addition of the following kstats:
INTERFACE                                   COMMITMENT      BINDING
caps:{zoneid}:swapresv_zone_{zoneid}        Uncommitted     Patch
caps:{zoneid}:lockedmem_zone_{zoneid}       Uncommitted     Patch
To observe project resource utilization, this case also proposes
the following kstat:
INTERFACE                                   COMMITMENT      BINDING
caps:{zoneid}:lockedmem_project_{projid}    Uncommitted     Patch
The projid cannot be used as the instance number, as each zone
has a unique project namespace. This means project 0 in the
global zone is different from project 0 in each non-global zone.
The global zone will see kstats for all zones, while non-global
zones will only see kstats with matching zoneid.
4. prstat(1M) output changes to report swap reserved.
INTERFACE                                   COMMITMENT      BINDING
prstat(1M) output                           Uncommitted     Patch
This case proposes changing the "SIZE" column of "prstat -Z" zone
output lines to "SWAP". The swap reported will be the total swap
consumed by the zone's processes and tmpfs mounts. This value will
assist administrators in monitoring the swap reserved by each zone,
allowing them to choose a reasonable "zone.max-swap" setting.
DETAIL:
1. "zone.max-swap" resource control.
Limits swap consumed by user process address space mappings and
tmpfs mounts within a zone.
Currently a global or non-global zone can consume all swap
resources available on the system, limiting the usefulness of zones
as an application container. zone.max-swap provides a mechanism to
limit swap consumption per zone. This will protect other zones
from runaway memory leakers/consumers and/or tmpfs writers in a
zone with zone.max-swap configured.
Another solution to this problem would be a "swap set" [5] feature,
which would allow the reservation of swap devices into sets to
which zones could be bound. While "swap sets" would be useful,
zone.max-swap provides a simple solution which is easier to
administer, as it does not require the configuration of pools and
swap devices/files.
"zone.max-swap" is not incompatible with swap sets. In fact, a
future addition of swap sets could be used in combination with
zone.max-swap. For instance, several zones could be bound to the
same set of swap devices, each with its own individual
zone.max-swap configured as a cap within that set. The
implementation of "zone.max-swap" is also much less risky to make
available via patch.
zone.max-swap will be configurable on both the global zone, and
non-global zones. The effect on processes in a zone reaching its
zone.max-swap limit is the same as if all system swap were reserved.
Callers of mmap(2) and sbrk(2) will receive EAGAIN. Writes to
tmpfs will return ENOSPC, which is the same errno returned when
a tmpfs mount reaches its "size" mount option. The "size" mount
option limits the quantity of swap that a tmpfs mount can reserve.
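The errno behaviour described above can be illustrated with a short
sketch. This is not part of the proposal; describe_cap_error is a
hypothetical helper name. It only shows that a caller sees the same
errno whether the zone cap or system-wide swap is exhausted, so the
message cannot tell the two apart.

```python
import errno


def describe_cap_error(exc: OSError) -> str:
    """Translate the errno of a failed allocation or tmpfs write into a hint.

    Hypothetical helper: the errno alone does not say whether the zone's
    zone.max-swap cap or system-wide swap was exhausted.
    """
    if exc.errno == errno.EAGAIN:
        # mmap(2) and sbrk(2) fail this way when swap cannot be reserved.
        return "swap reservation failed: zone.max-swap or system swap exhausted"
    if exc.errno == errno.ENOSPC:
        # tmpfs writes fail this way at the cap or at the mount's "size" limit.
        return "tmpfs write failed: zone.max-swap, tmpfs size, or system swap exhausted"
    return "unrelated error: " + errno.errorcode.get(exc.errno, str(exc.errno))
```

An application would typically call this from the except clause around
the failing allocation or write.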
While a low zone.max-swap setting for the global zone can lead to
a difficult-to-administer global zone, the same problem exists
today when configuring the zone.max-lwps resource control on the
global zone, or when all system swap is reserved. The zonecfg(1M)
enhancements detailed below will help administrators configure
zone.max-swap safely.
2. "swap" and "locked" properties for zonecfg(1M) "capped-memory"
resource.
[2] added a new 'capped-memory' resource to zonecfg. This resource
groups the properties used when capping memory for the zone. It
currently has the 'physical' property which specifies the physical
memory cap for the zone. We will add two new properties, 'swap'
and 'locked', to the "capped-memory" resource. These properties
will be added by using the rctl alias mechanism which is also
described in [2].
swap: An unsigned decimal number with a required k, m, g, or t
modifier. A value of '10m' means ten megabytes.
This will be used to configure the zone.max-swap
resource control, which limits swap consumed by
processes and tmpfs mounts within a zone.
locked: An unsigned decimal number with a required k, m, g, or t
modifier. A value of '10m' means ten megabytes.
This will be used to configure the
zone.max-locked-memory[3,4] resource control, which
limits locked physical memory (made non-pageable) by
processes within a zone.
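The value format shared by the two properties can be sketched as a
small parser. This is an illustration, not the zonecfg implementation.
The function name is hypothetical, and three points are assumptions the
text above does not settle: the modifiers are taken as binary multiples
(1k = 1024 bytes), only whole numbers are accepted, and the modifier is
matched case-insensitively (the text uses both '10m' and '200M').

```python
import re

# Assumption: binary multipliers; the spec text does not define the units.
_SCALE = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}

# Unsigned whole number followed by exactly one required modifier.
_SIZE_RE = re.compile(r"([0-9]+)([kmgt])", re.IGNORECASE)


def parse_capped_size(text: str) -> int:
    """Return the size in bytes for a value such as '10m' or '200M'."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"expected a number with a k, m, g, or t modifier: {text!r}")
    number, modifier = match.groups()
    return int(number) * _SCALE[modifier.lower()]
```

Under these assumptions parse_capped_size("10m") yields ten megabytes
expressed in bytes, and a bare "10" is rejected because the modifier is
required.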
To prevent administrators from configuring a low swap limit that
will prevent a system from booting, zonecfg will not allow a
swap limit to be configured to less than:
Global zone: 100M
Non-global zone: 50M.
These numbers are based on the swap needed to boot a zone after a
default installation.
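The minimum check can be sketched as follows. The constants come from
the limits stated above; the function name is hypothetical, and reading
"M" as 2**20 bytes is an assumption. A cap exactly equal to the minimum
is accepted, since only values below it are refused.

```python
MIB = 1024 * 1024  # assumption: "M" in the limits above means 2**20 bytes

# Minimum swap caps stated in the text above.
MIN_SWAP_GLOBAL = 100 * MIB
MIN_SWAP_NON_GLOBAL = 50 * MIB


def check_swap_cap(cap_bytes: int, is_global_zone: bool) -> None:
    """Raise ValueError if the requested swap cap is below the zone's minimum."""
    minimum = MIN_SWAP_GLOBAL if is_global_zone else MIN_SWAP_NON_GLOBAL
    if cap_bytes < minimum:
        kind = "global" if is_global_zone else "non-global"
        raise ValueError(
            f"swap cap of {cap_bytes} bytes is below the {kind} zone "
            f"minimum of {minimum} bytes"
        )
```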
Also, if zone.max-swap is configured (via zonecfg(1M)) on the
global zone, a warning will be printed:
global:capped-memory> set swap=200M
Warning: Setting capped swap on the global zone can impact
system availability.
Similar warnings will be printed for setting other rctls on the
global zone which can affect availability, such as zone.max-lwps.
3. For observability of zone resource utilization and limits, this
case proposes the addition of the following kstats:
INTERFACE                                   COMMITMENT      BINDING
caps:{zoneid}:swapresv_zone_{zoneid}        Uncommitted     Patch
caps:{zoneid}:lockedmem_zone_{zoneid}       Uncommitted     Patch
To observe project resource utilization, this case also proposes
the following kstat:
INTERFACE                                   COMMITMENT      BINDING
caps:{zoneid}:lockedmem_project_{projid}    Uncommitted     Patch
The projid cannot be used as the instance number, as each zone
has a unique project namespace. This means project 0 in the
global zone is different from project 0 in each non-global zone.
The global zone will see kstats for all zones, while non-global
zones will only see kstats with matching zoneid.
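The naming scheme can be made concrete with a short sketch. The helper
names are hypothetical. Each function returns the (module, instance,
name) triplet from the tables above, which shows why the zoneid is the
instance: project 0 in two different zones gets two distinct kstats.

```python
def zone_kstat(stat: str, zoneid: int) -> tuple:
    """Triplet for a per-zone kstat; stat is 'swapresv' or 'lockedmem'."""
    return ("caps", zoneid, f"{stat}_zone_{zoneid}")


def project_kstat(zoneid: int, projid: int) -> tuple:
    """Triplet for a per-project locked memory kstat.

    The instance is the zoneid, not the projid, because project IDs are
    only unique within a zone.
    """
    return ("caps", zoneid, f"lockedmem_project_{projid}")


# Project 0 in the global zone (zoneid 0) versus project 0 in a zone
# that happens to have zoneid 3: same name, different instance.
global_proj0 = project_kstat(0, 0)
zone3_proj0 = project_kstat(3, 0)
```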
Each kstat will have the statistics:
usage: The current quantity of resource consumed.
value: The current enforced cap.
zonename: The name of the zone. A zone may change zoneid
each time it boots, so this statistic helps to
match the kstat to the zone.
These kstats can be consumed by higher-level tools/scripts to
provide information about zone memory usage. Each kstat's instance
number matches the zoneid of the zone it represents. Non-global
zones will only be able to read the kstat with matching zoneid.
The global zone will be able to read all kstats.
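As a sketch of how a higher-level script might consume these kstats,
the following parses text in the "module:instance:name:statistic<TAB>value"
form printed by "kstat -p" and computes the fraction of the cap in use.
The function names and the sample values are made up for illustration;
the statistic names (usage, value, zonename) are the ones listed above.

```python
def parse_caps_kstats(text: str) -> dict:
    """Group 'caps' statistics by (instance, name) from kstat -p style text."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition("\t")
        parts = key.split(":", 3)
        if len(parts) != 4 or parts[0] != "caps":
            continue  # skip blank lines, malformed lines, and other modules
        _, instance, name, statistic = parts
        stats.setdefault((int(instance), name), {})[statistic] = value.strip()
    return stats


def usage_fraction(entry: dict) -> float:
    """Fraction of the enforced cap ('value') currently consumed ('usage')."""
    return int(entry["usage"]) / int(entry["value"])


# Hypothetical sample: a global zone using 100 MB of a 200 MB swap cap.
SAMPLE = (
    "caps:0:swapresv_zone_0:usage\t104857600\n"
    "caps:0:swapresv_zone_0:value\t209715200\n"
    "caps:0:swapresv_zone_0:zonename\tglobal\n"
    "unix:0:system_misc:ncpus\t2\n"
)
```

Keying on (instance, name) and reporting the zonename statistic keeps
the result stable even though a zone's zoneid may change across boots.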
Additional kstats will be added in the future to report usage and
cap for other rctls. Addressing existing rctls is outside the
scope of this case.
4. prstat(1M) output changes to report swap reserved.
INTERFACE                                   COMMITMENT      BINDING
prstat(1M) output                           Uncommitted     Patch
This case proposes changing the "SIZE" column of "prstat -Z" zone
output lines to "SWAP". The swap reported will be the total swap
consumed by the zone's processes and tmpfs mounts. This value
will assist administrators in monitoring the swap reserved by each
zone, allowing them to choose a reasonable "zone.max-swap"
setting.
The "SIZE" column will also be changed to "SWAP" for prstat
options -a, -T, and -J, for users, tasks, and projects.
The current "SIZE" column arbitrarily sums the address spaces of
the processes in each zone. This sum includes device mappings,
but does not include NORESERVE segments. This sum does not map
to real system resources, and therefore provides no meaningful
information when summed across all processes belonging to a zone,
project, task, or user.
For the default prstat process listing, "SIZE" will not be changed
to swap, as the virtual address space size for each process is a
useful number. Detailed per-process memory consumption reporting
is outside the scope of this case, and would be better addressed
by a case proposing a solution for 6487372[7]:
RFE: prstat -x: Providing VSZ/RSS/ANON/LOCK Memory & CPU Usage
This RFE requests displaying detailed memory usage per process.
"SWAP" reservation certainly falls into this category.
REFERENCES:
[1] PSARC/2002/174 Virtualization and Namespace Isolation in Solaris
http://sac.sfbay.sun.com/PSARC/2002/174
http://www.opensolaris.org/os/community/arc/caselog/2002/174/
[2] PSARC/2006/496 Improved Zones/RM Integration
http://sac.sfbay.sun.com/PSARC/2006/496/
http://www.opensolaris.org/os/community/arc/caselog/2006/496/
[3] PSARC/2006/463 Amendment to zone/project.max-locked-memory Resource
Controls
http://sac.sfbay.sun.com/PSARC/2006/463/
http://www.opensolaris.org/os/community/arc/caselog/2006/463/
[4] PSARC/2004/580 zone/project.max-locked-memory Resource Controls
http://sac.sfbay.sun.com/PSARC/2004/580/
http://www.opensolaris.org/os/community/arc/caselog/2004/580/
[5] PSARC/2002/181 Swap Sets
http://sac.sfbay.sun.com/PSARC/2002/181/
http://www.opensolaris.org/os/community/arc/caselog/2002/181/
[6] 5103071 RFE: local zones can run the global zone out of swap
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5103071
[7] 6487372 RFE: prstat -x: Providing VSZ/RSS/ANON/LOCK Memory & CPU Usage
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6487372
>From sl108498 at steve1.sfbay.sun.com Fri Nov 10 13:25:19 2006
Date: Fri, 10 Nov 2006 13:21:17 -0800
From: Steve Lawrence <[email protected]>
To: gary.winiger at sun.com, zones-core at sun.com
Subject: new spec for PSARC/2006/598
hey Gary,
Here is the new spec.
-Steve.