Overall, these two pieces will be very welcome additions to the arsenal of
zone-related resource controls.
I have inserted some requests for clarification in-line.
Dan Price wrote:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Swap resource control; locked memory RM improvements
Steve Lawrence <[EMAIL PROTECTED]>
SUMMARY:
This case enhances Solaris Zones[1] and builds upon recent work to
improve the integration between Zones and Solaris Resource
Management[2]. The case addresses an existing RFE[6], which requests
a mechanism to limit system swap reserved by a zone. The case also
proposes extensions to [2], which will make swap reservation and
locked memory resource controls easy to configure on a zone via
zonecfg(1M).
1. This case proposes adding the following resource control:
INTERFACE COMMITMENT BINDING
"zone.max-swap" Committed Patch
This control will limit the swap reserved by processes and tmpfs
mounts within the global zone and non-global zones. This resource
control serves to address the referenced RFE[6].
2. To simplify the configuration of memory-related resource controls
on zones, this case proposes adding the following properties to
zonecfg(1M):
INTERFACE COMMITMENT BINDING
"swap" zonecfg property Committed Patch
"locked" zonecfg property Committed Patch
These properties will be added to the "capped-memory" zonecfg
resource introduced by [2].
3. For observability of zone resource utilization and limits, this
case proposes the addition of the following kstats:
INTERFACE COMMITMENT BINDING
zone:{zoneid}:vm:zonename Uncommitted Patch
zone:{zoneid}:vm:swap_reserved Uncommitted Patch
zone:{zoneid}:vm:max_swap_reserved Uncommitted Patch
zone:{zoneid}:vm:locked_memory Uncommitted Patch
zone:{zoneid}:vm:max_locked_memory Uncommitted Patch
These kstats will be of class "misc". The global zone will see
kstats for all zones, while non-global zones will only see kstats
with a matching zoneid.
4. prstat(1M) output changes to report swap reserved.
INTERFACE COMMITMENT BINDING
prstat(1M) output Uncommitted Patch
This case proposes changing the "SIZE" column of "prstat -Z" zone
output lines to "SWAP". The swap reported will be the total swap
consumed by the zone's processes and tmpfs mounts. This value will
assist administrators in monitoring the swap reserved by each zone,
allowing them to choose a reasonable "zone.max-swap" setting.
The "SIZE" column will also be changed to "SWAP" for prstat
options a, T, and J, for users, tasks, and projects.
The reason for not changing this column in the default output would be helpful.
DETAIL:
1. "zone.max-swap" resource control.
Limits swap consumed by user process address space mappings and
tmpfs mounts within a zone.
Currently a global or non-global zone can consume all swap
resources available on the system, limiting the usefulness of zones
as an application container. zone.max-swap provides a mechanism to
I would rephrase that as "the container of an application" to avoid confusion with
the Solaris feature set called "Containers." I assume the former was meant
more so than the latter, even though Containers are Solaris' implementation of "an
application container."
limit swap consumption per zone. This will protect other zones
from runaway memory leakers/consumers and/or tmpfs writers in a
zone with zone.max-swap configured.
Another solution to this problem would be a "swap set" [5] feature,
which would allow the reservation of swap devices into sets to
which zones could be bound. While "swap sets" would be useful,
zone.max-swap provides a simple solution which is easier to
administer, as it does not require the configuration of pools and
swap devices/files.
"zone.max-swap" is not incompatible with swap sets. In fact, a
future addition of swap sets could be used in combination with
zone.max-swap. For instance, several zones could be bound to the
same set of swap devices, each with its own individual
zone.max-swap configured as a cap within that set. The
implementation of "zone.max-swap" is also much less risky to make
available via patch.
zone.max-swap will be configurable on both the global zone and
non-global zones. The effect on processes in a zone reaching its
zone.max-swap limit is the same as if all system swap were reserved.
Callers of mmap(2) and sbrk(2) will receive EAGAIN. Writes to
tmpfs will return ENOSPC, which is the same errno returned when
a tmpfs mount reaches its "size" mount option. The "size" mount
option limits the quantity of swap that a tmpfs mount can reserve.
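To illustrate what an application would see when the limit is hit, here is a minimal Python sketch that maps the two errno values named above to a hint. The helper name explain_swap_errno is hypothetical, not part of the case. The hint is only a guess at the cause, since EAGAIN and ENOSPC can also be returned for reasons unrelated to swap.

```python
import errno


def explain_swap_errno(err):
    """Return a hint for an OSError that may stem from a swap limit.

    Hypothetical helper. It only inspects err.errno and cannot tell a
    zone.max-swap limit apart from exhausted system swap, which the
    proposal says look the same to the caller.
    """
    if err.errno == errno.EAGAIN:
        # Per the proposal, mmap(2) and sbrk(2) callers receive EAGAIN.
        return "mmap/sbrk: swap could not be reserved"
    if err.errno == errno.ENOSPC:
        # Per the proposal, tmpfs writes return ENOSPC, as they do when
        # the mount's "size" option is reached.
        return "tmpfs: write could not reserve swap"
    return "not a swap reservation error"
```

A caller would typically wrap the failing call in try/except OSError and log the hint together with the zone name, so that an administrator can decide whether zone.max-swap needs to be raised.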
With S10 11/06, some zone limitations are now configurable, e.g. setting the
system time clock. Similarly, the ability to modify a zone's swap limit could be
given to the zone's root user, which might be valuable in some situations. This
would be analogous to the 'basic' privilege level. It would allow an advisory
limit to be placed on a zone - a limit that the zone admin could modify in unusual
circumstances.
I realize that this opens a can of worms in that most rctls are protected by the
sys_res_config priv, which is not allowed in a zone even with 11/06. Further, it
makes sense to consistently allow or forbid rctl-modification in zones.
I just wanted to mention this idea so that it is not unintentionally overlooked.
While a low zone.max-swap setting for the global zone can lead to
a difficult-to-administer global zone, the same problem exists
today when configuring the zone.max-lwps resource control on the
global zone, or when all system swap is reserved.
2. "swap" and "locked" properties for zonecfg(1M) "capped-memory"
resource.
[2] added a new 'capped-memory' resource to zonecfg. This resource
groups the properties used when capping memory for the zone. It
currently has the 'physical' property which specifies the physical
memory cap for the zone. We will add two new properties, 'swap'
and 'locked', to the "capped-memory" resource. These properties
will be added by using the rctl alias mechanism which is also
described in [2].
swap: An unsigned decimal number with a required k, m, g, or t
modifier. A value of '10m' means ten megabytes.
This will be used to configure the zone.max-swap
resource control, which limits swap consumed by
processes and tmpfs mounts within a zone.
locked: An unsigned decimal number with a required k, m, g, or t
modifier. A value of '10m' means ten megabytes.
This will be used to configure the
zone.max-locked-memory[3,4] resource control, which
limits locked physical memory (made non-pageable) by
processes within a zone.
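As a rough illustration of the value format described for 'swap' and 'locked', the following Python sketch converts such a string to bytes. It is not zonecfg's own parser. It assumes that the modifiers are powers of 1024, that only whole numbers are accepted, and that upper-case modifiers are tolerated. The proposal text states none of these. The function name is hypothetical.

```python
import re

# Assumption: k, m, g, t are binary multiples (1024-based).
_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}

# An unsigned decimal number followed by a required modifier.
_VALUE_RE = re.compile(r"([0-9]+)([kmgt])")


def parse_capped_memory_value(text):
    """Convert a value such as '10m' to a byte count (hypothetical helper)."""
    match = _VALUE_RE.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError("expected a number with a k, m, g, or t modifier: %r" % text)
    number, modifier = match.groups()
    return int(number) * _MULTIPLIERS[modifier]
```

Under these assumptions, '10m' yields 10485760 bytes, and a bare '10' is rejected because the modifier is required.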
3. Swap and locked memory kstats for zones.
There is currently no way to observe how much locked memory or swap
a zone is consuming. This makes capacity planning and monitoring
difficult. To solve this problem, the following kstats are
proposed:
INTERFACE COMMITMENT BINDING
zone:{zoneid}:vm:zonename Uncommitted Patch
zone:{zoneid}:vm:swap_reserved Uncommitted Patch
zone:{zoneid}:vm:max_swap_reserved Uncommitted Patch
zone:{zoneid}:vm:locked_memory Uncommitted Patch
zone:{zoneid}:vm:max_locked_memory Uncommitted Patch
Higher level monitoring scripts/tools can be developed in the
future to consume these kstats, or a future version of these
kstats.
STATISTIC DESCRIPTION
zonename: name of the zone with {zoneid}.
swap_reserved: swap reserved by zone in bytes.
Does swap_reserved include pages shared with other zones, e.g. text pages?
max_swap_reserved: current zone.max-swap limit in bytes.
locked_memory: physical memory locked by zone in bytes.
Does locked_memory include pages shared with other zones, e.g. text pages?
max_locked_memory: current zone.max-locked-memory limit in
bytes.
These kstats can be consumed by higher level tools/scripts to
provide information about zone memory usage. Each kstat's instance
number matches the zoneid of the zone it represents. Non-global
zones will only be able to read the kstat with a matching zoneid. The
global zone will be able to read all kstats.
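A monitoring script built on these kstats might look like the Python sketch below. It assumes the proposed statistics appear in "kstat -p" style output, one "module:instance:name:statistic" key followed by whitespace and the value per line. The function names are hypothetical, and the sample values in the usage note are invented for illustration.

```python
def parse_zone_vm_kstats(kstat_p_output):
    """Group proposed zone:{zoneid}:vm:* statistics by zoneid."""
    zones = {}
    for line in kstat_p_output.splitlines():
        fields = line.split(None, 1)  # key, then value
        if len(fields) != 2:
            continue
        key, value = fields
        parts = key.split(":")
        if len(parts) != 4 or parts[0] != "zone" or parts[2] != "vm":
            continue  # not one of the proposed kstats
        zoneid, stat = int(parts[1]), parts[3]
        value = value.strip()
        # zonename is a string; the other statistics are byte counts.
        zones.setdefault(zoneid, {})[stat] = value if stat == "zonename" else int(value)
    return zones


def swap_utilization(stats):
    """Fraction of the zone.max-swap limit currently reserved, or None."""
    limit = stats.get("max_swap_reserved")
    if not limit:
        return None
    return stats["swap_reserved"] / limit
```

For example, a zone reporting swap_reserved 1048576 and max_swap_reserved 4194304 would show a utilization of 0.25. How an unlimited zone.max-swap is reported is not stated in the proposal, so a real tool would need to handle that case once it is defined.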
4. prstat(1M) output changes to report swap reserved.
INTERFACE COMMITMENT BINDING
prstat(1M) output Uncommitted Patch
This case proposes changing the "SIZE" column of "prstat -Z" zone
output lines to "SWAP". The swap reported will be the total swap
consumed by the zone's processes and tmpfs mounts. This value
will assist administrators in monitoring the swap reserved by each
zone, allowing them to choose a reasonable "zone.max-swap"
setting.
The "SIZE" column will also be changed to "SWAP" for prstat
options a, T, and J, for users, tasks, and projects.
The current "SIZE" column arbitrarily sums the address spaces of
the processes in each zone. This sum includes device mappings,
but does not include NORESERVE segments. This sum does not map
to any real system resource, and therefore provides no meaningful
information.
REFERENCES:
[1] PSARC/2002/174 Virtualization and Namespace Isolation in Solaris
http://sac.sfbay.sun.com/PSARC/2002/174
http://www.opensolaris.org/os/community/arc/caselog/2002/174/
[2] PSARC/2006/496 Improved Zones/RM Integration
http://sac.sfbay.sun.com/PSARC/2006/496/
http://www.opensolaris.org/os/community/arc/caselog/2006/496/
[3] PSARC/2006/463 Amendment to zone/project.max-locked-memory Resource
Controls
http://sac.sfbay.sun.com/PSARC/2006/463/
http://www.opensolaris.org/os/community/arc/caselog/2006/463/
[4] PSARC/2004/580 zone/project.max-locked-memory Resource Controls
http://sac.sfbay.sun.com/PSARC/2004/580/
http://www.opensolaris.org/os/community/arc/caselog/2004/580/
[5] PSARC/2002/181 Swap Sets
http://sac.sfbay.sun.com/PSARC/2002/181/
http://www.opensolaris.org/os/community/arc/caselog/2002/181/
[6] 5103071 RFE: local zones can run the global zone out of swap
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5103071
--------------------------------------------------------------------------
Jeff VICTOR Sun Microsystems jeff.victor @ sun.com
OS Ambassador Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
--------------------------------------------------------------------------