I am pleased to sponsor the following case for Steve Lawrence. This
case improves memory resource management and further integrates memory
RM with Zones. The interfaces are Committed (for the rctls and
zonecfg(1M) properties) and Uncommitted (for the kstats and the changes
to the output of prstat(1M)). Patch release binding is requested. The
timer is set for 11/01/2006.

Please note that the Zones team plans to run this and future cases "in
the open." Both Steve Lawrence and [email protected] should be CC'd.

For members of zones-discuss who are wondering what the heck this mail
is about, please see
http://www.opensolaris.org/jive/thread.jspa?threadID=15976 before
replying to this message.

Thanks,

        -dp

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Swap resource control; locked memory RM improvements
Steve Lawrence <[EMAIL PROTECTED]>

SUMMARY:

   This case enhances Solaris Zones[1] and builds upon recent work to
   improve the integration between Zones and Solaris Resource
   Management[2]. The case addresses an existing RFE[6], which requests
   a mechanism to limit system swap reserved by a zone. The case also
   proposes extensions to [2], which will make swap reservation and
   locked memory resource controls easy to configure on a zone via
   zonecfg(1M).

   1. This case proposes adding the following resource control:

        INTERFACE                       COMMITMENT      BINDING
        "zone.max-swap"                 Committed       Patch

      This control will limit the swap reserved by processes and tmpfs
      mounts within the global zone and non-global zones. This resource
      control serves to address the referenced RFE[6].

   2. To simplify the configuration of memory-related resource controls
      on zones, this case proposes adding the following properties to
      zonecfg(1M):

        INTERFACE                       COMMITMENT      BINDING
        "swap" zonecfg property         Committed       Patch
        "locked" zonecfg property       Committed       Patch

      These properties will be added to the zonecfg "capped-memory"
      resource introduced by [2].

   3. For observability of zone resource utilization and limits, this
      case proposes the addition of the following kstats:

        INTERFACE                               COMMITMENT      BINDING
        zone:{zoneid}:vm:zonename               Uncommitted     Patch
        zone:{zoneid}:vm:swap_reserved          Uncommitted     Patch
        zone:{zoneid}:vm:max_swap_reserved      Uncommitted     Patch
        zone:{zoneid}:vm:locked_memory          Uncommitted     Patch
        zone:{zoneid}:vm:max_locked_memory      Uncommitted     Patch

      These kstats will be of class "misc".
      The global zone will see kstats for all zones, while non-global
      zones will only see kstats with matching zoneid.

   4. prstat(1M) output changes to report swap reserved.

        INTERFACE                       COMMITMENT      BINDING
        prstat(1M) output               Uncommitted     Patch

      This case proposes changing the "SIZE" column of "prstat -Z" zone
      output lines to "SWAP". The swap reported will be the total swap
      consumed by the zone's processes and tmpfs mounts. This value
      will assist administrators in monitoring the swap reserved by
      each zone, allowing them to choose a reasonable "zone.max-swap"
      setting.

      The "SIZE" column will also be changed to "SWAP" for prstat
      options -a, -T, and -J, for users, tasks, and projects.

DETAIL:

   1. "zone.max-swap" resource control.

      Limits swap consumed by user process address space mappings and
      tmpfs mounts within a zone.

      Currently a global or non-global zone can consume all swap
      resources available on the system, limiting the usefulness of
      zones as an application container. zone.max-swap provides a
      mechanism to limit swap consumption per zone. When a zone has
      zone.max-swap configured, other zones are protected from runaway
      memory leakers/consumers and/or tmpfs writers in that zone.

      Another solution to this problem would be a "swap set" [5]
      feature, which would allow the reservation of swap devices into
      sets to which zones could be bound. While "swap sets" would be
      useful, zone.max-swap provides a simple solution which is easier
      to administer, as it does not require the configuration of pools
      and swap devices/files.

      "zone.max-swap" is not incompatible with swap sets. In fact, a
      future addition of swap sets could be used in combination with
      zone.max-swap. For instance, several zones could be bound to the
      same set of swap devices, each with its own individual
      zone.max-swap configured as a cap within that set. The
      implementation of "zone.max-swap" is also much less risky to make
      available via patch.

      zone.max-swap will be configurable on both the global zone and
      non-global zones.
      The effect on processes in a zone reaching its zone.max-swap
      limit is the same as if all system swap is reserved. Callers of
      mmap(2) and sbrk(2) will receive EAGAIN. Writes to tmpfs will
      return ENOSPC, which is the same errno returned when a tmpfs
      mount reaches its "size" mount option. The "size" mount option
      limits the quantity of swap that a tmpfs mount can reserve.

      While a low zone.max-swap setting for the global zone can lead to
      a difficult-to-administer global zone, the same problem exists
      today when configuring the zone.max-lwps resource control on the
      global zone, or when all system swap is reserved.

   2. "swap" and "locked" properties for zonecfg(1M) "capped-memory"
      resource.

      [2] added a new "capped-memory" resource to zonecfg. This
      resource groups the properties used when capping memory for the
      zone. It currently has the "physical" property, which specifies
      the physical memory cap for the zone. We will add two new
      properties, "swap" and "locked", to the "capped-memory" resource.
      These properties will be added by using the rctl alias mechanism,
      which is also described in [2].

      swap:   An unsigned decimal number with a required k, m, g, or t
              modifier. A value of '10m' means ten megabytes. This
              will be used to configure the zone.max-swap resource
              control, which limits swap consumed by processes and
              tmpfs mounts within a zone.

      locked: An unsigned decimal number with a required k, m, g, or t
              modifier. A value of '10m' means ten megabytes. This
              will be used to configure the zone.max-locked-memory[3,4]
              resource control, which limits physical memory locked
              (made non-pageable) by processes within a zone.

   3. Swap and locked memory kstats for zones.

      There is currently no way to observe how much locked memory or
      swap a zone is consuming. This makes capacity planning and
      monitoring difficult.
      To solve this problem, the following kstats are proposed:

        INTERFACE                               COMMITMENT      BINDING
        zone:{zoneid}:vm:zonename               Uncommitted     Patch
        zone:{zoneid}:vm:swap_reserved          Uncommitted     Patch
        zone:{zoneid}:vm:max_swap_reserved      Uncommitted     Patch
        zone:{zoneid}:vm:locked_memory          Uncommitted     Patch
        zone:{zoneid}:vm:max_locked_memory      Uncommitted     Patch

      Higher level monitoring scripts/tools can be developed in the
      future to consume these kstats, or a future version of these
      kstats.

        STATISTIC               DESCRIPTION
        zonename                The name of the zone with {zoneid}.
        swap_reserved           Swap reserved by the zone, in bytes.
        max_swap_reserved       Current zone.max-swap limit, in bytes.
        locked_memory           Physical memory locked by the zone, in
                                bytes.
        max_locked_memory       Current zone.max-locked-memory limit,
                                in bytes.

      These kstats can be consumed by higher level tools/scripts to
      provide information about zone memory usage.

      Each kstat's instance number matches the zoneid of the zone it
      represents. Non-global zones will only be able to read the kstat
      with matching zoneid. The global zone will be able to read all
      kstats.

   4. prstat(1M) output changes to report swap reserved.

        INTERFACE                       COMMITMENT      BINDING
        prstat(1M) output               Uncommitted     Patch

      This case proposes changing the "SIZE" column of "prstat -Z" zone
      output lines to "SWAP". The swap reported will be the total swap
      consumed by the zone's processes and tmpfs mounts. This value
      will assist administrators in monitoring the swap reserved by
      each zone, allowing them to choose a reasonable "zone.max-swap"
      setting.

      The "SIZE" column will also be changed to "SWAP" for prstat
      options -a, -T, and -J, for users, tasks, and projects.

      The current "SIZE" column arbitrarily sums the address spaces of
      the processes in each zone. This sum includes device mappings,
      but does not include NORESERVE segments. This sum does not map to
      any real system resource, and therefore provides no meaningful
      information.
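
As one illustration of the higher level monitoring scripts mentioned in
DETAIL section 3, the sketch below parses text in the
module:instance:name:statistic form and reports what fraction of a
zone's swap cap is currently reserved. The tab-separated input layout
(modeled on parseable kstat output), the sample values, and the
function names are assumptions made for illustration; none of them is
defined by this case.

```python
def parse_zone_vm_kstats(text):
    """Parse 'zone:{zoneid}:vm:{statistic}<TAB>value' lines.

    Returns {zoneid: {statistic: value}}. The input layout is an
    assumption for this sketch, not something this case specifies.
    """
    zones = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("\t")
        parts = key.split(":")
        # Keep only the proposed zone:{zoneid}:vm:* statistics.
        if len(parts) != 4 or parts[0] != "zone" or parts[2] != "vm":
            continue
        zoneid = int(parts[1])
        stat = parts[3]
        value = value.strip()
        # zonename is a string; every other statistic is a byte count.
        zones.setdefault(zoneid, {})[stat] = (
            value if stat == "zonename" else int(value)
        )
    return zones


def swap_utilization(stats):
    """Return swap_reserved / max_swap_reserved, or None if unavailable."""
    used = stats.get("swap_reserved")
    limit = stats.get("max_swap_reserved")
    if used is None or not limit:
        return None
    return used / limit


# Made-up sample data for two zones (values are illustrative only).
SAMPLE = (
    "zone:0:vm:zonename\tglobal\n"
    "zone:0:vm:swap_reserved\t1073741824\n"
    "zone:0:vm:max_swap_reserved\t4294967296\n"
    "zone:3:vm:zonename\tweb\n"
    "zone:3:vm:swap_reserved\t536870912\n"
    "zone:3:vm:max_swap_reserved\t1073741824\n"
)

if __name__ == "__main__":
    for zoneid, stats in sorted(parse_zone_vm_kstats(SAMPLE).items()):
        frac = swap_utilization(stats)
        print(zoneid, stats.get("zonename"), frac)
```

A script like this, run in the global zone, could flag zones that are
approaching their zone.max-swap limit; in a non-global zone it would
see only the one kstat instance with matching zoneid.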

REFERENCES:

   [1] PSARC/2002/174 Virtualization and Namespace Isolation in Solaris
       http://sac.sfbay.sun.com/PSARC/2002/174
       http://www.opensolaris.org/os/community/arc/caselog/2002/174/

   [2] PSARC/2006/496 Improved Zones/RM Integration
       http://sac.sfbay.sun.com/PSARC/2006/496/
       http://www.opensolaris.org/os/community/arc/caselog/2006/496/

   [3] PSARC/2006/463 Amendment to zone/project.max-locked-memory
       Resource Controls
       http://sac.sfbay.sun.com/PSARC/2006/463/
       http://www.opensolaris.org/os/community/arc/caselog/2006/463/

   [4] PSARC/2004/580 zone/project.max-locked-memory Resource Controls
       http://sac.sfbay.sun.com/PSARC/2004/580/
       http://www.opensolaris.org/os/community/arc/caselog/2004/580/

   [5] PSARC/2002/181 Swap Sets
       http://sac.sfbay.sun.com/PSARC/2002/181/
       http://www.opensolaris.org/os/community/arc/caselog/2002/181/

   [6] 5103071 RFE: local zones can run the global zone out of swap
       http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5103071

-- 
Daniel Price - Solaris Kernel Engineering - [EMAIL PROTECTED] - blogs.sun.com/dp
