Re: [zones-discuss] improved zones/RM integration
Mike,

Mike Gerdts wrote:
> On 6/26/06, Gerald A. Jelinek [EMAIL PROTECTED] wrote:
>> Attached is a description of a project we have been refining for a
>> while now. The idea is to improve the integration of zones with some
>> of the existing resource management features in Solaris.
>
> In the proposal you say:
>
>> As part of this overall project, we will be enhancing the internal
>> rcapd rss accounting so that rcapd will have a more accurate
>> measurement of the overall rss for each zone.
>
> Does this spill over to prstat such that there may finally be a fix for:
>
> 4754856 prstat -atJTZ should count shared segments only once

Yes, we are addressing this bug as part of this work. prstat will be
able to report an accurate rss number for processes, users, projects
and tasks as well as zones. prstat and rcapd will use the same, new
underlying rss counting code we have developed.

Jerry

___
zones-discuss mailing list
zones-discuss@opensolaris.org
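For context, the per-zone view that bug 4754856 affects comes from prstat's -Z summary. A quick way to watch it (prstat and its -Z, -n options are real Solaris interfaces; the exact output layout varies by release):

```shell
# prstat -Z appends a per-zone summary line (including an RSS column) to the
# usual per-process listing; -n 5 limits the display to the top 5 processes,
# and "2 2" samples twice at a 2-second interval.
# Under the new counting code described above, shared segments would be
# counted once per zone rather than once per process sharing them.
prstat -Z -n 5 2 2
```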
Re: [zones-discuss] improved zones/RM integration
Jerry Jelinek wrote:
> Mike,
>
> Mike Gerdts wrote:
>> In the proposal you say:
>>
>>> As part of this overall project, we will be enhancing the internal
>>> rcapd rss accounting so that rcapd will have a more accurate
>>> measurement of the overall rss for each zone.
>>
>> Does this spill over to prstat such that there may finally be a fix for:
>>
>> 4754856 prstat -atJTZ should count shared segments only once
>
> Yes, we are addressing this bug as part of this work. prstat will be
> able to report an accurate rss number for processes, users, projects
> and tasks as well as zones. prstat and rcapd will use the same, new
> underlying rss counting code we have developed.

Just curious: which process(es) gets billed for shared text pages?

--
Jeff VICTOR          Sun Microsystems            jeff.victor @ sun.com
OS Ambassador        Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
Re: [zones-discuss] improved zones/RM integration
> Could we somehow work the zone name into this? It would be nice for
> e.g. poolstat(1) observability. Otherwise the user experience is going
> to be all about trying to work out what 'SUNWzone34' maps to, which
> seems poor.

We need to have the name begin with SUNW or we could have collisions
with existing pools. I suppose instead of SUNWzone{zoneid}, it could be
SUNW{zonename}, although then you lose the visibility that the pool is
associated with a zone. Maybe SUNWzone_{zonename}? Or perhaps
SUNWtemp_{zonename}?

dsc
Re: [zones-discuss] improved zones/RM integration
These are very exciting features !! Some comments.

> If a dedicated-cpu (or eventually a dedicated-memory) resource is
> configured for the zone, then when the zone boots zoneadmd will create
> a temporary pool dedicated for the zone's use.

The temp pool created is going to show up in pooladm output, right? If
someone uses such a pool in a zone configuration done in traditional
style (set pool=SUNWzone{zoneid}), is zonecfg going to reject it? If
not, it won't be dedicated anymore. Also, is there something that's
going to stop poolbind from moving zones to and from temp pools and
permanent pools? What if a zone was created, a temp pool was created as
a result, and then poolbind moves the zone to a permanent pool? Is the
temp pool deleted?

> Resource controls in zonecfg will be simplified.

Do you think we need prctl enhancements that will allow setting rctls
using the new aliases directly? That would be good for consistency.
Also, you mention that the backward compatibility ensures the existing
tools that parse zonecfg info/export output are unaffected. That's true
to some extent, but some of those tools treat unknown resources as
no-ops and display them as-is, which means they will show the new
resources as unknown -- harmless, but potentially confusing. Maybe a
command line option such as "zonecfg info -legacy" that would suppress
the new resources could help.

> We will enhance zoneadmd so that if the zone.cpu-shares rctl is set
> and FSS is not already the default scheduling class, zoneadmd will set
> the scheduling class to be FSS for processes in the zone.

On the fly? That is, if a zone didn't have zone.cpu-shares when it was
booted and someone did prctl to set it, will zoneadmd change the sched
class of all processes in the zone to FSS? That's cool. On the other
hand, what about when this zone is sharing a pool with another zone
without cpu-shares and the pool's sched class is also not FSS? We don't
recommend that processes running in different scheduling classes share
CPUs, right? This feature may lead to that situation.

thanks
- Amol

Gerald A. Jelinek wrote:
> Attached is a description of a project we have been refining for a
> while now. The idea is to improve the integration of zones with some
> of the existing resource management features in Solaris. I would
> appreciate hearing any suggestions or questions. I'd like to submit
> this proposal to our internal architectural review process by
> mid-July. I have also posted a few slides that give an overview of the
> project. Those are available on the zones files page
> (http://www.opensolaris.org/os/community/zones/files/).
>
> Thanks,
> Jerry
>
> This message posted from opensolaris.org
>
> SUMMARY:
>
> This project enhances Solaris zones[1], pools[2-4] and resource
> caps[5,6] to improve the integration of zones with resource management
> (RM). It addresses existing RFEs[7-10] in this area and lays the
> groundwork for simplified, coherent management of the various RM
> features exposed through zones. We will integrate some basic pool
> configuration with zones, implement the concept of temporary pools
> that are dynamically created/destroyed when a zone boots/halts, and we
> will simplify the setting of resource controls within zonecfg. We will
> enhance rcapd so that it can cap a zone's memory while rcapd is
> running in the global zone. We will also make a few other changes to
> provide a better overall experience when using zones with RM. Patch
> binding is requested for these new interfaces and the stability of
> most of these interfaces is evolving (see interface table for complete
> list).
>
> PROBLEM:
>
> Although zones are fairly easy to configure and install, it appears
> that many customers have difficulty setting up a good RM configuration
> to accompany their zone configuration. Understanding RM involves many
> new terms and concepts along with lots of documentation to understand.
> This leads to the problem that many customers either do not configure
> RM with their zones, or configure it incorrectly, leading them to be
> disappointed when zones, by themselves, do not provide all of the
> containment that they expect. This problem will just get worse in the
> near future with the additional RM features that are coming, such as
> cpu-caps[11], memory sets[12] and swap sets[13].
>
> PROPOSAL:
>
> There are 7 different enhancements outlined below.
>
> 1) Hard vs. Soft RM configuration within zonecfg
>
> We will enhance zonecfg(1M) so that the user can configure basic RM
> capabilities in a structured way. The various existing and upcoming RM
> features can be broken down into hard vs. soft partitioning of
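To make the dedicated-cpu and cpu-shares discussion above concrete, here is a hypothetical sketch of how the configuration might look. The dedicated-cpu resource and ncpus property follow the proposal; the zone name "myzone", the values, and the exact syntax are illustrative assumptions, not the final interface. prctl itself is an existing Solaris command.

```shell
# Sketch: configure a dedicated-cpu resource for a zone. Per the proposal,
# when the zone boots, zoneadmd would create a temporary pool whose
# processor set is sized from the ncpus range.
zonecfg -z myzone <<'EOF'
add dedicated-cpu
set ncpus=2-4
end
EOF

# Sketch: set zone.cpu-shares on a running zone from the global zone.
# Per the discussion above, zoneadmd would then move the zone's processes
# to the FSS scheduling class if FSS is not already the default.
prctl -n zone.cpu-shares -v 10 -r -i zone myzone
```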
Re: [zones-discuss] improved zones/RM integration
Jeff,

Thanks for your comments. I have a few responses in-line.

Jeff Victor wrote:
> 1) General comment: I agree that this will provide needed clarity to
> the seemingly unorganized RM features that we have scattered through
> Solaris during the last decade. The automation of certain activities
> (e.g. starting rcapd in the GZ when needed) will also be extremely
> beneficial.
>
> 2) Terminology: Although your use of the phrase hard partition should
> be clear to most people, through experience with other partitioning
> technologies, the use of soft partition is less clear. To many people,
> a soft limit is one that can be exceeded occasionally or in certain
> situations, or is merely an advisory limit. It also conflicts with SVM
> soft partitions.

We started using hard and soft to describe the general idea amongst
ourselves when we first started talking about this, but we were never
very satisfied with those terms either. These two terms will not
necessarily be used in the final documentation and are not used in the
resource names themselves. Naming always seems to be a difficult area.
The key technical issue to focus on is the resource names and
properties being proposed, as opposed to the overall words we are using
to describe the general ideas. I am guessing we were able to
communicate the general idea using hard and soft, so I think we are ok
there. We will have to figure out the best way to document this when
the time comes. It is hard to find good terms that are not already used
by other parts of the system. The word resource is a good example and
is probably more confusing than soft partition.

> 3) Lwps context: Why is the lwps alias defined in the context of
> dedicated-cpu? Lwps seem to be unrelated to hard partitions. Further,
> ncpus represents a subset of a finite resource. Lwps are not finite.
> The two should be separate.

Being able to set max-lwps is a useful limit for both processor sets
and cpu-caps, which is why it is available in both resources. The
global zone still manages all processes, even when you are using a
processor set, so a fork bomb can still affect the responsiveness of
the system as a whole. However, max-lwps is optional in both the
dedicated-cpu and capped-cpu resources. I should have made that
clearer. I will update the document to clarify that.

> 4) Aliases: The notion of aliases also creates redundant output which
> could be confusing. I like the simplification of aliases as described,
> but I wish I had a good solution that wouldn't break existing tools
> that parse zonecfg info output - if such tools exist.

This is part of the proposal which we struggled with a lot. In the end,
we decided that we needed to maintain compatibility so that we did not
break scripts that talked to the CLI directly. This was the compromise
we came up with that allows us to do that.

> 5) Another RM setting: While we're integrating RM settings, I think we
> should consider adding project.max-address-space using the same
> semantics as {project,zone}.max-lwps. A zone-specific setting,
> zone.max-address-space, could be added along with a zonecfg alias,
> max-address-space. This would allow the GZ admin to cap the virtual
> memory space available to that zone. This would not take the place of
> swap sets, but would be valuable if this RM integration work might be
> complete before swap sets.

We are looking at additional rctls, they are just not part of this
project.

Thanks again for your comments,
Jerry
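As a concrete reading of Jerry's answer, a hypothetical zonecfg session might make max-lwps optional within the dedicated-cpu resource like this. This is a sketch based only on the description in this thread ("max-lwps is optional in both the dedicated-cpu and capped-cpu resources"); the zone name and values are illustrative, and the shipped syntax could differ.

```shell
# Hypothetical sketch per the proposal: max-lwps as an optional property
# alongside ncpus, capping total lwps in the zone so a fork bomb inside
# the processor set cannot degrade the whole system.
zonecfg -z myzone <<'EOF'
add dedicated-cpu
set ncpus=4
set max-lwps=2000
end
EOF
```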
Re: [zones-discuss] improved zones/RM integration
1) General comment: I agree that this will provide needed clarity to
the seemingly unorganized RM features that we have scattered through
Solaris during the last decade. The automation of certain activities
(e.g. starting rcapd in the GZ when needed) will also be extremely
beneficial.

2) Terminology: Although your use of the phrase "hard partition" should
be clear to most people, through experience with other partitioning
technologies, the use of "soft partition" is less clear. To many
people, a soft limit is one that can be exceeded occasionally or in
certain situations, or is merely an advisory limit. It also conflicts
with SVM soft partitions. I suggest the phrases "dedicated partition"
(or "private partition") and "shared partition" instead, to clarify the
intent. Choosing "dedicated partition" might then require re-naming
dedicated-cpu to guaranteed-cpu or private-cpu, and re-naming
dedicated-memory to guaranteed-memory or private-memory. Or we could
leave "hard partition" alone and simply change "soft partition" to
"shared partition".

3) Lwps context: Why is the lwps alias defined in the context of
dedicated-cpu? Lwps seem to be unrelated to hard partitions. Further,
ncpus represents a subset of a finite resource. Lwps are not finite.
The two should be separate.

4) Aliases: The notion of aliases also creates redundant output which
could be confusing. I like the simplification of aliases as described,
but I wish I had a good solution that wouldn't break existing tools
that parse zonecfg info output - if such tools exist.

5) Another RM setting: While we're integrating RM settings, I think we
should consider adding project.max-address-space using the same
semantics as {project,zone}.max-lwps. A zone-specific setting,
zone.max-address-space, could be added along with a zonecfg alias,
max-address-space. This would allow the GZ admin to cap the virtual
memory space available to that zone.
This would not take the place of swap sets, but would be valuable if
this RM integration work might be complete before swap sets.

6) From your Project Alternatives slide: Should we have yet another
standalone GUI? Absolutely not. Unless it was a wizard - IMO Solaris
adoption would accelerate greatly if it had wizards to aid potential
adopters.

However this ends up looking, I look forward to seeing this
integration!

Gerald A. Jelinek wrote:
> Attached is a description of a project we have been refining for a
> while now. The idea is to improve the integration of zones with some
> of the existing resource management features in Solaris. I would
> appreciate hearing any suggestions or questions. I'd like to submit
> this proposal to our internal architectural review process by
> mid-July. I have also posted a few slides that give an overview of the
> project. Those are available on the zones files page
> (http://www.opensolaris.org/os/community/zones/files/).
>
> Thanks,
> Jerry