Jerry Jelinek wrote:
Dan Price wrote:
On Thu 10 May 2007 at 04:21PM, Jerry Jelinek wrote:
of the other controls is trickier, although I think Dan's idea of scaling
these based on the system makes it easier. We might also want to think
about scaling based on the number of running zones.
Another way to look at it (and I think what you are saying) would be to
broaden the notion of "shares" a bit to include more of the system
resources-- for example, memory. What's tough there, though, is that
our notion of shares today represents an entitlement, while in the case of
memory we're talking about a cap on utilization.
I think fundamentally we hear from two camps: those who want to
proportionally partition whatever resources are available, and those who
want to see the system as "virtual 512MB Ultra-2's" or "virtual 1GB,
Yes, something like shares for memory would be nice, because you don't
have to know ahead of time what your maximum will be and, as long as
the system is not overcommitted, you can use what you need.
I was thinking down the same path - I was calling it Fair Memory Shares (FMS) -
but we should advise caution in the use of such a feature: CPU cycles are
effectively infinite, but memory is not.
When using zones with default cpu-shares and at least a few zones, adding
another zone will decrease the minimum per-zone CPU entitlement only slightly.
Regardless of RM usage, though, booting an additional zone may increase RAM usage
enough to cause paging that noticeably degrades the performance of all apps.
Using FMS won't help that, and we shouldn't let users think that it will.
Another difference between CPU and memory RM is that workloads need a minimum
amount of VM to operate at all, a minimum number of threads to operate, and a
minimum amount of RAM to operate with acceptable performance. Those are all
very different from CPU mgmt.
With all of that, should default values be minima or maxima? The goal I have
in mind is default values that will protect a zone from DoS attacks, or the
equivalent symptom, caused by bad software.
Although we could assign default values to caps, they would be arbitrary, and
would need to be so large that they would often be largely ineffective. On the
other hand, they would be easy to achieve. Even *I* could implement them... ;-)
Perhaps it would be more effective to have default minima, similar to the
default of 1 FSS share that the global zone has had all along. Starting with
Mads' Enable-RM knob, we could probably assign reasonable default minima, maybe:
* RAM: 200MB? 500MB?
* VM: 200MB? 500MB?
* lwps: 1000
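For reference, those three values map onto controls that zonecfg already
exposes as caps (a sketch only - "myzone" and the numbers are placeholders,
and the capped-memory resource assumes the recent zone memory-cap work):

```
# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set physical=500m
zonecfg:myzone:capped-memory> set swap=500m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> set max-lwps=1000
zonecfg:myzone> commit
```

Turning the same numbers into default *minima* would need new mechanism,
since these are all enforced as ceilings today.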
The last one tripped up the Clarkson team - they started with max-lwps=50, and
then didn't understand why the zone wouldn't boot. I gave them a couple of
pointers, and suggested 175, which worked for them. There's little sense in
setting that to a low number - Solaris will create more than 70,000 threads
before it begins to run out of a resource. What we really want is an
lwp-creation-rate meter. But max-lwps is much simpler, and still effective.
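One way to pick a sane max-lwps value is to first total up the threads each
zone is actually using. A sketch of the pipeline, using canned sample data so
it is self-contained; on a real Solaris box the input would come from
`ps -e -o zone,nlwp` instead (zone names and counts below are made up):

```shell
# Sum threads (nlwp) per zone. On a live system, replace the canned
# data with:  ps -e -o zone,nlwp | tail +2   (tail strips the header)
ps_output="web 40
web 120
db 15
global 300"

echo "$ps_output" | awk '{ lwps[$1] += $2 } END { for (z in lwps) print z, lwps[z] }'
```

That gives one "zone total-lwps" line per zone (e.g. web would sum to 160
here), which is a reasonable starting point for a cap with headroom.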
Another idea: when you Enable-RM, suggest a default value and ask for
confirmation or a different value. A wizard would be very useful here,
because it could do a quick analysis of the current system and its workloads,
and in some cases be *very* helpful in telling the user "if you boot this zone
in addition to the currently running zones, performance degradation will
result." That's the polite version of "excuse me, but you can't stuff 10
kilos of dirt in a 5-kilo bag."
Back to default minima vs. default maxima: I doubt that implementing default
minima would be simple, and my hope is to get something in place before the
end of 2008, perhaps as early as S10 U6. Although simple default caps for
existing controls could be ready for U5, if allowed...
Apologies for all the rambling. More below.
I agree that there are multiple ways people want to slice things up and
we are actually pretty good with the capped and dedicated stuff now.
It is the full sharing with a guaranteed minimum that we might want
to think about improving (for memory). I'm not sure how hard that
will be though.
Just thinking out loud here, I wonder if there is any way we could come
close to this behavior by dynamically adjusting the physical and swap
controls we already have, based upon how many zones are running?
This carries the danger of "pulling the rug out from under the zone."
Well-behaving zone one minute, waste of memory the next. :-)
It wouldn't be as good as fair shares for memory but it would be a lot easier
to implement. Or, maybe we could use rcapd to watch the dynamic
behavior of each zone and adjust the physical cap as needed. That would be really
easy to implement.
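The dynamic-adjustment idea could start very simply: recompute an equal
per-zone RAM cap whenever the set of running zones changes. A minimal sketch -
the budget, reserve, and zone names are all assumptions, and instead of
applying anything it just prints the rcapadm commands that would push the
new caps on a system with zone memory-cap support:

```shell
TOTAL_MB=4096        # physical-memory budget for zones (assumed)
RESERVE_MB=1024      # held back for the global zone (assumed)
ZONES="web db batch" # stand-in for the output of: zoneadm list

COUNT=$(echo $ZONES | wc -w)
CAP_MB=$(( (TOTAL_MB - RESERVE_MB) / COUNT ))
for z in $ZONES; do
    # On Solaris, the cap would be applied dynamically with rcapadm;
    # here we only print what would be run.
    echo "rcapadm -z $z -m ${CAP_MB}m"
done
```

A smarter version would weight the split by per-zone shares or by rcapd's
observed RSS rather than dividing equally, but the plumbing is the same.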
I like dynamically tunable systems - if the heuristics are good enough. The
'importance' value in resource pools is simple but works. Something similar
could be used. I need to think about that one...
It is harder for the swap cap, since I don't think we can force a zone
back down once it is already above the new, lower limit.
Yes, a retroactive ENOMEM would be Bad. :-) Sounds like AIX. ;-)
Jeff VICTOR, Sun Microsystems          jeff.victor @ sun.com
OS Ambassador, Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
zones-discuss mailing list