On Mon, Nov 10, 2008 at 9:07 AM, Jeff Victor <[EMAIL PROTECTED]> wrote:
> On Mon, Nov 10, 2008 at 12:30 AM, Mike Gerdts <[EMAIL PROTECTED]> wrote:
>> FWIW, I tend to use the term "reserved memory" instead of "swap"
>> because that is less confusing to most people.
>
> That's a useful perspective. If you choose the swap cap - which is
> really a VM cap - so that the sum of the swap caps is less than RAM,
> you have effectively implemented 'reserved memory.' (I'm ignoring RAM
> usage of the global zone, which shouldn't be ignored in practice.)
> But you must be careful: nothing prevents you from 'over-reserving'
> memory. If you have 'reserved' all of system memory in this way, and
> add a new zone with its own 'reserve,' you will have over-subscribed
> memory. That might be a good thing, as long as no one is surprised if
> the system starts paging.

Very true.  Same goes for FSS and CPU caps.  There are plenty of ways
to shoot yourself in the foot.  :)

> However, the entire concept of reserved memory limits the scalability
> of the system. Imagine 4 zones with swap caps of 4GB, on a system with
> 16GB of RAM. (Again, I'm ignoring the GZ.) Unless you allow yourself
> to over-subscribe RAM, you can't add more zones, even if those 4 zones
> are only using 1GB each during normal conditions.

I'm actually looking forward to SSDs to minimize the impact.  Paging to
one should cost relatively little performance while still producing
useful stats that tell the administrator a capacity problem is coming.
If I pretend that a 64 GB SSD will cost about $1K and that I should
mirror them, that gives me a "fast swap device" for about $31 / GB.
If swap is on ZFS with ZIL + L2ARC, the cost may be even less.  Assuming
performance meets expectations, this is pretty cheap peace of mind.  My
only performance expectation is that workloads that aren't paging but
may be doing some I/O see response times similar to what they had
before the rogue app started paging.  With traditional drives, seek
times punish everyone.
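
For anyone checking the arithmetic, here is a minimal sketch of the
cost estimate.  The function name and parameters are made up for
illustration, and the prices are the pretend numbers from above, not
real quotes.

```python
def cost_per_usable_gb(price_per_drive, drive_gb, mirror_ways=2):
    """Cost per usable GB when identical drives are mirrored.

    A mirror exposes the capacity of a single drive, so the usable
    space is drive_gb while the spend is price_per_drive * mirror_ways.
    """
    return price_per_drive * mirror_ways / drive_gb


if __name__ == "__main__":
    # Two pretend $1K, 64 GB SSDs in a two-way mirror.
    print(cost_per_usable_gb(1000, 64))
```

With the numbers above this comes to 31.25, which is where the
"about $31 / GB" figure comes from.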

> Balance is needed. When paging must be avoided at all costs,
> 'reserving' memory by setting proper swap-caps makes a great deal of
> sense. When paging is unlikely because the workload is well
> understood, and a small amount of paging would not be horrible, and
> zone 'density' is important, reserving memory would not make sense.
> Many situations would call for memory 'reservations' on some zones,
> and RAM caps on others.

The tricky part is that you have to set the swap caps on all zones to
reserve the memory for those that you really want to protect.
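
As a rough sketch of that bookkeeping (hypothetical function and zone
names, and like the discussion above it ignores global zone usage):
memory is only effectively reserved when every zone has a swap cap and
the caps sum to no more than RAM.

```python
def reservation_status(ram_gb, swap_caps_gb):
    """Classify a set of per-zone swap caps against physical RAM.

    swap_caps_gb maps zone name -> cap in GB, or None for an uncapped zone.
    Returns (status, total_capped_gb, uncapped_zone_names).
    """
    uncapped = sorted(z for z, cap in swap_caps_gb.items() if cap is None)
    total = sum(cap for cap in swap_caps_gb.values() if cap is not None)
    if uncapped:
        # One uncapped zone can consume what the others "reserved".
        status = "not reserved"
    elif total > ram_gb:
        status = "over-subscribed"
    else:
        status = "reserved"
    return status, total, uncapped


if __name__ == "__main__":
    # Jeff's example: four zones capped at 4 GB each on a 16 GB system.
    zones = {"z1": 4, "z2": 4, "z3": 4, "z4": 4}
    print(reservation_status(16, zones))
    # A fifth capped zone over-subscribes; an uncapped one voids the reserve.
    print(reservation_status(16, dict(zones, z5=4)))
    print(reservation_status(16, dict(zones, z5=None)))
```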

>> For CPU related stats, take a look at a discussion I started a while back:
>>
>> http://mail.opensolaris.org/pipermail/perf-discuss/2005-November/002048.html
>
> Cool. Also, Jim Fiori had a simple idea for counting CPU time per zone
> with almost no perf impact: use DTrace to implement a probe which
> fires every M microseconds, and increments a per-zone counter. But
> that's a short-term solution. We need a per-zone counter in the kernel
> that tallies CPU time per zone.
>
>> One project I would like to kick off sometime is doing per user, per
>> project, and per zone microstate accounting.
>
> Excellent idea. I'll watch for it! :-)
>
> <snip>
>
>> I didn't have a chance to check logic closely or run it on a test
>> system.  I'll offer more feedback if needed when I get a chance to
>> test it.  It is a great start and I can't wait to see it progress.
>
> Thanks!

   224  open (RELEASE, "/etc/release");
   225  $rel=<RELEASE>;
   226  close RELEASE;
   227  if ($rel =~ /3\/05/) { $update=1; }
   228  if ($rel =~ /6\/06/) { $update=2; }
   229  if ($rel =~ /11\/06/) { $update=3; }

3/05 == GA (update 0?)
1/06 == U1
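
One thing to watch when adding the 1/06 case: an unanchored match on
"1/06" also matches "11/06".  A minimal sketch of a stricter lookup,
in Python rather than the script's Perl, with hypothetical names and a
sample line that is only my assumption of what the first line of
/etc/release looks like:

```python
import re

# Release-date tokens for the Solaris 10 releases mentioned above.
# 3/05 is the GA release, treated here as update 0.
UPDATE_BY_DATE = {"3/05": 0, "1/06": 1, "6/06": 2, "11/06": 3}

# A month/year token that is not part of a longer run of digits,
# so "11/06" is never mistaken for "1/06".
_DATE_TOKEN = re.compile(r"(?<!\d)(\d{1,2}/\d{2})(?!\d)")


def update_from_release(first_line):
    """Return the update number for a release line, or None if unknown."""
    match = _DATE_TOKEN.search(first_line)
    if match is None:
        return None
    return UPDATE_BY_DATE.get(match.group(1))


if __name__ == "__main__":
    # Assumed sample line, for illustration only.
    print(update_from_release("Solaris 10 11/06 s10s_u3wos_10 SPARC"))
```

Releases not in the table come back as None rather than silently
keeping an earlier value.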

   492  # Note that swap(1M) doesn't report memory pages that the kernel has locked.
   493  open (SWAP, "/usr/sbin/swap -sh|");
   494    while (<SWAP>) {

The -h option to swap doesn't exist in S10.
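
Plain "swap -s" reports its figures in kilobytes, so the script can do
the unit conversion itself.  A minimal sketch in Python (the script is
Perl, so treat this as an illustration of the parsing only); the sample
line is my assumption of the usual "swap -s" output format and the
numbers are invented:

```python
import re

# Matches fields such as "286120k bytes allocated" or "35432k reserved".
_KB_FIELD = re.compile(
    r"(\d+)k\s+(?:bytes\s+)?(allocated|reserved|used|available)"
)


def parse_swap_s(line):
    """Parse one line of `swap -s` output into a dict of byte counts."""
    return {name: int(kb) * 1024 for kb, name in _KB_FIELD.findall(line)}


if __name__ == "__main__":
    # Assumed sample output, for illustration only.
    sample = ("total: 286120k bytes allocated + 35432k reserved = "
              "321552k used, 1713000k available")
    for name, nbytes in parse_swap_s(sample).items():
        print(name, nbytes)
```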

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
