Re: [zones-discuss] PSARC/2006/598 Swap resource control; locked memory RM improvements

2006-10-26 Thread Steve Lawrence
Comments inline.  I've snipped stuff not relevant to comments.

   4. prstat(1m) output changes to report swap reserved.
 
  INTERFACE           COMMITMENT    BINDING
  prstat(1m) output   Uncommitted   Patch
 
  This case proposes changing the SIZE column of prstat -Z zone
  output lines to SWAP.  The swap reported will be the total swap
  consumed by the zone's processes and tmpfs mounts.  This value will
  assist administrators in monitoring the swap reserved by each zone,
  allowing them to choose a reasonable zone.max-swap setting.
 
  The SIZE column will also be changed to SWAP for prstat
  options a, T, and J, for users, tasks, and projects.
 
 The reason for not changing this column in the default output would be 
 helpful.

I have a separate private interface used by prstat(1m) to get aggregate swap
reserved by users, tasks, projects, and zones.  Default prstat output is
per-process, and the information is accessed via /proc.

Currently, per-process, or per-address-space, swap reservation is not
counted or made available via /proc.  From proc(4):

 typedef struct psinfo {
         ...
         size_t pr_size;   /* size of process image in Kbytes */
         ...
 } psinfo_t;

The size of the process image is pretty meaningless.  If we can change pr_size to
be swap reserved by process, then we could change SIZE to SWAP for all
prstat(1m) output.  Would such a change to psinfo_t be reasonable?

  Currently a global or non-global zone can consume all swap
  resources available on the system, limiting the usefulness of zones
  as an application container.  zone.max-swap provides a mechanism to
 
 I would rephrase that as the container of an application to avoid 
 confusion with the Solaris feature set called Containers.  I assume that 
 the former was meant more so than the latter even though Containers are
 Solaris' implementation of an application container.

I'm not sure what you mean, but ok.  By "the Solaris feature set called
Containers", do you mean zones + RM, or do you mean zones, xen, ldoms?

  zone.max-swap will be configurable on both the global zone, and
  non-global zones.  The effect on processes in a zone reaching its
  zone.max-swap limit is the same as if all system swap is reserved.
  Callers of mmap(2) and sbrk(2) will receive EAGAIN.  Writes to
  tmpfs will return ENOSPC, which is the same errno returned when
  a tmpfs mount reaches its size mount option.  The size mount
  option limits the quantity of swap that a tmpfs mount can reserve.
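
As a rough illustration of the failure mode described above, here is a
minimal C sketch of how an application might notice that an anonymous
mapping failed for lack of swap reservation.  The helper names
reserve_anon() and is_swap_shortage() are made up for this example and are
not part of the proposal.  The only behavior taken from the text is that
mmap(2) fails with EAGAIN when the zone reaches zone.max-swap; other
systems may report a different errno.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANON on glibc in strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: try to map `len` bytes of private anonymous memory.
 * Returns the mapping, or NULL with *err set to the errno from mmap(2).
 */
void *
reserve_anon(size_t len, int *err)
{
        void *p;

        assert(len > 0 && err != NULL);
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED) {
                *err = errno;
                return (NULL);
        }
        *err = 0;
        return (p);
}

/*
 * Hypothetical helper: nonzero if the errno matches what the proposal says
 * mmap(2) and sbrk(2) callers see when swap cannot be reserved.
 */
int
is_swap_shortage(int err)
{
        return (err == EAGAIN);
}
```

A caller that gets NULL back can check is_swap_shortage(err) and retry
later or shed load, since the condition may clear when other processes in
the zone release their reservations.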
 
 With S10 11/06, some zone limitations are now configurable, e.g. setting 
 the system time clock.  Similarly, the ability to modify a zone's swap 
 limit could be given to the zone's root user, which might be valuable in 
 some situations.  This would be analogous to the 'basic' privilege level.  
 It would allow an advisory limit to be placed on a zone - a limit that the 
 zone admin could modify in unusual circumstances.
 
 I realize that this opens a can of worms in that most rctls are protected 
 by the sys_res_config priv, which is not allowed in a zone even with 11/06. 
 Further, it makes sense to consistently allow or forbid rctl-modification 
 in zones.
 
 I just wanted to mention this idea so that it is not unintentionally 
 overlooked.

Currently, no zone.* rctls are modifiable from a non-global zone.

The established mechanism for a zone admin to set rctls within the
zone is via project.* rctls set on projects within the zone.  Granted, in
the zone.max-swap case, we are not proposing a project.max-swap, due to
implementation complexity and risk.  With sufficient customer demand, we could
investigate implementing project.max-swap in the future.

Currently no zone.* rctls allow basic rctl values to be set.  The only
project.* rctl which allows basic is project.max-contracts, and perhaps
that is a bug.  A basic rctl is an unprivileged rctl that only affects the
process within the task, project, or zone which sets it.  It is pretty
useless, except for process.* rctls.

I'd be happy to address the general issues of privilege related to project
and zone rctls as a separate case.  A possible solution may be to redefine
basic for project and zone rctls, and/or introduce finer-grained
privileges.  I agree that work is needed here.

  STATISTIC           DESCRIPTION
  zonename            The name of the zone with {zoneid}
  swap reserved:      swap reserved by zone in bytes.
 
 Does swap_reserved include pages shared with other zones, e.g. text pages?

Each process mapping text reserves unique swap for that mapping.  Even though
the underlying physical page may be shared between processes/zones, each
process needs its own swap reservation.  This is because each process may
copy-on-write (cow) the page and then need to page its private copy to disk.
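
To make the accounting concrete, here is a toy C model of the answer
above; it is not kernel code.  It assumes a hypothetical struct mapping
record for each process mapping and a hypothetical zone_swap_reserved()
helper.  The point is only that reservations are summed per mapping, so
the same shared pages are counted once for every process that maps them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one process's private mapping of a shared object. */
struct mapping {
        int      zoneid;   /* zone in which the mapping process runs */
        uint64_t len;      /* bytes of swap reserved for this mapping */
};

/*
 * Sum the reservations charged to one zone.  Sharing of the underlying
 * physical pages is deliberately ignored: every mapping counts in full.
 */
uint64_t
zone_swap_reserved(const struct mapping *maps, size_t n, int zoneid)
{
        uint64_t total = 0;
        size_t i;

        assert(maps != NULL || n == 0);
        for (i = 0; i < n; i++) {
                if (maps[i].zoneid == zoneid)
                        total += maps[i].len;
        }
        return (total);
}
```

With two processes in zone 1 and one in zone 2, each mapping the same
8192 bytes, this model charges 16384 bytes to zone 1 and 8192 bytes to
zone 2.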

 
  max_swap_reserved:  current zone.max-swap limit 

Re: [zones-discuss] PSARC/2006/598 Swap resource control; locked memory RM improvements

2006-10-26 Thread Dan Price
On Thu 26 Oct 2006 at 11:50AM, Steve Lawrence wrote:
 The size of the process image is pretty meaningless.  If we can change pr_size to
 be swap reserved by process, then we could change SIZE to SWAP for all
 prstat(1m) output.  Would such a change to psinfo_t be reasonable?

You'd have to check in with Roger, I think (and doing so would probably
be worth doing anyway).  Adding a new field might be feasible.

   Currently a global or non-global zone can consume all swap
   resources available on the system, limiting the usefulness of zones
   as an application container.  zone.max-swap provides a mechanism to
 
  I would rephrase that as the container of an application to avoid
  confusion with the Solaris feature set called Containers.  I assume that
  the former was meant more so than the latter even though Containers are
  Solaris' implementation of an application container.

 I'm not sure what you mean, but ok.  By "the Solaris feature set called
 Containers", do you mean zones + RM, or do you mean zones, xen, ldoms?

Steve, I think the text is fine.  This document isn't intended for
consumption by customers, and the text is clear enough to anyone trying to
absorb its meaning.

  Similarly, the ability to modify a zone's swap
  limit could be given to the zone's root user, which might be valuable in
  some situations.  This would be analogous to the 'basic' privilege level.
  It would allow an advisory limit to be placed on a zone - a limit that the
  zone admin could modify in unusual circumstances.
 
  I just wanted to mention this idea so that it is not unintentionally
  overlooked.

 Currently, no zone.* rctls are modifiable from a non-global zone.

 The established mechanism for a zone admin to set rctls within the
 zone is via project.* rctls set on projects within the zone.  Granted, in
 the zone.max-swap case, we are not proposing a project.max-swap, due to
 implementation complexity and risk.  With sufficient customer demand, we could
 investigate implementing project.max-swap in the future.

I think I'd agree that allowing a zone to modify its own zone.* rctls
(perhaps only to lower them) is something we *could do* at some point.
But I'm aware of neither an RFE for this nor stated customer demand.
If someone wants this, then let's get that recorded as an RFE in the bug
database, please.

Thanks,

-dp

--
Daniel Price - Solaris Kernel Engineering - [EMAIL PROTECTED] - blogs.sun.com/dp