Re: [zones-discuss] ps command hangs zone

2006-10-19 Thread Christian Gajan
Hi Douglas

Take a look at this bug description:
Bug ID: 1246893
Title:  mmap and write to the same file deadlocks.

Below is a customer's description of the consequences of this issue.
Perhaps your customer has a similar problem?

Regards

Christian

--
The purpose of this email is to describe the Solaris OS issue we would like
Sun to acknowledge and correct.

First of all, we want to distinguish two things:
- the way the problem appears on our server
- the associated generic Solaris OS issue

We currently have some leads for working around the symptoms.
We will test them later this week, and we think we will
be able to run ProFTPD without further downtime.

However, we also think the generic Solaris OS issue should be fixed
so that it cannot recur.
All our workarounds are ProFTPD-side modifications (changing the
source code or disabling some modules).
This means that a new version of the same program (proftpd), or any
other badly written program (deliberately or not), could trigger the
problem again, which is not an acceptable solution for us.


What is this issue?
From our point of view, it actually consists of two issues:
- the first is a process hang (kernel deadlock) when the mmap
  and write system calls are invoked on the same file
- the second is a hang of all p-commands (including ps)
  after a pstack command is run against the initially
  hung process

We can live with the first issue, but the second has too many
consequences for the whole system to be tolerated.

We have written a test case to reproduce the problem. Here is the scenario:

- First, mmap and write on the same file,
  as shown in the code below:

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/param.h>   /* PAGESIZE on Solaris */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
   int fd, r;
   caddr_t addr;

   if (argc != 2) { printf("usage: %s filename\n", argv[0]); exit(1); }
   fd = open(argv[1], O_RDWR|O_CREAT, 0666);
   printf("open = %d\n", fd);
   r = ftruncate(fd, PAGESIZE);
   printf("ftruncate = %d\n", r);
   /* Map the first page of the file, shared and writable. */
   addr = mmap(NULL, PAGESIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
   printf("mmap = %p\n", (void *)addr);
   /* Write to the same file, using the mapping itself as the source buffer. */
   r = write(fd, addr, PAGESIZE);
   printf("write = %d\n", r);
   r = munmap(addr, PAGESIZE);
   printf("munmap = %d\n", r);
   r = close(fd);
   printf("close = %d\n", r);
   return 0;
}

- Compile and run this code on an NFS file system as a normal user.
  The process hangs in the write() system call.
- Then run a ps command to get the process ID of the hung process.
- Next, run a pstack command (as a normal user) with that process ID
  as its argument. The pstack command hangs.
- Finally, run ps commands (as root or any other user).
  All ps commands hang.

This issue occurs on our site with proftpd version 1.3.0rc3.
This new version implements a new module, mod_delay, which
performs an unfortunate mix of mmap and write calls.
As we said in our introduction, we have now identified
the part of the proftpd code where the mistakes were made, and
we know how to fix them.
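
The proftpd-side fix itself is not shown in this message. As an illustration only, here is a minimal sketch of one generic way for a program to avoid the pattern in the test case above, assuming the hang is triggered by passing the mapped address itself as the write() buffer: the page is first copied into a private heap buffer. The helper name write_page_via_copy is hypothetical, sysconf(_SC_PAGESIZE) stands in for PAGESIZE for portability, and this has not been verified against the Solaris bug.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one page of a shared file mapping back to
 * the same file, but stage the data in a private heap buffer first, so
 * that write() never reads from a mapping of the file it is writing.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_page_via_copy(const char *path)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    ssize_t written = -1;
    char *copy = NULL;
    void *addr = MAP_FAILED;
    int fd;

    if (pagesize <= 0)
        return -1;

    fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    /* Make sure the file is at least one page long before mapping it. */
    if (ftruncate(fd, (off_t)pagesize) != 0)
        goto out;

    addr = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        goto out;

    /* Copy out of the mapping; the write() source is ordinary heap memory. */
    copy = malloc((size_t)pagesize);
    if (copy == NULL)
        goto out;
    memcpy(copy, addr, (size_t)pagesize);

    written = write(fd, copy, (size_t)pagesize);

out:
    free(copy);
    if (addr != MAP_FAILED)
        munmap(addr, (size_t)pagesize);
    close(fd);
    return written;
}
```

This only changes the behaviour of the calling program; it does nothing about the generic OS issue discussed in the rest of this message.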

Up to now, Sun Support has helped us analyze and understand why all
these hangs occurred. We now wish to move a step further, towards
a solution to the generic issue.

Of course we are no longer blocked (at least we hope so, pending
successful tests), but the issue
seems too serious to us to stop here.

Let us clarify why the second issue (pstack/ps hang) appears to us
to be the most serious one. We clearly understand that, in our case,
we encounter it as a consequence of the first one (the process address
space lock is not released).
If we had encountered only the first issue, the situation could
have been acceptable, because the hangs would have been limited to
badly programmed processes. The second issue (pstack/ps) extends
the problem to many other important global processes, and in
that case the situation is no longer acceptable.

In a perfect world, fixing all the problems (hangs) would be nice.
But in the real world, and from our point of view, a partial fix
(just the pstack/ps hang) would be satisfactory.

Furthermore, this issue occurs under Solaris 8 and 10, and probably 9 too.
In the case of Solaris 10, the second issue (pstack/ps)
calls into question the isolation paradigm of Solaris zones,
because a simple user process in a non-global zone can hang
ps commands in the global zone.
This calls into question the use of Solaris zones for security
purposes at our site. More generally, we regard this issue as
a security hole, because a simple user is able to
seriously disrupt the whole system.

We would like a bug report to be raised with all the details
given in this email, and we hope for a fix in a future system patch.

I hope my mail is not too long, but there is real doubt
about Solaris availability and reliability under all these

[zones-discuss] zone resource control, who gets signaled?

2006-10-19 Thread Christine Tran
The zone.cpu-shares rctl has a set of threshold actions: none, deny
and signal=.  Say I set the action to signal=TERM: who actually gets
signaled?  Is it the process in the zone that's currently queuing to get
on CPU, or is it zoneadmd (which presumably will pass it back)?


I've always used (priv=privileged,limit=n,action=none), and that enforces
the limit for me.  What's the difference in behavior between none and
deny?


Thanks!

CT
___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] zone resource control, who gets signaled?

2006-10-19 Thread Jerry Jelinek

Christine Tran wrote:
> The zone.cpu-shares rctl has a set of threshold actions: none, deny
> and signal=.  Say I set the action to signal=TERM: who actually gets
> signaled?  Is it the process in the zone that's currently queuing to get
> on CPU, or is it zoneadmd (which presumably will pass it back)?
>
> I've always used (priv=privileged,limit=n,action=none), and that enforces
> the limit for me.  What's the difference in behavior between none and
> deny?


zonecfg won't allow you to set rctl priv to anything other than 'privileged'
and rctl action to anything other than 'none' or 'deny'.  This is one
of the things we are making simpler with the new zones/rm project and
its rctl aliases.  'action=none' is the only thing that makes sense for
cpu-shares since cpu-shares don't really have an action.  This rctl
just tells the FSS what portion to assign to this zone.

Jerry


Re: [zones-discuss] zone resource control, who gets signaled?

2006-10-19 Thread Jeff Victor
IIRC for zone.cpu-shares the action is ignored.  Something about all "infinite"
resources behaving like this, i.e. CPU cycles aren't bounded.  To the scheduler,
you can always get more cycles if you're willing to wait a nanosecond or six.


Which makes sense to me - under what conditions would a process be signaled?

Christine Tran wrote:
> The zone.cpu-shares rctl has a set of threshold actions: none, deny
> and signal=.  Say I set the action to signal=TERM: who actually gets
> signaled?  Is it the process in the zone that's currently queuing to get
> on CPU, or is it zoneadmd (which presumably will pass it back)?
>
> I've always used (priv=privileged,limit=n,action=none), and that enforces
> the limit for me.  What's the difference in behavior between none and
> deny?
>
> Thanks!


--
Jeff VICTOR              Sun Microsystems            jeff.victor @ sun.com
OS Ambassador            Sr. Technical Specialist
Solaris 10 Zones FAQ:    http://www.opensolaris.org/os/community/zones/faq
--


[zones-discuss] Re: Can SAMBA be run in a non-global zone?

2006-10-19 Thread Phil Freund
The blastwave.org Samba distribution doesn't have this issue: its shutdown
(/etc/init.d/cswsamba stop) uses the process IDs for smbd, nmbd, and winbindd
stored in /opt/csw/var/locks/.

A quick FYI on using the blastwave distribution: if you are using sparse zones
and need to run Samba with winbind, you have to install the Samba packages into
the global zone so that the winbind package (CSWsambawb) can add its files to
/usr/lib. That said, if you don't create an smb.conf file in the global zone,
Samba won't start there, so it's not a big issue.

Phil
 
 
This message posted from opensolaris.org


Re: [zones-discuss] CPU load values in a zone

2006-10-19 Thread John Beck
Brian> It appears the load values obtained within a local zone are measured
Brian> across the whole system rather than for just the processes within
Brian> that local zone.

They are measured across all CPUs in whatever processor set sendmail is
running in, which by default would be the whole system.


Brian> IHAC ...

Note that in my experience, that acronym is not widely used outside Sun, so
on a list such as this, it would be polite to spell out "I have a customer".
:-)


Brian> ... that uses sendmail in multiple zones on the same system and
Brian> it uses the load metric for decisions about when to queue, when to
Brian> refuse connections, etc.  Does the sendmail in a local zone get its
Brian> LA metrics based on only its local zone or across the entire system?

sendmail on Solaris (9 and later) uses pset_getloadavg(3C).
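
For readers who have not used that call, here is a minimal sketch of reading the load average of the caller's own processor set. It is not sendmail's code: the wrapper name read_loadavg is hypothetical, and the non-Solaris branch falls back to getloadavg() (system-wide) purely so the sketch compiles elsewhere.

```c
#define _DEFAULT_SOURCE 1   /* exposes getloadavg() in glibc's <stdlib.h> */
#include <stdio.h>
#include <stdlib.h>
#ifdef __sun
#include <sys/pset.h>       /* pset_getloadavg(), PS_MYID */
#include <sys/loadavg.h>
#endif

/*
 * Hypothetical wrapper: fill avg[] with up to nelem load-average samples
 * (1, 5 and 15 minutes). On Solaris this asks for the processor set the
 * caller is bound to (PS_MYID), so the figures cover every zone sharing
 * that set. Returns the number of samples retrieved, or -1 on error.
 */
int read_loadavg(double avg[], int nelem)
{
#ifdef __sun
    return pset_getloadavg(PS_MYID, avg, nelem);
#else
    return getloadavg(avg, nelem);   /* fallback: system-wide figures */
#endif
}

/* Print whatever samples are available. */
void print_loadavg(void)
{
    double avg[3];
    int i, n = read_loadavg(avg, 3);

    for (i = 0; i < n; i++)
        printf("load[%d] = %.2f\n", i, avg[i]);
}
```

Note that neither branch takes a zone as an argument: the scope of the result is a processor set (or the whole system), not a zone.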


Brian> And how does that play with the use of FSS?  If it's for the entire
Brian> system, this would skew the behavior of how it would work in zones.

I'll let someone more expert on the fair-share scheduler comment on that.


Brian> Also, would pools with processor sets make this better or worse?

I would suspect better, since only CPUs in that processor set would be
counted.

-- John

http://blogs.sun.com/jbeck


[zones-discuss] Re: Solaris 10 Screencasts

2006-10-19 Thread msl
I have published three DTrace screencasts, and I will try to publish more soon.

 Leal.
 
 


Re: [zones-discuss] CPU load values in a zone

2006-10-19 Thread Brian Kolaci

Jeff Victor wrote:
> Brian Kolaci wrote:
>> I've been discussing how to chop up a machine.  A possible example
>> configuration would have 8 CPUs and 3 local zones, possibly
>> divided up as 50%, 25% and 25%.  It's clear how to do this with pools;
>> however, FSS is a great fit for when a zone may need more CPU than what's
>> available in the pool/psrset.  The problem with FSS in this case is that
>> if one zone is mostly idle and all the other zones are busy, the idle
>> zone will see a load average much higher than what it is really using,
>> which can skew the calculations used by the sendmail process to determine
>> whether the queue/refuse connection thresholds are met.
>
> How does FSS make that situation worse?  The misleading [1] load avg is
> not affected by FSS, which is merely enforcing the minimum CPU-power
> portions that you chose.  If they are inappropriate, prctl can be your
> friend. :-)
>
> [1] misleading for this situation, not so for others.

I guess what I mean is that with FSS, people get the impression
that they are dividing the resources fairly among the zones, but the
misleading load average tells processes that they're already using
all of their share or more.  Agreed, it's not really a problem
with FSS; the problem is that the load averages reported in a zone
reflect not the load in the zone itself but that of the processor
set it is associated with.


Re: [zones-discuss] CPU load values in a zone

2006-10-19 Thread Jim Mauro


Remember that FSS is designed to provide a minimum, but not a max.
Depending on CPU use by other threads in the class, a given thread may
get more than its allotted CPU shares, but it will never get less.

/jim
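
As a rough illustration of the "minimum, not a max" point, here is a simplified model of share-based entitlement, assuming a zone's guaranteed fraction under contention is its shares divided by the total shares of the zones competing for CPU. The function name and arrays are hypothetical; this is arithmetic only, not the scheduler's actual implementation.

```c
#include <stddef.h>

/*
 * Simplified model (an assumption, not FSS source code): the fraction of
 * CPU that zone `zone` is entitled to when it competes with the zones
 * currently marked active. The zone itself is always counted, so the
 * result is what it would get as soon as it has runnable work.
 */
double fss_entitlement(const unsigned shares[], const int active[],
                       size_t nzones, size_t zone)
{
    unsigned long total = 0;
    size_t i;

    if (zone >= nzones)
        return 0.0;

    for (i = 0; i < nzones; i++) {
        if (active[i] || i == zone)
            total += shares[i];
    }
    return total ? (double)shares[zone] / (double)total : 0.0;
}
```

With shares of 2, 1 and 1 (the 50%/25%/25% split discussed in this thread) and all three zones busy, zone 0 is entitled to 2/4 of the CPU. If the third zone goes idle, zone 0 competes only with zone 1 and the same function gives 2/3, which is more than its nominal share but never less.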




Re: [zones-discuss] CPU load values in a zone

2006-10-19 Thread Brian Kolaci

Thanks, but I think we're getting off topic.
I know how FSS works and what it's intended for; however, the
issue isn't with FSS but rather that the load averages seen
within a zone are based not on the load in the zone but
on the load of the pool with which the zone is associated.

FSS isn't the culprit here; sorry I made it sound that way.
So if you don't enable pools (so there's one shared pool) and
you use FSS to divide up the resources, then sendmail within one
of the local zones makes decisions based on the load average of
the processor set (which is the load of all zones put together)
rather than just the workload of the zone.  So if another zone
consumes 99.9% of the CPU, the idle zone running sendmail will
reject connections because the load average has been exceeded,
even though FSS will guarantee it more CPU.
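
The effect described above can be sketched as a simple threshold check. This is a simplified model, not sendmail's code: the function and enum names are hypothetical, the two thresholds correspond to sendmail's QueueLA and RefuseLA options, and the numbers in the usage note are illustrative only.

```c
/* Possible outcomes of the load-average check in this simplified model. */
enum la_action { LA_ACCEPT, LA_QUEUE_ONLY, LA_REFUSE };

/*
 * Hypothetical decision helper: compare the load average the daemon sees
 * against a queue-only threshold and a refuse threshold.
 */
enum la_action la_decide(double load_avg, double queue_la, double refuse_la)
{
    if (load_avg >= refuse_la)
        return LA_REFUSE;        /* too busy: refuse new connections */
    if (load_avg >= queue_la)
        return LA_QUEUE_ONLY;    /* busy: accept but only queue mail */
    return LA_ACCEPT;
}
```

With illustrative thresholds of 8 (queue) and 12 (refuse), a sendmail in an idle zone whose own work would amount to a load of 0.2 would accept mail, but if the processor set as a whole reports 14.0 because another zone is busy, la_decide(14.0, 8.0, 12.0) yields LA_REFUSE. The input to the decision is the processor-set figure, not the zone's own load.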

