Re: [zones-discuss] ps command hangs zone
Hi Douglas,

Take a look at this bug description:

Bug ID: 1246893
Title: mmap and write to the same file deadlocks.

Below is a customer's description of the consequences of this issue. Perhaps your customer has a similar problem?

Regards,
Christian

--

The purpose of this email is to describe the Solaris OS issue we would like Sun to recognize and correct.

First of all, we want to distinguish two things:
- the way the problem appears on our server
- the generic Solaris OS issue behind it

We currently have some ideas for working around the symptoms. We'll test them later in the week, and we think we'll be able to run ProFTPD without further downtime. However, we also think the generic Solaris OS issue should be fixed so that it cannot recur.

All our workarounds are based on ProFTPD-side modifications (modifying the source code or disabling some modules). This means that either a new version of the same program (proftpd) or other badly written programs (deliberately or not) could make the problem happen again, which is not acceptable as a solution for us.

What is this issue?

From our point of view, it is in fact composed of two issues:
- the first is a process hang (kernel deadlock) when the mmap and write system calls are invoked on the same file
- the second is a hang of all p-commands (including ps) after a pstack command is run against the initially hung process

We can bear the first issue, but the second has too many consequences for the whole system to be acceptable.
We've made a test case to reproduce the problem. Here is the scenario:

- First, mmap and write on the same file, as in the code below:

    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/param.h>   /* PAGESIZE on Solaris */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int fd, r;
        caddr_t addr;

        if (argc != 2) {
            printf("usage: %s filename\n", argv[0]);
            exit(1);
        }
        fd = open(argv[1], O_RDWR|O_CREAT, 0666);
        printf("open = %d\n", fd);
        r = ftruncate(fd, PAGESIZE);
        printf("ftruncate = %d\n", r);
        addr = mmap(NULL, PAGESIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        printf("mmap = %p\n", (void *)addr);
        /* The source buffer is a mapping of the very file being written. */
        r = write(fd, addr, PAGESIZE);
        printf("write = %d\n", r);
        r = munmap(addr, PAGESIZE);
        printf("munmap = %d\n", r);
        r = close(fd);
        printf("close = %d\n", r);
        return 0;
    }

- Compile and run this code on an NFS file system as a normal user. The process hangs in the write() system call.
- Then run a ps command to get the process ID of the hung process.
- Next, run a pstack command (as a normal user) with that process ID as its argument. The pstack command hangs.
- Finally, run ps commands (as root or any other user). All ps commands hang.

This issue occurs on our site with proftpd version 1.3.0 rc3. This new version implements a new module, mod_delay, which performs an unfortunate mix of mmap and write calls. As we said in our introduction, we have identified the part of the proftpd code where the mistakes were made, and we know how to fix them.

Up to now, Sun Support has helped us analyze and understand why all these hangs occurred. We now wish to move a step further, towards a solution to the generic issue. Of course, we are no longer blocked (at least we hope so, pending the success of our tests), but the issue seems too serious to us to stop here.

Let us clarify why the second issue (the pstack/ps hang) appears to us to be the most serious one.
We clearly understand that, in our case, we encounter it as a consequence of the first one (the process address space lock is not released). But if we had encountered only the first issue, the situation could have been acceptable, because the hangs would have been limited to the badly programmed processes themselves. The second issue (pstack/ps) extends the problem to many other important system-wide processes, and in that case the state is no longer acceptable.

In a perfect world, fixing all the problems (hangs) would be nice. But in the real world, and from our point of view, we could be satisfied with just a partial fix (for the pstack/ps hang only).

Furthermore, this issue occurs under Solaris 8, 10, and probably 9 too. In the case of Solaris 10, the second issue (pstack/ps) calls into question the isolation paradigm of Solaris zones, because a simple user process in a non-global zone can hang ps commands in the global zone. This calls into question the use of Solaris zones for security purposes at our site.

More generally, we regard this issue as a security hole, because an ordinary user has the capacity to seriously disturb the whole system.

We ask that a bug report be raised with all the details given in this email, and we hope for a fix in a future system patch.

I hope my mail is not too long, but there is a real suspicion problem on Solaris availability and reliability under all these
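For reference, an application-side workaround of the kind the customer alludes to can be sketched as follows. This is only an illustration and is not taken from proftpd or from any Sun fix: it assumes the deadlock is triggered when write() has to fault in pages of the same file it is writing, and it avoids that by copying the mapped data into a private heap buffer first. The helper names write_mapped_via_copy and demo are hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes that live in a mapping of `fd`
 * back to `fd`, going through a private heap copy so that write() is
 * never handed an address inside the file's own mapping.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_mapped_via_copy(int fd, const void *mapped, size_t len)
{
    void *bounce = malloc(len);
    if (bounce == NULL)
        return -1;
    memcpy(bounce, mapped, len);   /* page faults on the mapping happen here */
    ssize_t n = write(fd, bounce, len);
    free(bounce);
    return n;
}

/* Same sequence as the test case above, using the helper. 0 on success. */
int demo(const char *path)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, pagesize) != 0) {
        close(fd);
        return -1;
    }
    char *addr = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }
    addr[0] = 'x';
    ssize_t n = write_mapped_via_copy(fd, addr, (size_t)pagesize);
    munmap(addr, (size_t)pagesize);
    close(fd);
    return n == (ssize_t)pagesize ? 0 : -1;
}
```

This only sidesteps the first issue in one program; it does nothing for the pstack/ps hang, which is the part the customer asks Sun to fix.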
[zones-discuss] zone resource control, who gets signaled?
The zone.cpu-shares rctl has a set of threshold actions: none, deny and signal=. Say I set the action to signal=TERM: who actually gets signaled? Is it the process in the zone that's currently queuing to get on CPU, or is it zoneadmd (which presumably will pass it back)?

I've always used (priv=privileged,limit=n,action=none), and that enforces the limit for me. What's the difference in behavior between none and deny?

Thanks!
CT
___
zones-discuss mailing list
zones-discuss@opensolaris.org
Re: [zones-discuss] zone resource control, who gets signaled?
Christine Tran wrote:
> The zone.cpu-shares rctl has a set of threshold actions: none, deny and signal=. Say I set the action to signal=TERM: who actually gets signaled? Is it the process in the zone that's currently queuing to get on CPU, or is it zoneadmd (which presumably will pass it back)?
>
> I've always used (priv=privileged,limit=n,action=none), and that enforces the limit for me. What's the difference in behavior between none and deny?

zonecfg won't allow you to set the rctl priv to anything other than 'privileged', or the rctl action to anything other than 'none' or 'deny'. This is one of the things we are making simpler with the new zones/rm project and its rctl aliases.

'action=none' is the only thing that makes sense for cpu-shares, since cpu-shares don't really have an action. This rctl just tells the FSS what portion of CPU to assign to this zone.

Jerry
Re: [zones-discuss] zone resource control, who gets signaled?
IIRC, for zone.cpu-shares the action is ignored. Something about all infinite resources behaving like this, i.e. CPU cycles aren't bounded: to the scheduler, you can always get more cycles if you're willing to wait a nanosecond or six. Which makes sense to me: under what conditions would a process be signaled?

Christine Tran wrote:
> The zone.cpu-shares rctl has a set of threshold actions: none, deny and signal=. Say I set the action to signal=TERM: who actually gets signaled? Is it the process in the zone that's currently queuing to get on CPU, or is it zoneadmd (which presumably will pass it back)?
>
> I've always used (priv=privileged,limit=n,action=none), and that enforces the limit for me. What's the difference in behavior between none and deny?
>
> Thanks!

--
Jeff VICTOR              Sun Microsystems            jeff.victor @ sun.com
OS Ambassador            Sr. Technical Specialist
Solaris 10 Zones FAQ:    http://www.opensolaris.org/os/community/zones/faq
--
[zones-discuss] Re: Can SAMBA be run in a non-global zone?
The blastwave.org Samba distribution doesn't have this issue: its shutdown (/etc/init.d/cswsamba stop) uses the process IDs for smbd, nmbd, and winbindd stored in /opt/csw/var/locks/.

A quick FYI on using the blastwave distribution: if you are using sparse zones and need to run Samba with winbind, you have to install the Samba packages into the global zone so that the winbind package (CSWsambawb) can add its files to /usr/lib. That said, if you don't create an smb.conf file in the global zone, Samba won't start there, so it's not a big issue.

Phil

This message posted from opensolaris.org
Re: [zones-discuss] CPU load values in a zone
Brian> It appears the load values obtained within a local zone are measured
Brian> across the whole system rather than for just the processes within
Brian> that local zone.

For all CPUs in whatever processor set sendmail is running in, which by default would be the whole system.

Brian> IHAC ...

Note that in my experience, that acronym is not widely used outside Sun, so on a list such as this, it would be polite to spell out "I have a customer". :-)

Brian> ... that uses sendmail in multiple zones on the same system and
Brian> it uses the load metric for decisions about when to queue, when to
Brian> refuse connections, etc. Does the sendmail in a local zone get its
Brian> LA metrics based on only its local zone or across the entire system?

sendmail on Solaris (9 and later) uses pset_getloadavg(3C).

Brian> And how does that play with the use of FSS? If it's for the entire
Brian> system, this would skew the behavior of how it would work in zones.

I'll let someone more expert on the fair-share scheduler comment on that.

Brian> Also, would pools with processor sets make this better or worse?

I would suspect better, since only CPUs in that processor set would be counted.

-- John
http://blogs.sun.com/jbeck
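To make the pset_getloadavg(3C) point concrete, here is a minimal sketch of how a program could read the load average of its own processor set. It is not sendmail's actual code; the wrapper name read_loadavg is hypothetical, and the non-Solaris branch falls back to getloadavg(3) only so the sketch builds elsewhere.

```c
#define _DEFAULT_SOURCE            /* exposes getloadavg() on glibc */
#include <stdio.h>
#include <stdlib.h>
#if defined(__sun)
#include <sys/loadavg.h>
#include <sys/pset.h>
#endif

/*
 * Hypothetical wrapper: fill avg[0..2] with the 1-, 5- and 15-minute
 * load averages. Returns the number of samples retrieved, or -1.
 */
int read_loadavg(double avg[3])
{
#if defined(__sun)
    /* PS_MYID selects the processor set the caller is bound to, so the
     * figure covers every zone sharing that set, not just this zone. */
    return pset_getloadavg(PS_MYID, avg, 3);
#else
    /* System-wide load average on platforms without processor sets. */
    return getloadavg(avg, 3);
#endif
}

/* Print the current values; returns 0 on success, -1 on failure. */
int print_loadavg(void)
{
    double avg[3];
    int n = read_loadavg(avg);
    if (n < 1)
        return -1;
    for (int i = 0; i < n; i++)
        printf("loadavg[%d] = %.2f\n", i, avg[i]);
    return 0;
}
```

Because the value is per processor set rather than per zone, binding a zone to its own pool changes which CPUs are counted, which matches John's expectation that pools with processor sets would help.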
[zones-discuss] Re: Solaris 10 Screencasts
I have published three DTrace screencasts, and I will try to publish more soon.

Leal.
Re: [zones-discuss] CPU load values in a zone
Jeff Victor wrote:
> Brian Kolaci wrote:
>> I've been discussing how to chop up a machine. A possible example configuration would have 8 CPUs and 3 local zones, divided up as 50%, 25% and 25%. It's clear how to do this with pools; however, FSS is a great fit for when a zone may need more CPU than what's available in the pool/psrset. The problem with FSS in this case is that if one zone is mostly idle and all the other zones are busy, the idle zone will see a load average much higher than what it's really using, which can skew the calculations used by the sendmail process to determine whether the queue/refuse-connection thresholds are met.
>
> How does FSS make that situation worse? The misleading [1] load avg is not affected by FSS, which is merely enforcing the minimum CPU-power portions that you chose. If they are inappropriate, prctl can be your friend. :-)
>
> [1] misleading for this situation, not so for others.

I guess what I mean is that with FSS, people get the impression that they are dividing the resources fairly among the zones, but the misleading load average tells processes that they're already using all of their share or more. Agreed, it's not really a problem with FSS; it's that the load averages reported in a zone reflect not the load in the zone itself, but that of the processor set it is associated with.
Re: [zones-discuss] CPU load values in a zone
Remember that FSS is designed to provide a minimum, but not a maximum. Depending on CPU use by other threads in the class, a given thread may get more than its allotted CPU shares, but it will never get less.

/jim

Brian Kolaci wrote:
> Jeff Victor wrote:
>> Brian Kolaci wrote:
>>> I've been discussing how to chop up a machine. A possible example configuration would have 8 CPUs and 3 local zones, divided up as 50%, 25% and 25%. It's clear how to do this with pools; however, FSS is a great fit for when a zone may need more CPU than what's available in the pool/psrset. The problem with FSS in this case is that if one zone is mostly idle and all the other zones are busy, the idle zone will see a load average much higher than what it's really using, which can skew the calculations used by the sendmail process to determine whether the queue/refuse-connection thresholds are met.
>>
>> How does FSS make that situation worse? The misleading [1] load avg is not affected by FSS, which is merely enforcing the minimum CPU-power portions that you chose. If they are inappropriate, prctl can be your friend. :-)
>>
>> [1] misleading for this situation, not so for others.
>
> I guess what I mean is that with FSS, people get the impression that they are dividing the resources fairly among the zones, but the misleading load average tells processes that they're already using all of their share or more. Agreed, it's not really a problem with FSS; it's that the load averages reported in a zone reflect not the load in the zone itself, but that of the processor set it is associated with.
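The "minimum, not maximum" behavior can be illustrated with a simplified share calculation. This is a toy model under stated assumptions, not the FSS implementation (which also decays past usage): a zone's guaranteed fraction is taken to be its shares divided by the total shares of the zones currently competing for CPU. The function name min_cpu_fraction is hypothetical.

```c
#include <stddef.h>

/*
 * Toy model of share-based entitlement (NOT the real FSS algorithm):
 * guaranteed CPU fraction for zone `i` = its shares divided by the sum
 * of shares of all zones that are busy right now. Zone `i` is always
 * counted, since the question is what it gets when it wants CPU.
 */
double min_cpu_fraction(const unsigned shares[], const int busy[],
                        size_t n, size_t i)
{
    unsigned long total = 0;
    for (size_t j = 0; j < n; j++) {
        if (busy[j] || j == i)
            total += shares[j];
    }
    return total == 0 ? 0.0 : (double)shares[i] / (double)total;
}
```

With the 50/25/25 split from the message above, a 25-share zone is guaranteed 0.25 of the CPU when all three zones are busy, and 0.5 when the 50-share zone is idle: the guaranteed fraction only grows as competitors go idle, and it never drops below the configured portion.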
Re: [zones-discuss] CPU load values in a zone
Thanks, but I think we're getting off topic. I know how FSS works and what it's intended for; the issue isn't with FSS, but rather that the load averages seen within a zone are based not on the load in the zone, but on the pool with which the zone is associated. FSS isn't the culprit here; sorry I made it sound that way.

So if you don't enable pools (so there's one shared pool) and you use FSS to divide up the resources, then sendmail within one of the local zones makes decisions based on the load average of the processor set (which is the load of all zones put together) rather than just the workload of the zone. So if another zone consumes 99.9% of the CPU, the idle zone running sendmail will reject connections because the load average threshold has been exceeded, even though FSS will guarantee it more CPU.

Jim Mauro wrote:
> Remember that FSS is designed to provide a minimum, but not a maximum. Depending on CPU use by other threads in the class, a given thread may get more than its allotted CPU shares, but it will never get less.
>
> /jim
>
> Brian Kolaci wrote:
>> Jeff Victor wrote:
>>> Brian Kolaci wrote:
>>>> I've been discussing how to chop up a machine. A possible example configuration would have 8 CPUs and 3 local zones, divided up as 50%, 25% and 25%. It's clear how to do this with pools; however, FSS is a great fit for when a zone may need more CPU than what's available in the pool/psrset. The problem with FSS in this case is that if one zone is mostly idle and all the other zones are busy, the idle zone will see a load average much higher than what it's really using, which can skew the calculations used by the sendmail process to determine whether the queue/refuse-connection thresholds are met.
>>>
>>> How does FSS make that situation worse? The misleading [1] load avg is not affected by FSS, which is merely enforcing the minimum CPU-power portions that you chose. If they are inappropriate, prctl can be your friend. :-)
>>>
>>> [1] misleading for this situation, not so for others.
>>
>> I guess what I mean is that with FSS, people get the impression that they are dividing the resources fairly among the zones, but the misleading load average tells processes that they're already using all of their share or more. Agreed, it's not really a problem with FSS; it's that the load averages reported in a zone reflect not the load in the zone itself, but that of the processor set it is associated with.
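Brian's scenario comes down to a threshold check on a number that describes the whole processor set. A minimal sketch of such a check follows; it is not sendmail's source. The parameter names echo sendmail's QueueLA and RefuseLA options, but the function decide, the enum, the exact comparison operators and any threshold values a caller passes in are illustrative assumptions.

```c
/* Possible outcomes of a load-average policy check (illustrative). */
enum la_action { LA_ACCEPT, LA_QUEUE_ONLY, LA_REFUSE };

/*
 * Hypothetical policy: refuse new connections once the load average
 * reaches refuse_la, queue mail without delivering it once the load
 * average reaches queue_la, and otherwise accept and deliver normally.
 */
enum la_action decide(double loadavg, double queue_la, double refuse_la)
{
    if (loadavg >= refuse_la)
        return LA_REFUSE;
    if (loadavg >= queue_la)
        return LA_QUEUE_ONLY;
    return LA_ACCEPT;
}
```

If the processor-set load is, say, 14 because another zone is saturating the CPUs, decide(14.0, 8.0, 12.0) yields LA_REFUSE inside an otherwise idle zone, whereas the zone's own load would have produced LA_ACCEPT. The check itself is fine; the input is what misleads it.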