Hi Douglas,

Take a look at this bug description:

Bug ID: 1246893
Title: mmap and write to the same file deadlocks.
Below is a customer's description of the consequences of this issue. Perhaps your customer has a similar problem?

Regards,
Christian

--------------------------------------------------------------------------------------

The purpose of this email is to describe the Solaris OS issue that we would like Sun to acknowledge and correct.

First of all, we want to distinguish two things:
- the way the problem appears on our server
- the associated generic Solaris OS issue

We currently have some leads for working around the symptoms. We will test them later this week, and we think we will be able to keep running ProFTP without further downtime.

However, we also think the generic Solaris OS issue should be fixed to prevent future occurrences. All our leads rely on ProFTP-side modifications (modifying the source code or disabling some modules). This means that either a new version of the same program (proftpd) or other "miswritten" programs (intentionally or not) could trigger the problem again. That is not acceptable to us as a solution.

What is this issue?

From our point of view, the issue is in fact composed of two issues:
- the first is a process hang (kernel deadlock) when an mmap and a write system call are invoked on the same file
- the second is a hang of all p-commands (including ps) after a pstack command is run against the initially hung process

We can live with the first issue, but the second has too many consequences for the whole system to be tolerated.
We have written a test case to reproduce the problem. Here is the scenario:

- First, mmap and write on the same file, as shown in the code below:

    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/param.h>   /* PAGESIZE */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int fd, r;
        caddr_t addr;

        if (argc != 2) {
            printf("usage: %s filename\n", argv[0]);
            exit(1);
        }

        fd = open(argv[1], O_RDWR|O_CREAT, 0666);
        printf("open = %d\n", fd);

        /* Make the file exactly one page long so it can be mapped. */
        r = ftruncate(fd, PAGESIZE);
        printf("ftruncate = %d\n", r);

        addr = mmap(NULL, PAGESIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        printf("mmap = %p\n", (void *)addr);

        /* Write to the file using its own mapping as the source buffer.
           This is the call that hangs. */
        r = write(fd, addr, PAGESIZE);
        printf("write = %d\n", r);

        r = munmap(addr, PAGESIZE);
        printf("munmap = %d\n", r);

        r = close(fd);
        printf("close = %d\n", r);

        return 0;
    }

- Compile and run this code on an NFS file system as a normal user.
  The process hangs in the write() system call.
- Then run a ps command to get the process ID of the hung process.
- Next, run a pstack command (as a normal user) with that process ID as its argument.
  The pstack command hangs.
- Finally, run ps commands (as root or any other user).
  All ps commands hang.

This issue occurs at our site with proftpd version 1.3.0 rc3. This new version introduces a new module, mod_delay, which performs an unfortunate mix of mmap and write calls. As we said in our introduction, we have identified the part of the proftpd code where the mistakes were made, and we know how to fix them.

Up to now, Sun Support has helped us analyze and understand why all these hangs occurred. We now wish to go a step further, towards a solution to the generic issue. Of course we are no longer blocked (at least we hope so, pending the success of our tests), but the issue seems to us too serious to stop here.

Let us clarify why the second issue (pstack/ps hang) appears to us to be the most serious one.
We clearly understand that, in our case, we encounter it as a consequence of the first one (the process address space lock is not released). But if we had encountered only the first issue, the situation could have been acceptable, because the hangs would have been limited to the badly programmed processes. The second issue (pstack/ps) extends the problem to many other important system-wide processes, and in that case the situation is no longer acceptable.

In a perfect world, fixing all the problems (hangs) would be nice. But in the real world, and from our point of view, we could be satisfied with a partial fix (just the pstack/ps hang).

Furthermore, this issue occurs under Solaris 8 and 10, and probably 9 too. In the case of Solaris 10, the second issue (pstack/ps) calls into question the isolation paradigm of Solaris zones, because a simple user process in a non-global zone can hang ps commands in the global zone. This calls into question the use of Solaris zones for security purposes at our site.

More generally, we regard this issue as a security hole, because an ordinary user is able to seriously disrupt the whole system.

We would like a bug report to be raised with all the details given in this email, and we hope for a fix in a future system patch.

I hope my mail is not too long, but all of this raises real doubts about Solaris availability and reliability.

---------------------------------------------------------------------------

----- Original Message -----
From: Douglas Perry <[EMAIL PROTECTED]>
Date: Tuesday, October 17, 2006 10:08 pm
Subject: [zones-discuss] ps command hangs zone
To: [EMAIL PROTECTED]
Cc: zones-discuss@opensolaris.org

> IHAC that has an application running on a zone..the command ps
> (usr/bin/ps) hung..customer tried to shutdown the
> zone using 'zoneadm -z zone-name reboot' and 'halt' ..the zone did not
> come down..customer had to reboot the system..
> customer looking for root cause..probably the application..
>
> no messages in the messages files
> at the latest KJP patch
> Document ID: 118844-30
> Title: SunOS 5.10_x86: kernel Patch
>
> Any suggestions or clues as to where to find the root cause?
>
> --
> Doug Perry
> AltPlat Support Engineer
> [EMAIL PROTECTED]
> Phone: 1-800-USA-4SUN. hit Option1, then punch in case number.
>
> ----------------------
> Work Hours: 0700 - 1600 EST
> Manager: Dave O'Connor [EMAIL PROTECTED]
>
> Convenient web access to Sun Support:
> http://www.sun.com/service/online
> _______________________________________________
> zones-discuss mailing list
> zones-discuss@opensolaris.org