Re : [zones-discuss] ps command hangs zone

Christian Gajan Thu, 19 Oct 2006 01:20:24 -0700

Hi Douglas

Take a look at this bug description:
Bug ID: 1246893
Title:  mmap and write to the same file deadlocks.


Below is a customer's description of the consequences of this issue.
Perhaps your customer has a similar problem?

Regards

Christian

--------------------------------------------------------------------------------------
The purpose of this email is to describe a Solaris OS issue that we would like Sun
to acknowledge and correct.

First of all, we want to distinguish two things:
- the way the problem appears on our server
- the associated generic Solaris OS issue

For now, we have some leads for working around the symptoms.
We will test them later this week, and we think we will
be able to run ProFTPD without any further downtime.

However, we also think the generic Solaris OS issue should be fixed
to prevent this same issue from recurring.
In fact, all our workarounds rely on modifications on the ProFTPD side
(changing the source code or disabling some modules).
This means that either a new version of the same program (proftpd)
or other badly written programs (intentionally or not) could make this
problem happen again, which is not an acceptable solution for us.


What is this issue?
From our point of view, the issue actually consists of two issues:
- the first is a process hang (kernel deadlock) when mmap
  and write system calls are invoked on the same file
- the second is a hang of all p-commands (including ps)
  after a pstack command is run on the initially
  hung process

We can live with the first issue, but the second one has too many consequences
for the whole system to be acceptable.

We have written a test case to reproduce the problem. Here is the scenario:

- First, simply mmap and write the same file,
  as shown in the code below:

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/param.h>
#include <stdio.h>
#include <stdlib.h>

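int main(int argc, char **argv)
{
   int fd, r;
   caddr_t addr;

   if (argc != 2) { printf("usage: %s filename\n", argv[0]); exit(1); }

   /* Open (or create) the test file; run this on an NFS mount */
   fd = open(argv[1], O_RDWR|O_CREAT, 0666);
   printf("open = %d\n", fd);
   if (fd < 0) { perror("open"); exit(1); }

   /* Size the file to exactly one page (PAGESIZE from <sys/param.h>) */
   r = ftruncate(fd, PAGESIZE);
   printf("ftruncate = %d\n", r);

   /* Map that page shared and read/write */
   addr = mmap(NULL, PAGESIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
   printf("mmap = %p\n", (void *)addr);
   if (addr == MAP_FAILED) { perror("mmap"); exit(1); }

   /* write() to the file using its own mapping as the source buffer;
      on NFS, this is the call that hangs */
   r = write(fd, addr, PAGESIZE);
   printf("write = %d\n", r);

   r = munmap(addr, PAGESIZE);
   printf("munmap = %d\n", r);
   r = close(fd);
   printf("close = %d\n", r);
   return 0;
}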

- Compile and run this code on an NFS file system as a normal user.
  The process hangs in the write() system call.
- Then run ps to get the process ID of the hung process.
- Next, run pstack (as a normal user) with that process ID as its
  argument. The pstack command hangs.
- Finally, run ps (as root or as any other user).
  All ps commands hang.
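To run the first step of this scenario without tying up an interactive shell, the mmap/write sequence can be moved into a forked child that the parent polls with waitpid(). The following is a minimal sketch, not part of the original report: the function names write_completes_within() and run_mmap_write() and the timeout are hypothetical, and it assumes that a child which has not exited once the timeout expires is hung in write(). Per the report, do not run pstack on such a child.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side (hypothetical helper): the same open/ftruncate/mmap/write
   sequence as the test case. Exit status 0 means write() returned a
   full page. */
static void run_mmap_write(const char *path)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0 || ftruncate(fd, pagesize) != 0)
        _exit(2);
    void *addr = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        _exit(2);
    /* Source buffer is the file's own mapping: the step reported to hang on NFS. */
    ssize_t n = write(fd, addr, (size_t)pagesize);
    munmap(addr, (size_t)pagesize);
    close(fd);
    _exit(n == pagesize ? 0 : 3);
}

/* Returns 1 if the child finished cleanly within timeout_sec seconds,
   0 if it is still running (presumed hung), and -1 on a fork/wait error
   or if the child failed for some other reason. */
int write_completes_within(const char *path, unsigned timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        run_mmap_write(path);   /* always calls _exit() */

    for (unsigned waited = 0; waited <= timeout_sec; waited++) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
        if (r < 0)
            return -1;
        if (waited < timeout_sec)
            sleep(1);           /* poll roughly once per second */
    }
    /* Still blocked: try to kill it, although a process stuck in a kernel
       deadlock may not die. Do not run pstack on it. */
    kill(pid, SIGKILL);
    return 0;
}
```

For example, write_completes_within("/net/server/tmp/probe", 30) (path and timeout hypothetical) would be expected to return 0 on an affected NFS mount and 1 on a file system where write() completes. If the child is deadlocked, the SIGKILL may have no effect and the child is left unreaped.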

This issue occurs at our site with proftpd version 1.3.0 rc3.
This new version implements a new module, mod_delay, which
performs an unfortunate mix of mmap and write calls.
As we said in our introduction, we have now identified
the part of the proftpd code where the mistakes were made, and
we know how to fix them.
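The report does not show the actual proftpd change. As one possible application-side workaround, here is a minimal sketch: write_from_mapping() is a hypothetical helper (not from proftpd) that copies the data out of the mapping into private heap memory before calling write(), so the buffer handed to write() never lies in a mapping of the file being written. It assumes, as the test case suggests, that the hang is triggered by write() reading its source data from such a mapping; it has not been verified against the Solaris bug.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to fd, where buf may
   point into a MAP_SHARED mapping of the same file. The bytes are first
   copied into private heap memory, so page faults on the mapping are
   taken by memcpy() before write() starts, not inside write(). */
ssize_t write_from_mapping(int fd, const void *buf, size_t len)
{
    void *copy = malloc(len ? len : 1);   /* avoid malloc(0) */
    if (copy == NULL)
        return -1;
    memcpy(copy, buf, len);

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, (const char *)copy + done, len - done);
        if (n <= 0) {
            /* error: -1; unexpected zero-length write: report bytes so far */
            free(copy);
            return n < 0 ? -1 : (ssize_t)done;
        }
        done += (size_t)n;   /* handle short writes */
    }
    free(copy);
    return (ssize_t)done;
}
```

A caller holding a pointer into the mapping (addr in the test case above) would call write_from_mapping(fd, addr, PAGESIZE) instead of write(fd, addr, PAGESIZE). The extra copy costs one buffer per call.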

Up to now, Sun Support has helped us analyze and understand why all
these hangs occurred. We now wish to go one step further, toward
a solution to the generic issue.

Of course, we are no longer blocked (at least we hope not,
pending the success of our tests), but the issue
seems serious enough to us that we do not want to stop here.

Let us clarify why the second issue (the pstack/ps hang) appears to us
to be the more serious one. We clearly understand that, in our case,
we hit it as a consequence of the first one (the process address
space lock is never released).
But if we had encountered only the first issue, the situation could
have been acceptable, because the hangs would have been limited to the
badly programmed processes. The second issue (pstack/ps) extends
the problem to many other important system-wide processes, and in
that case the situation is no longer acceptable.

In a perfect world, fixing all the problems (hangs) would be nice.
But in the real world, and from our point of view, a partial fix
(just the pstack/ps hang) would be satisfactory.

Furthermore, this issue occurs under Solaris 8 and 10, and probably 9 too.
In the case of Solaris 10, the second issue (pstack/ps)
calls into question the isolation paradigm of Solaris zones,
because a simple user process in a non-global zone can hang
ps commands in the global zone.
This calls into question the use of Solaris zones for security
purposes at our site. More generally, we regard this issue as
a security hole, because an unprivileged user can
seriously disrupt the whole system.

We would like a bug report to be raised with all the details
given in this email, and we hope for a fix in a future system patch.

I hope my mail is not too long, but all these questions raise
real doubts about Solaris availability and reliability.
---------------------------------------------------------------------------


----- Original Message -----
From: Douglas Perry <[EMAIL PROTECTED]>
Date: Tuesday, October 17, 2006 10:08 pm
Subject: [zones-discuss] ps command hangs zone
To: [EMAIL PROTECTED]
Cc: zones-discuss@opensolaris.org

> I have a customer with an application running in a zone. The ps command
> (/usr/bin/ps) hung. The customer tried to shut down the
> zone using 'zoneadm -z zone-name reboot' and 'halt', but the zone did
> not come down, so the customer had to reboot the system.
> The customer is looking for the root cause, probably the application.
> 
> There are no messages in the messages files.
> The system is at the latest KJP (kernel jumbo patch):
> Document ID:  118844-30
> Title:        SunOS 5.10_x86: kernel Patch
> 
> 
> Any suggestions or clues as to where to find the root cause?
> 
> -- 
> 
> Doug Perry
> AltPlat Support Engineer
> [EMAIL PROTECTED]
> Phone: 1-800-USA-4SUN. Hit Option 1, then enter the case number.
> 
> 
> ----------------------
> Work Hours: 0700 - 1600 EST
> Manager: Dave O'Connor [EMAIL PROTECTED]
> 
> Convenient web access to Sun Support:
> http://www.sun.com/service/online
> _______________________________________________
> zones-discuss mailing list
> zones-discuss@opensolaris.org
>