Glenn, I've been running this test case for nearly a day now on build 129 and
couldn't reproduce it at all.  There is a good chance this was indeed fixed by
6894901 in build 128.

I'll also try to reproduce this now on build 126.
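
If it does show up again (on 126 or elsewhere), something along these lines
should capture the interesting state before recovering; this is just a sketch,
with the /var/run/zones path taken from your truss output below:

  # grab stacks and open files of both processes, plus the zone's lock/door files
  for p in $(pgrep -x zoneadm) $(pgrep -x zoneadmd); do
    pfexec pstack $p > /tmp/pstack.$p
    pfexec pfiles $p > /tmp/pfiles.$p
  done
  ls -l /var/run/zones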

cheers
frankB

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette <glenn.brune...@sun.com> 
wrote:

>
> As part of an Immutable Service Container[1] demonstration that I am
> creating for an event in January, I need to start/stop a zone quite a
> few times (as part of a Self-Cleansing[2] demo).  During the course of
> my testing, I have been able to repeatedly get zoneadm to hang.
>
> Since I am working with a highly customized configuration, I started
> over with a default zone on OpenSolaris (b127) and was able to repeat
> this issue.  To reproduce this problem, use the following script after
> creating a zone using the normal/default steps:
>
> isc...@osol-isc:~$ while : ; do
>  > echo "`date`: ZONE BOOT"
>  > pfexec zoneadm -z test boot
>  > sleep 30
>  > pfexec zoneadm -z test halt
>  > echo "`date`: ZONE HALT"
>  > sleep 10
>  > done
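>
> A variant of the same loop with an explicit pass counter (just a sketch;
> the counter is only there for logging) makes it easier to see at which
> iteration the hang shows up:
>
>   pass=1
>   while : ; do
>     echo "`date`: PASS $pass: ZONE BOOT"
>     pfexec zoneadm -z test boot
>     sleep 30
>     pfexec zoneadm -z test halt
>     echo "`date`: PASS $pass: ZONE HALT"
>     pass=$((pass + 1))
>     sleep 10
>   done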
>
> This script works just fine for a while, but eventually zoneadm hangs
> (it was at pass #90 in my last test).  When this happens, zoneadm can be
> seen consuming quite a bit of CPU:
>
>     PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
>
>   16598 root       11M 3140K run      1    0   0:54:49  74% zoneadm/1
>
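> The snapshot above is the default prstat view; a per-LWP microstate sample
> along these lines (just a sketch) would also show whether that single
> zoneadm thread is spending its time in USR or SYS:
>
>   prstat -mLp "$(pgrep -d, -x zoneadm),$(pgrep -d, -x zoneadmd)" 5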
>
> A stack trace of zoneadm shows:
>
> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
> 16082:        zoneadmd -z test
> -----------------  lwp# 1  --------------------------------
> -----------------  lwp# 2  --------------------------------
>   feef41c6 door     (0, 0, 0, 0, 0, 8)
>   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, feeee39e) + 67
>   feeee3f3 _thrp_setup (fe5b0a00) + 9b
>   feeee680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
> -----------------  lwp# 3  --------------------------------
>   feef420f __door_return () + 2f
> -----------------  lwp# 4  --------------------------------
>   feef420f door     (0, 0, 0, fe140e00, f5f00, a)
>   feed9f57 door_create_func (0, fef81000, fe140fe8, feeee39e) + 2f
>   feeee3f3 _thrp_setup (fe5b1a00) + 9b
>   feeee680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
> 16598:        zoneadm -z test boot
>   feef3fc8 door     (6, 80476d0, 0, 0, 0, 3)
>   feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
>   fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
>   0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
>   08060125 main     (4, 8047d64, 8047d78, 805570f) + 2b9
>   0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d
>
>
> A stack trace of zoneadmd shows:
>
> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
> 16082:        zoneadmd -z test
> -----------------  lwp# 1  --------------------------------
> -----------------  lwp# 2  --------------------------------
>   feef41c6 door     (0, 0, 0, 0, 0, 8)
>   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, feeee39e) + 67
>   feeee3f3 _thrp_setup (fe5b0a00) + 9b
>   feeee680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
> -----------------  lwp# 3  --------------------------------
>   feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
>   feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
>   08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
>   feef4240 __door_return () + 60
> -----------------  lwp# 4  --------------------------------
>   feef420f door     (0, 0, 0, fe140e00, f5f00, a)
>   feed9f57 door_create_func (0, fef81000, fe140fe8, feeee39e) + 2f
>   feeee3f3 _thrp_setup (fe5b1a00) + 9b
>   feeee680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
>
>
> A truss of zoneadm (-f -vall -wall -tall) shows this looping:
>
> 16598:  door_call(6, 0x080476D0)                        = 0
> 16598:          data_ptr=8047730 data_size=0
> 16598:          desc_ptr=0x0 desc_num=0
> 16598:          rbuf=0x807F2D8 rsize=4096
> 16598:  close(6)                                        = 0
> 16598:  mkdir("/var/run/zones", 0700)                   Err#17 EEXIST
> 16598:  chmod("/var/run/zones", 0700)                   = 0
> 16598:  open("/var/run/zones/test.zoneadm.lock", O_RDWR|O_CREAT, 0600) = 6
> 16598:  fcntl(6, F_SETLKW, 0x08046DC0)                  = 0
> 16598:          typ=F_WRLCK  whence=SEEK_SET start=0     len=0 sys=4277003009 pid=6
> 16598:  open("/var/run/zones/test.zoneadmd_door", O_RDONLY) = 7
> 16598:  door_info(7, 0x08047230)                        = 0
> 16598:          target=16082 proc=0x8058A04 data=0x0
> 16598:          attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
> 16598:          uniquifier=26426
> 16598:  close(7)                                        = 0
> 16598:  close(6)                                        = 0
> 16598:  open("/var/run/zones/test.zoneadmd_door", O_RDONLY) = 6
> 16082/3:        door_return(0x00000000, 0, 0x00000000, 0xFE23FE00, 1007360) = 0
> 16082/3:        door_ucred(0x080A37C8)                          = 0
> 16082/3:                euid=0 egid=0
> 16082/3:                ruid=0 rgid=0
> 16082/3:                pid=16598 zoneid=0
> 16082/3:                E: all
> 16082/3:                I: basic
> 16082/3:                P: all
> 16082/3:                L: all
>
>
> PID 16598 is zoneadm and PID 16082 is zoneadmd.
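>
> For reference, the truss snippet above came from an invocation roughly like
> the one below; only the -f -vall -wall -tall flags are as stated earlier,
> and the -o/-p parts are just one way to attach and capture the output:
>
>   pfexec truss -f -vall -wall -tall -o /tmp/zoneadm.truss -p 16598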
>
>
> Is this a known issue?  Are there any other things that I can do to
> help debug this situation?  Once things get into this state, I have
> only been able to recover by rebooting the zone.
>
>
>
> Please advise.
>
> g
>
>
> [1] http://kenai.com/projects/isc/pages/OpenSolaris
> [2]
> http://kenai.com/attachments/wiki_images/isc/isc-autonomic-cleansing-time-v1.3.png



-- 
frankB

It is always possible to agglutinate multiple separate problems
into a single complex interdependent solution.
In most cases this is a bad idea.
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
