On Sat, 12 Dec 2009 10:06:43 +0100, Frank Batschulat (Home) 
<frank.batschu...@sun.com> wrote:

> sounds somewhat similar to
>
> 6773836 zoneadm halt or halting/rebooting a non-global zone hangs the global 
> zone

wrong cut+paste, I meant to say:

6734679 zoneadm halt hung during zones test


> I'll try to reproduce this using your test case and see what I find. Please
> file a bug if it still happens with build 128 and is not fixed by 6894901,
> as Steve suggested.
>
> cheers
> frankB
>
> On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette <glenn.brune...@sun.com> 
> wrote:
>
>>
>> As part of an Immutable Service Container[1] demonstration that I am
>> creating for an event in January, I need to start/stop a zone
>> quite a few times (as part of a Self-Cleansing[2] demo).  During the
>> course of my testing, I have been able to repeatedly get zoneadm to
>> hang.
>>
>> Since I am working with a highly customized configuration, I started
>> over with a default zone on OpenSolaris (b127) and was able to repeat
>> this issue.  To reproduce this problem, use the following script after
>> creating a zone using the normal/default steps:
>>
>> isc...@osol-isc:~$ while : ; do
>>  > echo "`date`: ZONE BOOT"
>>  > pfexec zoneadm -z test boot
>>  > sleep 30
>>  > pfexec zoneadm -z test halt
>>  > echo "`date`: ZONE HALT"
>>  > sleep 10
>>  > done
>>
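One way to make the loop above stop by itself at the first hang is to put a
deadline on each zoneadm call. This is only a sketch, not something from the
original report: run_with_deadline and repro_loop are made-up helper names,
the 120-second default is a guess, and exit code 124 is simply borrowed from
the GNU timeout convention. The hung process is deliberately left running so
that it can still be inspected with pstack and truss.

```shell
# Run a command in the background and poll it once per second.
# Returns the command's own exit status, or 124 if it is still
# running after the deadline (in seconds). A command that misses
# the deadline is NOT killed, so it stays available for debugging.
run_with_deadline() {
    deadline=$1; shift
    "$@" &
    cmd_pid=$!
    waited=0
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            echo "`date`: pid $cmd_pid still running after ${deadline}s: $*" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$cmd_pid"
}

# Same boot/halt cycle as the original loop, with a pass counter.
# Stops on the first zoneadm call that fails or misses the deadline.
# $1 = zone name, $2 = deadline in seconds (assumed default: 120).
repro_loop() {
    zone=$1
    limit=${2:-120}
    pass=0
    while : ; do
        pass=$((pass + 1))
        echo "`date`: pass $pass: ZONE BOOT"
        run_with_deadline "$limit" pfexec zoneadm -z "$zone" boot || break
        sleep 30
        run_with_deadline "$limit" pfexec zoneadm -z "$zone" halt || break
        echo "`date`: pass $pass: ZONE HALT"
        sleep 10
    done
    echo "`date`: stopped at pass $pass"
}
```

With these functions loaded, "repro_loop test" runs the cycle against the
zone named test and reports the pass at which it gave up.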
>> This script works just fine for a while, but eventually zoneadm hangs
>> (was at pass #90 in my last test).  When this happens, zoneadm is shown
>> to be consuming quite a bit of CPU:
>>
>>     PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
>>
>>   16598 root       11M 3140K run      1    0   0:54:49  74% zoneadm/1
>>
>>
>> A stack trace of zoneadm shows:
>>
>> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
>> 16082:       zoneadmd -z test
>> -----------------  lwp# 1  --------------------------------
>> -----------------  lwp# 2  --------------------------------
>>   feef41c6 door     (0, 0, 0, 0, 0, 8)
>>   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, feeee39e) + 67
>>   feeee3f3 _thrp_setup (fe5b0a00) + 9b
>>   feeee680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
>> -----------------  lwp# 3  --------------------------------
>>   feef420f __door_return () + 2f
>> -----------------  lwp# 4  --------------------------------
>>   feef420f door     (0, 0, 0, fe140e00, f5f00, a)
>>   feed9f57 door_create_func (0, fef81000, fe140fe8, feeee39e) + 2f
>>   feeee3f3 _thrp_setup (fe5b1a00) + 9b
>>   feeee680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
>> 16598:       zoneadm -z test boot
>>   feef3fc8 door     (6, 80476d0, 0, 0, 0, 3)
>>   feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
>>   fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
>>   0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
>>   08060125 main     (4, 8047d64, 8047d78, 805570f) + 2b9
>>   0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d
>>
>>
>> A stack trace of zoneadmd shows:
>>
>> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
>> 16082:       zoneadmd -z test
>> -----------------  lwp# 1  --------------------------------
>> -----------------  lwp# 2  --------------------------------
>>   feef41c6 door     (0, 0, 0, 0, 0, 8)
>>   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, feeee39e) + 67
>>   feeee3f3 _thrp_setup (fe5b0a00) + 9b
>>   feeee680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
>> -----------------  lwp# 3  --------------------------------
>>   feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
>>   feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
>>   08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
>>   feef4240 __door_return () + 60
>> -----------------  lwp# 4  --------------------------------
>>   feef420f door     (0, 0, 0, fe140e00, f5f00, a)
>>   feed9f57 door_create_func (0, fef81000, fe140fe8, feeee39e) + 2f
>>   feeee3f3 _thrp_setup (fe5b1a00) + 9b
>>   feeee680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
>>
>>
>> A truss of zoneadm (-f -vall -wall -tall) shows this looping:
>>
>> 16598:  door_call(6, 0x080476D0)                        = 0
>> 16598:          data_ptr=8047730 data_size=0
>> 16598:          desc_ptr=0x0 desc_num=0
>> 16598:          rbuf=0x807F2D8 rsize=4096
>> 16598:  close(6)                                        = 0
>> 16598:  mkdir("/var/run/zones", 0700)                   Err#17 EEXIST
>> 16598:  chmod("/var/run/zones", 0700)                   = 0
>> 16598:  open("/var/run/zones/test.zoneadm.lock", O_RDWR|O_CREAT, 0600) = 6
>> 16598:  fcntl(6, F_SETLKW, 0x08046DC0)                  = 0
>> 16598:          typ=F_WRLCK  whence=SEEK_SET start=0     len=0 sys=4277003009 pid=6
>> 16598:  open("/var/run/zones/test.zoneadmd_door", O_RDONLY) = 7
>> 16598:  door_info(7, 0x08047230)                        = 0
>> 16598:          target=16082 proc=0x8058A04 data=0x0
>> 16598:          attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
>> 16598:          uniquifier=26426
>> 16598:  close(7)                                        = 0
>> 16598:  close(6)                                        = 0
>> 16598:  open("/var/run/zones/test.zoneadmd_door", O_RDONLY) = 6
>> 16082/3:        door_return(0x00000000, 0, 0x00000000, 0xFE23FE00, 1007360) = 0
>> 16082/3:        door_ucred(0x080A37C8)                          = 0
>> 16082/3:                euid=0 egid=0
>> 16082/3:                ruid=0 rgid=0
>> 16082/3:                pid=16598 zoneid=0
>> 16082/3:                E: all
>> 16082/3:                I: basic
>> 16082/3:                P: all
>> 16082/3:                L: all
>>
>>
>> PID 16598 is zoneadm and PID 16082 is zoneadmd.
>>
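Note that "pgrep zoneadm" in the commands above also matches zoneadmd, which
is why the first pstack listing shows both processes. A small sketch that
keeps the two apart by using pgrep -x (exact name match) and saves one pstack
file per process. snapshot_stacks and the output directory are hypothetical
names picked for this example, not part of any existing tool.

```shell
# Save one pstack listing per process whose name matches exactly.
# "pgrep -x zoneadm" matches zoneadm but not zoneadmd.
# $1 = process name, $2 = directory for the output files.
snapshot_stacks() {
    name=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    for pid in `pgrep -x "$name"`; do
        # One file per pid, e.g. zoneadm.16598.pstack
        pfexec pstack "$pid" > "$outdir/$name.$pid.pstack" 2>&1
    done
}
```

For example, "snapshot_stacks zoneadm /var/tmp/hang" followed by
"snapshot_stacks zoneadmd /var/tmp/hang" leaves separate files for the
client and the daemon.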
>>
>> Is this a known issue?  Are there any other things that I can do to
>> help debug this situation?  Once things get into this state, I have
>> only been able to recover by rebooting the zone.
>>
>>
>>
>> Please advise.
>>
>> g
>>
>>
>> [1] http://kenai.com/projects/isc/pages/OpenSolaris
>> [2] http://kenai.com/attachments/wiki_images/isc/isc-autonomic-cleansing-time-v1.3.png
>> _______________________________________________
>> zones-discuss mailing list
>> zones-discuss@opensolaris.org
>>
>
>
>



-- 
frankB

It is always possible to agglutinate multiple separate problems
into a single complex interdependent solution.
In most cases this is a bad idea.