Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-23 Thread Glenn Brunette


Frank,

I just verified that something is still wrong in b129, but the problem
does _not_ occur with a vanilla configuration.  This time, around
boot/halt #102, the system apparently shut down or panicked.  I was
running it overnight and came in to find a system that had been
rebooted.  I did not see any problem in the audit log or in
/var/adm/messages.  Any pointers?

I am running an Immutable Service Container configuration, based upon
the installation steps at:

http://kenai.com/projects/isc/pages/OpenSolaris

Specifically:

pfexec pkg install SUNWmercurial
hg clone https://kenai.com/hg/isc~source  isc
pfexec isc/bin/iscadm.ksh -N 0
pfexec bootadm update-archive
pfexec shutdown -g 0 -i 0 -y
[after reboot]
zlogin -C isc1
[wait for zone isc1 to fully complete boot process]

Then run the script I provided earlier that stops and starts the zone.

There must be something wrong with the interaction of components.
In this configuration, we have things like resource controls,
auditing, IP Filter/IP NAT, and zones all enabled.

Would it be possible for you to try the steps above on a fresh
install of 2009.06 or later (b129 is where I am right now)?  Also,
if you have other debugging methods, please let me know.

I am going to kick this off again to see if I can catch any
error messages.

g


On 12/16/09 3:49 AM, Frank Batschulat (Home) wrote:

Glenn, I've not been able to reproduce this on onnv build 126 (it has been
running for a day now).

if that script reproduced 6894901 straight away, it should be doing so
on 126 as well (similar to what you've seen on 127).

this poses the question of whether there are some other details in your
environment that I don't have, or whether that script really reliably
reproduces 6894901.

cheers
frankB

On Tue, 15 Dec 2009 15:23:06 +0100, Frank Batschulat (Home)
frank.batschu...@sun.com wrote:


Glenn, I've been running this test case now for nearly a day on build 129
and couldn't reproduce it at all. there's a good chance this was indeed
fixed by 6894901 in build 128.

I'll also try to reproduce this now on build 126.

cheers
frankB


Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-23 Thread Glenn Brunette


That's the thing: I did not see any!  I am still running my test,
and it is up to iteration #98 right now.  I have verified via dumpadm
that a dump device is configured and enabled, so it is a bit of a
waiting game at this point.  I did extend the delay between boot
and halt a bit to more accurately reflect my original crontab,
but I doubt that should change anything.  We shall see...
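Since everything now hinges on the next panic leaving a dump behind, it may be worth checking the savecore side as well as the dump device.  A minimal sketch, assuming dumpadm's usual "Label: value" output; the labels matched ("Dump device", "Savecore enabled", "Savecore directory") are assumptions rather than anything shown in this thread, and dump_ready is a hypothetical helper name:

```shell
#!/bin/sh
# dump_ready: read `dumpadm` output on stdin and report whether a crash
# dump is likely to be saved after a panic.  The field labels matched
# here are assumed from typical dumpadm output; adjust them if yours
# differ.  Exits non-zero when no dump device or savecore is disabled.
dump_ready() {
    awk -F': *' '
        $1 ~ /Dump device$/        { dev = $2 }
        $1 ~ /Savecore enabled$/   { enabled = $2 }
        $1 ~ /Savecore directory$/ { dir = $2 }
        END {
            if (dev == "" || dev ~ /^none/) {
                print "no dump device configured"; exit 1
            }
            if (enabled != "yes") {
                print "dump device " dev ", but savecore is disabled"; exit 1
            }
            print "dump device " dev "; savecore writes to " dir
        }'
}
```

Usage would be `pfexec dumpadm | dump_ready`.  If savecore turns out to be disabled, a dump left on the dump device can still be extracted by running savecore by hand after the reboot, provided it has not been overwritten in the meantime.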

g


On 12/23/09 1:57 PM, Steve Lawrence wrote:

Do you have the panic message or crash dump?

-Steve L.



Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-22 Thread Glenn Brunette


Frank,

I am back from vacation and will be doing some additional testing.  I
have upgraded to b129 to see if the problem persists.  I first
created a basic (generic) zone to see how it behaves.  If that is ok, I
will apply the Immutable Service Container construction kit to see if
there is any change.  The ISC Toolkit enables things like resource
controls and auditing, which I suppose may influence the results; that
is why I am starting with a vanilla system.  I'll keep you posted.
Thanks for your hard work looking into this!

g



[zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-11 Thread Glenn Brunette


As part of an Immutable Service Container[1] demonstration that I am
creating for an event in January, I need to start and stop a zone
quite a few times (as part of a Self-Cleansing[2] demo).  During the
course of my testing, I have repeatedly been able to get zoneadm to
hang.

Since I am working with a highly customized configuration, I started
over with a default zone on OpenSolaris (b127) and was able to repeat
this issue.  To reproduce this problem, use the following script after
creating a zone using the normal/default steps:

isc...@osol-isc:~$ while : ; do
  echo `date`: ZONE BOOT
  pfexec zoneadm -z test boot
  sleep 30
  pfexec zoneadm -z test halt
  echo `date`: ZONE HALT
  sleep 10
done
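To see exactly which pass hangs without watching the console, the same loop can number its passes and stop at the first failed command.  This is a minimal sketch, not the script used above: cycle_zone and the ZONEADM override are hypothetical names, and ZONEADM exists only so the loop can be exercised without a real zone.  If zoneadm hangs, the last line printed identifies the stuck pass.

```shell
#!/bin/sh
# Sketch: boot and halt a zone repeatedly, numbering each pass and
# stopping at the first command that fails.  ZONEADM is a hypothetical
# override (default: pfexec zoneadm) so the loop can be tried without
# a real zone.
ZONEADM=${ZONEADM:-"pfexec zoneadm"}

# cycle_zone ZONE PASSES BOOT_WAIT HALT_WAIT
# PASSES=0 loops forever, like the original while loop.
cycle_zone() {
    zone=$1 passes=$2 boot_wait=$3 halt_wait=$4
    pass=1
    while [ "$passes" -eq 0 ] || [ "$pass" -le "$passes" ]; do
        echo "$(date): pass $pass: ZONE BOOT"
        $ZONEADM -z "$zone" boot || { echo "pass $pass: boot failed"; return 1; }
        sleep "$boot_wait"
        $ZONEADM -z "$zone" halt || { echo "pass $pass: halt failed"; return 1; }
        echo "$(date): pass $pass: ZONE HALT"
        sleep "$halt_wait"
        pass=$((pass + 1))
    done
}
```

For example, `cycle_zone test 0 30 10` keeps the original 30/10 second timing and runs until interrupted.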

This script works just fine for a while, but eventually zoneadm hangs
(it was at pass #90 in my last test).  When this happens, zoneadm is
shown consuming quite a bit of CPU:

   PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP
 16598 root   11M 3140K run  10   0:54:49  74% zoneadm/1


A stack trace of zoneadm shows:

isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
16082:  zoneadmd -z test
-  lwp# 1  
-  lwp# 2  
 feef41c6 door (0, 0, 0, 0, 0, 8)
 feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
 f3f3 _thrp_setup (fe5b0a00) + 9b
 f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
-  lwp# 3  
 feef420f __door_return () + 2f
-  lwp# 4  
 feef420f door (0, 0, 0, fe140e00, f5f00, a)
 feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
 f3f3 _thrp_setup (fe5b1a00) + 9b
 f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
16598:  zoneadm -z test boot
 feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
 feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
 fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
 0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
 08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
 0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d


A stack trace of zoneadmd shows:

isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
16082:  zoneadmd -z test
-  lwp# 1  
-  lwp# 2  
 feef41c6 door (0, 0, 0, 0, 0, 8)
 feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
 f3f3 _thrp_setup (fe5b0a00) + 9b
 f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
-  lwp# 3  
 feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
 feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
 08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
 feef4240 __door_return () + 60
-  lwp# 4  
 feef420f door (0, 0, 0, fe140e00, f5f00, a)
 feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
 f3f3 _thrp_setup (fe5b1a00) + 9b
 f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
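Note that pgrep matches its pattern anywhere in the process name, so `pgrep zoneadm` also returns zoneadmd; that is why the zoneadm trace above includes PID 16082 (zoneadmd -z test).  A sketch that captures the two processes separately with `pgrep -x`, plus a hypothetical pstack_procs helper that lists which processes a saved trace actually contains:

```shell
#!/bin/sh
# capture_stacks: take separate pstack snapshots of zoneadm and zoneadmd.
# pgrep -x requires an exact name match, so "zoneadm" no longer also
# matches zoneadmd.
capture_stacks() {
    for name in zoneadm zoneadmd; do
        for pid in $(pgrep -x "$name"); do
            echo "=== $name (pid $pid) ==="
            pfexec pstack "$pid"
        done
    done
}

# pstack_procs: list the "PID command" header of every process found in
# saved pstack output on stdin; lwp and stack-frame lines are skipped.
pstack_procs() {
    sed -n 's/^\([0-9][0-9]*\):[[:space:]]*/\1 /p'
}
```

Feeding the trace above through pstack_procs would list both 16082 (zoneadmd) and 16598 (zoneadm).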


A truss of zoneadm (-f -vall -wall -tall) shows this looping:

16598:  door_call(6, 0x080476D0)= 0
16598:  data_ptr=8047730 data_size=0
16598:  desc_ptr=0x0 desc_num=0
16598:  rbuf=0x807F2D8 rsize=4096
16598:  close(6)= 0
16598:  mkdir(/var/run/zones, 0700)   Err#17 EEXIST
16598:  chmod(/var/run/zones, 0700)   = 0
16598:  open(/var/run/zones/test.zoneadm.lock, O_RDWR|O_CREAT, 0600) = 6
16598:  fcntl(6, F_SETLKW, 0x08046DC0)  = 0
16598:  typ=F_WRLCK  whence=SEEK_SET start=0 len=0 sys=4277003009 pid=6
16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 7
16598:  door_info(7, 0x08047230)= 0
16598:  target=16082 proc=0x8058A04 data=0x0
16598:  attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
16598:  uniquifier=26426
16598:  close(7)= 0
16598:  close(6)= 0
16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 6
16082/3:door_return(0x, 0, 0x, 0xFE23FE00, 1007360) = 0
16082/3:door_ucred(0x080A37C8)  = 0
16082/3:euid=0 egid=0
16082/3:ruid=0 rgid=0
16082/3:pid=16598 zoneid=0
16082/3:E: all
16082/3:I: basic
16082/3:P: all
16082/3:L: all


PID 16598 is zoneadm and PID 16082 is zoneadmd.
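With several PIDs interleaved in a `truss -f` log, it can help to summarize what one process keeps doing.  A minimal sketch, assuming the "PID:  syscall(args) = result" line layout shown above; tally_syscalls is a hypothetical helper, and decoded-argument lines such as data_ptr=... are deliberately not counted:

```shell
#!/bin/sh
# tally_syscalls PID: count the system calls made by PID in `truss -f`
# output read from stdin.  Only lines shaped like "PID:  name(..." are
# counted, so decoded-argument lines (data_ptr=..., typ=..., target=...)
# and lines tagged PID/LWP (such as 16082/3:) are not matched.
tally_syscalls() {
    awk -v pid="$1" '
        $1 == (pid ":") && match($2, /^[a-z_][a-z0-9_]*\(/) {
            count[substr($2, 1, RLENGTH - 1)]++
        }
        END { for (name in count) print count[name], name }
    ' | sort -k2
}
```

For the zoneadm excerpt above, this gives 3 open and 3 close calls against one each of door_call, mkdir, chmod, fcntl and door_info, consistent with zoneadm reopening the zoneadmd door on every trip around the loop.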


Is this a known issue?  Are there any other things that I can do to
help debug this situation?  Once things get into this state, I have
only been able to recover by rebooting the zone.

Please advise.

g


[1] http://kenai.com/projects/isc/pages/OpenSolaris
[2] 

Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-11 Thread Glenn Brunette


Steve,

Thanks for the ptr.  Will give it a try!

g


On 12/11/09 5:59 PM, Steve Lawrence wrote:

Looks a lot like 6894901.  Can you try build 128?

-Steve

Re: [zones-discuss] Transfer from 'http://pkg.opensolaris.org/release' timed out: timed out.

2009-07-06 Thread Glenn Brunette


I am getting this on the dev tree as well (last 2 days) from an
OpenSolaris instance running under VBox 3.0.0 (on Mac OS X).

g

On 7/6/09 5:28 PM, Brian Leonard wrote:

Hi,

I'm just trying to raise awareness on this issue. Folks in the opensolaris-help 
forum are getting pretty upset because it has not been possible to install a 
zone for days. See:

http://opensolaris.org/jive/thread.jspa?threadID=107149&tstart=15

A thread has been started over in opensolaris-discuss as well, but it doesn't 
seem to have any resolution yet:

http://www.opensolaris.org/jive/thread.jspa?threadID=107204&tstart=0

Is there a known issue with http://pkg.opensolaris.org/release that's being 
worked on?

Thanks,
Brian

___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-18 Thread Glenn Brunette

Mike,

Mike Gerdts wrote:
 On Mon, Nov 17, 2008 at 8:05 PM, Glenn Brunette [EMAIL PROTECTED] wrote:
 Jeff,

 This actually hits on a similar request that I have (but for different
 reasons).  I would like a stable interface from which I could tell
 the update revision of a system.
 
 This seems to be another case for feature-based meta packages.
 
 http://mgerdts.blogspot.com/2008/03/solaris-wish-list-feature-based-meta.html
 
 I describe it for the simplicity of installing software, but with a
 bit of thought it could be possible to extend it to this use as well.

Very similar indeed.  While this may work for the big items that
qualify as features, I am not sure whether it would be stretching the
metaphor a bit for smaller components, but you and I are definitely
thinking along similar lines.

 In a past life working on JASS, we were told to not test for patch or
 update levels but rather to test whether a specific feature is present,
 and while I understand the merits of this methodology, it does not
 always provide a complete solution (without making significant
 assumptions about how the system was installed/maintained).  For
 
 As a very heavy user of JASS, this methodology is appreciated.  It has
 made the software continue to be quite useful long after Sun stopped
 providing updates.  (Any news on open sourcing it?)

Thank you!  As far as news on JASS, all that I can say for now is
stay tuned.  There is a lot of discussion happening on this front
these days, and I hope we will have some news to share soon.

g


Re: [zones-discuss] Zone Statistics: monitoring resource use of zones

2008-11-17 Thread Glenn Brunette

Jeff,

This actually hits on a similar request that I have (but for different
reasons).  I would like a stable interface from which I could tell
the update revision of a system.

I have a very large government customer whose security configuration
hardening and assessment process has a very real need to detect OS
version and update levels so that they can determine which
actions/checks to apply.

In a past life working on JASS, we were told to not test for patch or
update levels but rather to test whether a specific feature is present,
and while I understand the merits of this methodology, it does not
always provide a complete solution (without making significant
assumptions about how the system was installed/maintained).  For
example, is a feature absent because it has been removed, or simply
because it was never installed?  Also, the existence of some features
cannot be easily tested using automated tools without imposing a great
burden on the tool developer.
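The limitation Kevin describes in the quoted message below is easy to see in a sketch of release-string parsing.  This assumes the usual Solaris 10 /etc/release first line (for example "Solaris 10 5/08 s10s_u5wos_10 SPARC"), where the _uN part of the build tag names the update; release_update is a hypothetical helper, not part of Jeff's zonestat script.  Because a patch bundle does not rewrite /etc/release, a U1 system patched to 5/08 levels still reports 1, so the result says nothing certain about features such as CPU caps.

```shell
#!/bin/sh
# release_update: print the update number encoded in a Solaris 10
# /etc/release line read from stdin, e.g. "s10s_u5wos_10" -> 5.
# The s10s_/s10x_uNwos build-tag layout is an assumption based on
# Solaris 10 release strings; nothing is printed if it is not found.
release_update() {
    sed -n 's/.*s10[sx]_u\([0-9][0-9]*\)wos.*/\1/p' | head -n 1
}
```

Typical use would be `head -n 1 /etc/release | release_update`.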

It sounds like you may be in a similar boat.  What do you think?
Cross-posting to security-discuss to get their feedback as well.

g


Jeff Victor wrote:
 Hi Kevin,
 
 I believe that you cannot patch your way from U1 to U5 - i.e. that the
 system is missing some functionality that would be there if you had
 applied the updates - but your point is still valid. I will look into
 the correctness of using patch levels to detect feature availability.
 
 On Mon, Nov 17, 2008 at 6:09 PM, Young, Kevin [EMAIL PROTECTED] wrote:
 Jeff,
 I am wondering about the logic in how the script identifies specific
 versions. It appears that you are looking at /etc/release to define
 this.  This seems to limit some features of your script because I have a
 Solaris 10 update 1 system that has been updated to 05/08 (update 5) but
 /etc/release still reflects update 1 (updated using 05/08 patch bundle).

 I am using CPU caps, but your tool doesn't recognize that I have that
 feature available. Since these features really come from the kernel
 version, would that be a better way to identify the release version in
 your script? Just a thought.

 In the meantime I tricked the script to think I am on update 5 and I am
 getting better results.


 -= Kevin =-


 -Original Message-
 From: Jeff Victor [mailto:[EMAIL PROTECTED]
 Sent: Monday, November 10, 2008 9:01 AM
 To: Young, Kevin
 Cc: zones-discuss@opensolaris.org
 Subject: Re: [zones-discuss] Zone Statistics: monitoring resource use of
 zones

 On Mon, Nov 10, 2008 at 11:21 AM, Young, Kevin [EMAIL PROTECTED]
 wrote:
 I am curious if you have plans to make it Solaris 10 compatible.
 I do all development on Solaris 10. The script makes an effort to
 distinguish between the different capabilities of the different
 Solaris 10 updates.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Victor
 Sent: Sunday, November 09, 2008 5:54 PM
 To: zones-discuss@opensolaris.org
 Subject: [zones-discuss] Zone Statistics: monitoring resource use of
 zones
 It has become clear that there is a need to monitor resource
 consumption of workloads in zones, and an easy method to compare
 consumption to resource controls. In order to understand how a
 software tool could fulfill this need, I created an OpenSolaris
 project and a prototype to get started. If this sounds interesting,
 you can find the project and Perl script at:
 http://opensolaris.org/os/project/zonestat/ .

 If you have any comments, or suggestions for improvement, please let
 me know on this e-mail list or via private e-mail.
 
 

-- 
Glenn Brunette
Distinguished Engineer
Director, GSS Security Office
Sun Microsystems, Inc.


[zones-discuss] swap memory cap and swap command output

2008-08-11 Thread Glenn Brunette

My apologies if this is a RTFM moment, but I have been looking and have
been unable to find an answer.  If memory caps are defined for a given
zone, why doesn't the output of swap display the cap?

# zonecfg -z web info capped-memory
capped-memory:
 physical: 1G
 [swap: 200M]
# zoneadm -z web boot
# zlogin web /usr/sbin/swap -lh
swapfile             dev    swaplo   blocks     free
/dev/swap              -        4K     3.0G     2.4G

Similarly, the output of df shows no such restriction on tmpfs
filesystems:

# zlogin web df -h /tmp /var/run
Filesystem             size   used  avail capacity  Mounted on
swap                   2.2G    36K   2.2G     1%    /tmp
swap                   2.2G    16K   2.2G     1%    /var/run

The only way that one can determine that a cap is in place is to run
into it (and get out of space errors) or use kstat:

# kstat -p caps:13:swapresv_zone_*:
caps:13:swapresv_zone_13:class  zone_caps
caps:13:swapresv_zone_13:crtime 1273429.36377325
caps:13:swapresv_zone_13:snaptime   1273515.79106759
caps:13:swapresv_zone_13:usage  94965760
caps:13:swapresv_zone_13:value  209715200
caps:13:swapresv_zone_13:zonename   web

Am I missing something or is there a more user friendly way to let
someone know that they are running into a memory cap?
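The kstat output above can at least be turned into something more readable.  A minimal sketch, assuming the `kstat -p` "module:instance:name:statistic value" layout shown above and that `value` is the cap in bytes (209715200 bytes is exactly the 200M configured for web); swap_cap_report is a hypothetical helper:

```shell
#!/bin/sh
# swap_cap_report: summarize swap-reservation caps from `kstat -p` output
# on stdin (lines like "caps:13:swapresv_zone_13:usage  94965760").
# Assumes usage and value are in bytes; M below means MiB (1048576).
swap_cap_report() {
    awk '
        {
            split($1, k, ":")            # module:instance:name:statistic
            id = k[2]
            if (!(id in seen)) { seen[id] = 1; order[++n] = id }
            if (k[4] == "usage")    usage[id] = $2
            if (k[4] == "value")    value[id] = $2
            if (k[4] == "zonename") zone[id] = $2
        }
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                if (value[id] > 0)
                    printf "%s: %.1fM of %.1fM swap reserved (%.1f%%)\n",
                        zone[id], usage[id] / 1048576, value[id] / 1048576,
                        100 * usage[id] / value[id]
            }
        }'
}
```

For the figures shown above, `kstat -p 'caps:13:swapresv_zone_*:' | swap_cap_report` would print `web: 90.6M of 200.0M swap reserved (45.3%)`.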

g


Re: [zones-discuss] swap memory cap and swap command output

2008-08-11 Thread Glenn Brunette

Thanks!

Jerry Jelinek wrote:
 Glenn Brunette wrote:
 My apologies if this is a RTFM moment, but I have been looking and have
 been unable to find an answer.  If memory caps are defined for a given
 zone, why doesn't the output of swap display the cap?
 
 6572077 size of swapfs filesystems in a zone should reflect zone.max-swap limit
 
 Jerry



Re: [zones-discuss] swap memory cap and swap command output

2008-08-11 Thread Glenn Brunette


Mike Gerdts wrote:
 On Mon, Aug 11, 2008 at 8:35 AM, Glenn Brunette [EMAIL PROTECTED] wrote:
 My apologies if this is a RTFM moment, but I have been looking and have
 been unable to find an answer.  If memory caps are defined for a given
 zone, why doesn't the output of swap display the cap?

 # zonecfg -z web info capped-memory
 capped-memory:
 physical: 1G
 [swap: 200M]
 
 As an aside, this configuration doesn't seem to make much sense.  It

It was not supposed to.  Just playing around with the settings to get a
feel for how they worked.  This was just one of the examples I had been
playing with.

g