[zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund
I've got a zone on a Solaris 10 system which after a reboot from within the 
zone has ended up in a confused state. The zone never came back up again, and 
zoneadm list thinks it is down:

  ID NAME                 STATUS   PATH                         BRAND    IP
   0 global               running  /                            native   shared
  60 jcp-mail-zn-mn-colo1 down     /zones/jcp-mail-zn-mn-colo1  native   shared
  63 jcp-web-zn-mn-colo1  running  /zones/jcp-web-zn-mn-colo1   native   shared

But at the same time, zoneadm boot thinks it is up:
[EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted

If I try to halt the zone, it just hangs:
[EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 halt
^C (after 10 minutes)
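
For anyone wanting to catch this kind of thing from a script, here is a rough sketch that picks out zones sitting in the "down" or "shutting_down" state. It assumes the colon-separated output of zoneadm list -cp (zoneid:zonename:state:zonepath:...); the function name transitional_zones is made up for illustration, and it reads the listing on stdin so it can be tried against saved output.

```shell
# Print "<zonename>: <state>" for zones that are in a transitional state.
# Input: lines in `zoneadm list -cp` format (zoneid:zonename:state:zonepath:...).
# transitional_zones is a hypothetical helper, not part of zoneadm.
transitional_zones() {
    awk -F: '$3 == "down" || $3 == "shutting_down" { print $2 ": " $3 }'
}

# On a live system (assumption: run from the global zone):
#   zoneadm list -cp | transitional_zones
```

A zone that stays in one of these states for more than a few minutes is probably worth a closer look.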

Any clues to what is going on?

cheers,
/Martin
 
 
This message posted from opensolaris.org
___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund
James,

On 9 jul 2008, at 09:32, James Carlson wrote:

> Martin Englund writes:
> > But at the same time, zoneadm boot thinks it is up:
> > [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
> > zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted
> >
> > If I try to halt the zone, it just hangs:
> > [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 halt
> > ^C (after 10 minutes)
> >
> > Any clues to what is going on?
>
> Is there any chance that it's stuck trying to shut down?  I'd first
> look for threads that appear to be stuck in mdb's ::threadlist -v
> output.


It turned out that there were two zoneadmd processes for that zone.
After removing them I could get things back to normal with
zoneadm -z zone boot.
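
As a rough sketch of how duplicate zoneadmd processes could be spotted next time, the function below counts zoneadmd processes per zone and reports any zone with more than one. It assumes input in the form produced by ps -eo pid,args (PID first, then the command line) and that the zone name follows -z, as in "zoneadmd -z jcp-mail-zn-mn-colo1". The name duplicate_zoneadmd is made up for illustration.

```shell
# Print "<zone> <count>" for every zone that has more than one zoneadmd.
# Input: `ps -eo pid,args` style lines on stdin (PID, then command line).
# duplicate_zoneadmd is a hypothetical helper, not a Solaris command.
duplicate_zoneadmd() {
    awk '$2 ~ /(^|\/)zoneadmd$/ {
             # Find "-z" among the arguments and count the zone name after it.
             for (i = 3; i < NF; i++)
                 if ($i == "-z") count[$(i + 1)]++
         }
         END {
             for (zone in count)
                 if (count[zone] > 1) print zone, count[zone]
         }'
}

# On a live system (assumption: run from the global zone):
#   ps -eo pid,args | duplicate_zoneadmd
```

No output means every zone has at most one zoneadmd.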

Thanks for the pointer!

cheers,
/Martin
-- 
Martin Englund, Security Engineer, .Sun Engineering, Sun Microsystems  
Inc.
Email: [EMAIL PROTECTED] Time Zone: GMT-3 PGP: 1024D/AA514677
The question is not if you are paranoid, it is if you are paranoid  
enough.




Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund
Konstantin,

here's the info you requested:

[EMAIL PROTECTED] # pstack $(pgrep -f 'zoneadm.*jcp-mail-zn-mn-colo1')
21276:  zoneadmd -z jcp-mail-zn-mn-colo1
-----------------  lwp# 1 / thread# 1  --------------------
  fef05405 pollsys  (8046cf0, 4, 0, 0)
  feeb6592 poll (8046cf0, 4, ) + 52
  0805a0ce do_console_io (8077e78, 4, 2) + 76
  0805a36c serve_console (8077e78) + 78
  080590ba main (3, 8047df4, 8047e04) + 512
  08056b7a _start   (3, 8047eb0, 8047eb9, 8047ebc, 0, 8047ed1) + 7a
-----------------  lwp# 2 / thread# 2  --------------------
  fef06266 door (0, 0, 0, 0, 0, 8)
  feeef0a3 door_unref_func (531c) + 43
  fef01112 _thr_setup (fec60a00) + 52
  fef01370 _lwp_start (fec60a00, 0, 0, 0, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------
  fef062af door (fe74e850, 1000, 0, fe74fe00, f5f00, a)
  0805876b server   (0, fe74f8f0, 510, 0, 0, 8057edc) + 88f
  fef062e0 __door_return () + 60
-----------------  lwp# 4 / thread# 4  --------------------
  fef062af door (0, 0, 0, fe650e00, f5f00, a)
  feeef5cd door_create_func (0) + 2c
  fef01112 _thr_setup (fec61a00) + 52
  fef01370 _lwp_start (fec61a00, 0, 0, 0, 0, 0)

---

[EMAIL PROTECTED] # zoneadm list -cv
  ID NAME                 STATUS     PATH                         BRAND    IP
   0 global               running    /                            native   shared
   7 javafx-zn-mn-colo1   running    /zones/javafx-zn-mn-colo1    native   shared
   8 egc-zn-mn-colo1      running    /zones/egc-zn-mn-colo1       native   shared
  60 jcp-mail-zn-mn-colo1 down       /zones/jcp-mail-zn-mn-colo1  native   shared
  63 jcp-web-zn-mn-colo1  running    /zones/jcp-web-zn-mn-colo1   native   shared
   - master               installed  /zones/master                native   shared

---

[EMAIL PROTECTED] # ps -fz  jcp-mail-zn-mn-colo1
     UID   PID  PPID   C    STIME TTY         TIME CMD

---

[EMAIL PROTECTED] # df -hZ |grep /zones/jcp-mail-zn-mn-colo1
z1/zones/jcp-mail-zn-mn-colo1   248G   619M   229G     1%    /zones/jcp-mail-zn-mn-colo1

---

[EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted

cheers,
/Martin




Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund

On 9 jul 2008, at 21:42, Konstantin Gremliza wrote:

> I wrote a DTrace script to monitor zone states. The resulting
> automaton is in the attachment.
> In the state down, we should still have a zsched (ps), but we should
> not have any mounted filesystems, which is true (df).

There is no zsched running in that zone and nothing is mounted either  
(apart from the zone root).

> It's quite correct that you cannot say
>  zoneadm -z xxx boot
> when the zone is still down.
> Did you ever try
>  zoneadm -z xxx halt
> ? If not, please do (this is my first tip).


I had tried that, and it hung.

Now that I tried the halt + boot again, it worked. I guess James's
suggestion might have been right, i.e. a stuck kernel thread.

This is not the first time it has happened, so when/if it happens  
again I'll know where to look.

> I see you have the root of the zone on ZFS. I guess you know that
> this is not supported.
> Did you replicate the zones with zfs clone?


It is running on Nevada, not Solaris 10, and as far as I know it is
supported there :)

Thanks for your help and the zone automaton picture.

cheers,
/Martin

