Re: [zones-discuss] zoneadm confusion

2008-07-10 Thread Konstantin Gremliza




Thank you, Dan, and also for the good hint to look at zone.c. I have tried
to add some of that information to my picture.

I moved it to http://gremliza.net for anyone who is interested.

Regards, 

Konstantin

This was my good old zone_state.d dtrace script:

#!/usr/sbin/dtrace -qs

/* timestamp is in nanoseconds; times are printed as seconds.milliseconds */

BEGIN {
    z[0]="UNINITIALIZED";
    z[1]="READY";
    z[2]="BOOTING";
    z[3]="RUNNING";
    z[4]="SHUTTING_DOWN";
    z[5]="EMPTY";
    z[6]="DOWN";
    z[7]="DYING";
    z[8]="DEAD";
    st=timestamp;
    printf("%14s   %3s %12s %s\n", "TIME", "ID", "ZONE", "STATE");
}

proc:::exec
/ execname == "zoneadmd" / {
    self->exec = execname;
}

proc:::exec-success
/ self->exec == "zoneadmd" / {
    t=timestamp-st;
    printf("%10d.%03d", t/1000000000, (t/1000000)%1000);
    printf(" EXEC: %s\n", curpsinfo->pr_psargs);
    self->exec = 0;
}

proc:::create
/ args[0]->pr_fname == "zsched" || args[0]->pr_fname == "init" / {
    t=timestamp-st;
    printf("%10d.%03d", t/1000000000, (t/1000000)%1000);
    printf(" FORK: %3d %s[%d]\n", args[0]->pr_zoneid,
        args[0]->pr_fname, args[0]->pr_pid);
}

proc:::exit
/ execname == "zsched" || execname == "init" / {
    t=timestamp-st;
    printf("%10d.%03d", t/1000000000, (t/1000000)%1000);
    printf(" EXIT: %3d %s[%d]\n", curpsinfo->pr_zoneid,
        execname, pid);
}


fbt:genunix:zone_status_set:entry {
    t=timestamp-st;
    printf("%10d.%03d", t/1000000000, (t/1000000)%1000);
    printf("   %3d %12s %s\n", args[0]->zone_id,
        stringof(args[0]->zone_name), z[args[1]]);
}
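The state lines this script prints (elapsed time, zone ID, zone name, state) are enough to rebuild the automaton from a trace. A minimal Python sketch, assuming the output was saved to a file and that every state line has the whitespace-separated form "secs.msecs id name STATE"; the function name observed_transitions is hypothetical:

```python
import re
from collections import defaultdict

# A state line from the zone_status_set probe looks like
#   "<secs>.<msecs>   <zone id> <zone name> <STATE>"
# The header line and the EXEC:/FORK:/EXIT: lines do not match.
STATE_LINE = re.compile(r"^\s*\d+\.\d{3}\s+(\d+)\s+(\S+)\s+([A-Z_]+)\s*$")


def observed_transitions(lines):
    """Return {zone_name: [(from_state, to_state), ...]} in trace order.

    Chains are tracked per zone ID; the first state seen for an ID has
    no predecessor and therefore produces no pair.
    """
    last_state = {}                    # zone ID -> most recent state
    transitions = defaultdict(list)
    for line in lines:
        m = STATE_LINE.match(line)
        if not m:
            continue
        zone_id, name, state = int(m.group(1)), m.group(2), m.group(3)
        prev = last_state.get(zone_id)
        if prev is not None:
            transitions[name].append((prev, state))
        last_state[zone_id] = state
    return dict(transitions)
```

For example, observed_transitions(open("zone_state.out")) (file name hypothetical) maps each zone name to the (from, to) pairs seen in the trace. Chains are keyed by zone ID rather than by name, because a zone that is readied again typically comes back under a new ID.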



Dan Price wrote:
> On Thu 10 Jul 2008 at 02:42AM, Konstantin Gremliza wrote:
> > Ok.
> >
> > I wrote a dtrace script to monitor zone states. The resulting automaton is
> > in the attachment.
> > In the down state, we should still have a zsched (ps), but we should not
> > have any mounted filesystems, which is true (df).
>
> Nicely done!  This is very well organized.
>
> There's more info on the different states in the big comment at
> the top of:
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/zone.c
>
> -dp

  




___
zones-discuss mailing list
zones-discuss@opensolaris.org

Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Dan Price
On Thu 10 Jul 2008 at 02:42AM, Konstantin Gremliza wrote:
>Ok.
> 
>I wrote a dtrace script to monitor zone states. The resulting automaton is
>in the attachment.
>In the down state, we should still have a zsched (ps), but we should not
>have any mounted filesystems, which is true (df).

Nicely done!  This is very well organized.

There's more info on the different states in the big comment at
the top of:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/zone.c

-dp

-- 
Daniel Price - Solaris Kernel Engineering - [EMAIL PROTECTED] - blogs.sun.com/dp


Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund

On 9 jul 2008, at 21:42, Konstantin Gremliza wrote:

> I wrote a dtrace script to monitor zone states. The resulting
> automaton is in the attachment.
> In the down state, we should still have a zsched (ps), but we should
> not have any mounted filesystems, which is true (df).
>
There is no zsched running in that zone and nothing is mounted either  
(apart from the zone root).

> It's quite correct that you cannot say
> zoneadm -z xxx boot
> when the zone is still down.
> Did you ever try
> zoneadm -z xxx halt
> ? If not, please do (this is my first tip).
>

I had tried that, and it hung.

Now that I have tried halt + boot again, it worked. I guess James's
suggestion might have been right, i.e. a stuck kernel thread.

This is not the first time it has happened, so when/if it happens  
again I'll know where to look.

> I see you have the root of the zone on zfs. I guess you know that
> this is not supported.
> Did you replicate the zones with zfs clone?
>

It is running on Nevada, not Solaris 10, and as far as I know it is
supported there :)

Thanks for your help and for the zone automaton picture.

cheers,
/Martin
-- 
Martin Englund, Security Engineer, .Sun Engineering, Sun Microsystems Inc.
Email: [EMAIL PROTECTED] Time Zone: GMT-3 PGP: 1024D/AA514677
"The question is not if you are paranoid, it is if you are paranoid enough."




Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund
Konstantin,

here's the info you requested:

[EMAIL PROTECTED] # pstack $(pgrep -f 'zoneadm.*jcp-mail-zn-mn-colo1')
21276:  zoneadmd -z jcp-mail-zn-mn-colo1
-  lwp# 1 / thread# 1  
  fef05405 pollsys  (8046cf0, 4, 0, 0)
  feeb6592 poll (8046cf0, 4, ) + 52
  0805a0ce do_console_io (8077e78, 4, 2) + 76
  0805a36c serve_console (8077e78) + 78
  080590ba main (3, 8047df4, 8047e04) + 512
  08056b7a _start   (3, 8047eb0, 8047eb9, 8047ebc, 0, 8047ed1) + 7a
-  lwp# 2 / thread# 2  
  fef06266 door (0, 0, 0, 0, 0, 8)
  feeef0a3 door_unref_func (531c) + 43
  fef01112 _thr_setup (fec60a00) + 52
  fef01370 _lwp_start (fec60a00, 0, 0, 0, 0, 0)
-  lwp# 3 / thread# 3  
  fef062af door (fe74e850, 1000, 0, fe74fe00, f5f00, a)
  0805876b server   (0, fe74f8f0, 510, 0, 0, 8057edc) + 88f
  fef062e0 __door_return () + 60
-  lwp# 4 / thread# 4  
  fef062af door (0, 0, 0, fe650e00, f5f00, a)
  feeef5cd door_create_func (0) + 2c
  fef01112 _thr_setup (fec61a00) + 52
  fef01370 _lwp_start (fec61a00, 0, 0, 0, 0, 0)

---

[EMAIL PROTECTED] # zoneadm list -cv
  ID NAME                 STATUS     PATH                         BRAND    IP
   0 global               running    /                            native   shared
   7 javafx-zn-mn-colo1   running    /zones/javafx-zn-mn-colo1    native   shared
   8 egc-zn-mn-colo1      running    /zones/egc-zn-mn-colo1       native   shared
  60 jcp-mail-zn-mn-colo1 down       /zones/jcp-mail-zn-mn-colo1  native   shared
  63 jcp-web-zn-mn-colo1  running    /zones/jcp-web-zn-mn-colo1   native   shared
   - master               installed  /zones/master                native   shared

---

[EMAIL PROTECTED] # ps -fz  jcp-mail-zn-mn-colo1
     UID   PID  PPID   C    STIME TTY         TIME CMD

---

[EMAIL PROTECTED] # df -hZ |grep /zones/jcp-mail-zn-mn-colo1
z1/zones/jcp-mail-zn-mn-colo1   248G   619M   229G     1%    /zones/jcp-mail-zn-mn-colo1

---

[EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted

cheers,
/Martin


Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread James Carlson
Martin Englund writes:
> I get over 10,000 lines of output from "::threadlist -v". Any hints
> on how to find the needle in the haystack? :)

I usually do "::threadlist -v ! less" and then search around in the
output.

The first suspects are zoneadmd and any longish-looking stacks.

(There are more systematic ways to search for the offender, including
locating the zone_t and finding out what it's blocked on, but looking
at the stacks is often effective and quick.)
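That approach can be combined with a crude pre-filter so that less of the dump has to be read by eye. A rough Python sketch: it assumes the saved ::threadlist -v output separates threads with blank lines, and the keywords ("zoneadmd", "zone_") and the 15-line threshold are arbitrary heuristics, not anything mdb defines:

```python
import sys


def thread_blocks(text):
    """Split saved ::threadlist -v output into per-thread lists of lines.

    ASSUMPTION: consecutive threads are separated by blank lines.
    """
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def suspects(text, keywords=("zoneadmd", "zone_"), min_lines=15):
    """Yield thread blocks that mention a keyword or look long.

    Block length (header lines included) is only a rough proxy for
    stack depth; both thresholds are arbitrary heuristics.
    """
    for block in thread_blocks(text):
        joined = "\n".join(block)
        if len(block) >= min_lines or any(k in joined for k in keywords):
            yield block


if __name__ == "__main__":
    for block in suspects(sys.stdin.read()):
        print("\n".join(block))
        print("")
```

One way to capture the dump is echo ::threadlist -v | mdb -k > /tmp/threads.txt, followed by python suspects.py < /tmp/threads.txt (script name hypothetical). It only narrows the haystack; the remaining stacks still have to be read.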

-- 
James Carlson, Solaris Networking  <[EMAIL PROTECTED]>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677


Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Konstantin Gremliza
Martin Englund wrote:
> I get over 10,000 lines of output from "::threadlist -v". Any hints on how to
> find the needle in the haystack? :)
Hi Martin,

please provide the following:

zoneadm list -cv
ps -fz  jcp-mail-zn-mn-colo1
df -hZ |grep   

Konstantin



Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund
I get over 10,000 lines of output from "::threadlist -v". Any hints on how to
find the needle in the haystack? :)

cheers,
/Martin
 
 
This message posted from opensolaris.org


Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund

On 9 jul 2008, at 09:32, James Carlson wrote:

> Is there any chance that it's stuck trying to shut down?  I'd first
> look for threads that appear to be stuck in mdb's "::threadlist -v"
> output.
>
Argh! I missed one word ("already") in the message when I tried to  
boot the zone after killing the two zoneadmd processes:

[EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
zone 'jcp-mail-zn-mn-colo1': WARNING: zone is in state 'down', but  
zoneadmd does not appear to be available; restarted zoneadmd to recover.
zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted

So I'm back to square one :(

I'll dig through the ::threadlist again...

cheers,
/Martin


Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund
James,

On 9 jul 2008, at 09:32, James Carlson wrote:

> Martin Englund writes:
>> But at the same time, zoneadm boot thinks it is up:
>> [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
>> zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted
>>
>> If I try to halt the zone, it just hangs:
>> [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 halt
>> ^C (after 10 minutes)
>>
>> Any clues to what is going on?
>
> Is there any chance that it's stuck trying to shut down?  I'd first
> look for threads that appear to be stuck in mdb's "::threadlist -v"
> output.
>

It turned out there were two zoneadmd processes for that zone, and after
removing them I could get things back to normal with "zoneadm -z zone boot".
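Checking for a second zoneadmd on the same zone can be scripted. A minimal Python sketch, assuming input lines of the form "PID zoneadmd -z ZONENAME", as printed for example by ps -eo pid,args; the function name duplicate_zoneadmds is hypothetical, and the script only reports duplicates, it does not decide which process should go:

```python
import re
import sys
from collections import defaultdict

# "PID [path/]zoneadmd -z ZONENAME ..."; an optional ":" after the PID also
# accepts the "21276:  zoneadmd -z ..." header line that pstack prints.
ZONEADMD = re.compile(r"^\s*(\d+):?\s+\S*zoneadmd\s+-z\s+(\S+)")


def duplicate_zoneadmds(lines):
    """Return {zone_name: [pid, ...]} for zones with more than one zoneadmd."""
    pids_by_zone = defaultdict(list)
    for line in lines:
        m = ZONEADMD.match(line)
        if m:
            pids_by_zone[m.group(2)].append(int(m.group(1)))
    return {zone: pids for zone, pids in pids_by_zone.items() if len(pids) > 1}


if __name__ == "__main__":
    for zone, pids in sorted(duplicate_zoneadmds(sys.stdin).items()):
        print("%s: %d zoneadmd processes (PIDs %s)"
              % (zone, len(pids), ", ".join(map(str, pids))))
```

For example: ps -eo pid,args | python dup_zoneadmd.py (script name hypothetical). The zoneadm client itself ("zoneadm -z ... boot") is not matched, because the pattern requires the process name to end in "zoneadmd".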

Thanks for the pointer!

cheers,
/Martin


Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread James Carlson
Martin Englund writes:
> But at the same time, zoneadm boot thinks it is up:
> [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
> zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted
> 
> If I try to halt the zone, it just hangs:
> [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 halt
> ^C (after 10 minutes)
> 
> Any clues to what is going on?

Is there any chance that it's stuck trying to shut down?  I'd first
look for threads that appear to be stuck in mdb's "::threadlist -v"
output.



Re: [zones-discuss] zoneadm confusion

2008-07-09 Thread Martin Englund
Eh, it is not Solaris 10; it should say build 85 of Nevada :)

cheers,
/Martin
 
 