Re: [zones-discuss] zoneadm confusion
Thank you Dan! Thanks also for the good hint to look at zone.c. I tried to add some of that information to my picture and moved it to http://gremliza.net for anyone who would like it.

Regards,
Konstantin

This was my good old zone_state.d dtrace script:

#!/usr/sbin/dtrace -qs

/* Trace zone state changes and the processes involved in them. */

BEGIN
{
    /* Zone status names, indexed by the status value. */
    z[0] = "UNINITIALIZED";
    z[1] = "READY";
    z[2] = "BOOTING";
    z[3] = "RUNNING";
    z[4] = "SHUTTING_DOWN";
    z[5] = "EMPTY";
    z[6] = "DOWN";
    z[7] = "DYING";
    z[8] = "DEAD";
    st = timestamp;
    printf("%14s %3s %12s %s\n", "TIME", "ID", "ZONE", "STATE");
}

/* Note that zoneadmd is about to exec another program. */
proc:::exec
/ execname == "zoneadmd" /
{
    self->exec = execname;
}

proc:::exec-success
/ self->exec == "zoneadmd" /
{
    /* timestamp is in nanoseconds; print seconds.milliseconds */
    t = timestamp - st;
    printf("%10d.%03d", t / 1000000000, (t / 1000000) % 1000);
    printf(" EXEC: %s\n", curpsinfo->pr_psargs);
    self->exec = 0;
}

proc:::create
/ args[0]->pr_fname == "zsched" || args[0]->pr_fname == "init" /
{
    t = timestamp - st;
    printf("%10d.%03d", t / 1000000000, (t / 1000000) % 1000);
    printf(" FORK: %3d %s[%d]\n", args[0]->pr_zoneid, args[0]->pr_fname, args[0]->pr_pid);
}

proc:::exit
/ execname == "zsched" || execname == "init" /
{
    t = timestamp - st;
    printf("%10d.%03d", t / 1000000000, (t / 1000000) % 1000);
    printf(" EXIT: %3d %s[%d]\n", curpsinfo->pr_zoneid, execname, pid);
}

/* Every kernel zone state transition goes through zone_status_set(). */
fbt:genunix:zone_status_set:entry
{
    t = timestamp - st;
    printf("%10d.%03d", t / 1000000000, (t / 1000000) % 1000);
    printf(" %3d %12s %s\n", args[0]->zone_id, stringof(args[0]->zone_name), z[args[1]]);
}

Dan Price wrote:
> On Thu 10 Jul 2008 at 02:42AM, Konstantin Gremliza wrote:
>> Ok.
>>
>> I wrote a dtrace script to monitor zone states. The resulting automata is
>> in the attachment.
>> In the state down, we should still have a zsched (ps), but we should not
>> have any mounted filesystems, which is true (df).
>
> Nicely done! This is very well organized. There's more info on the
> different states in the big comment at the top of:
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/zone.c
>
> -dp

___
zones-discuss mailing list
zones-discuss@opensolaris.org
Re: [zones-discuss] zoneadm confusion
On Thu 10 Jul 2008 at 02:42AM, Konstantin Gremliza wrote:
>Ok.
>
>I wrote a dtrace script to monitor zone states. The resulting automata is
>in the attachment.
>In the state down, we should still have a zsched (ps), but we should not
>have any mounted filesystems, which is true (df).

Nicely done! This is very well organized. There's more info on the different states in the big comment at the top of:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/zone.c

-dp

--
Daniel Price - Solaris Kernel Engineering - [EMAIL PROTECTED] - blogs.sun.com/dp
Re: [zones-discuss] zoneadm confusion
On 9 jul 2008, at 21:42, Konstantin Gremliza wrote:
> I wrote a dtrace script to monitor zone states. The resulting
> automata is in the attachment.
> In the state down, we should still have a zsched (ps), but we should
> not have any mounted filesystems, which is true (df).
>
There is no zsched running in that zone and nothing is mounted either (apart from the zone root).

> Its quite correct that you cannot say
> zoneadm -z xxx boot
> when the zone is still down.
> Did you ever try
> zoneadm -z xxx halt
> ? If not, please do (this is my first tip).
>
I had tried that, and it hung. Now that I have tried the halt + boot again, it worked. I guess James's suggestion might have been right, i.e. a stuck kernel thread. This is not the first time it has happened, so when/if it happens again I'll know where to look.

> I see you have the root of the zone on zfs. I guess you know, that
> this is not supported.
> Did you replicated the zones with zfs clone?
>
It is running on Nevada, not Solaris 10, and as far as I know it is supported there :)

Thanks for your help and the zone automaton picture.

cheers,
/Martin

--
Martin Englund, Security Engineer, .Sun Engineering, Sun Microsystems Inc.
Email: [EMAIL PROTECTED] Time Zone: GMT-3 PGP: 1024D/AA514677
"The question is not if you are paranoid, it is if you are paranoid enough."
Re: [zones-discuss] zoneadm confusion
Konstantin,

here's the info you requested:

[EMAIL PROTECTED] # pstack $(pgrep -f 'zoneadm.*jcp-mail-zn-mn-colo1')
21276:  zoneadmd -z jcp-mail-zn-mn-colo1
-----------------  lwp# 1 / thread# 1  --------------------
 fef05405 pollsys (8046cf0, 4, 0, 0)
 feeb6592 poll (8046cf0, 4, ) + 52
 0805a0ce do_console_io (8077e78, 4, 2) + 76
 0805a36c serve_console (8077e78) + 78
 080590ba main (3, 8047df4, 8047e04) + 512
 08056b7a _start (3, 8047eb0, 8047eb9, 8047ebc, 0, 8047ed1) + 7a
-----------------  lwp# 2 / thread# 2  --------------------
 fef06266 door (0, 0, 0, 0, 0, 8)
 feeef0a3 door_unref_func (531c) + 43
 fef01112 _thr_setup (fec60a00) + 52
 fef01370 _lwp_start (fec60a00, 0, 0, 0, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------
 fef062af door (fe74e850, 1000, 0, fe74fe00, f5f00, a)
 0805876b server (0, fe74f8f0, 510, 0, 0, 8057edc) + 88f
 fef062e0 __door_return () + 60
-----------------  lwp# 4 / thread# 4  --------------------
 fef062af door (0, 0, 0, fe650e00, f5f00, a)
 feeef5cd door_create_func (0) + 2c
 fef01112 _thr_setup (fec61a00) + 52
 fef01370 _lwp_start (fec61a00, 0, 0, 0, 0, 0)
---
[EMAIL PROTECTED] # zoneadm list -cv
  ID NAME                  STATUS     PATH                         BRAND   IP
   0 global                running    /                            native  shared
   7 javafx-zn-mn-colo1    running    /zones/javafx-zn-mn-colo1    native  shared
   8 egc-zn-mn-colo1       running    /zones/egc-zn-mn-colo1       native  shared
  60 jcp-mail-zn-mn-colo1  down       /zones/jcp-mail-zn-mn-colo1  native  shared
  63 jcp-web-zn-mn-colo1   running    /zones/jcp-web-zn-mn-colo1   native  shared
   - master                installed  /zones/master                native  shared
---
[EMAIL PROTECTED] # ps -fz jcp-mail-zn-mn-colo1
     UID   PID  PPID   C    STIME TTY         TIME CMD
---
[EMAIL PROTECTED] # df -hZ |grep /zones/jcp-mail-zn-mn-colo1
z1/zones/jcp-mail-zn-mn-colo1  248G  619M  229G  1%  /zones/jcp-mail-zn-mn-colo1
---
[EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted

cheers,
/Martin
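For scripted checks, the zone state is easier to pick out of the parseable listing than out of the table. A minimal sketch, assuming `zoneadm list -cp` prints colon-separated fields with the zone name second and the state third (zoneid:zonename:state:zonepath:...); the function name zone_state is made up for illustration and is not part of any Solaris tool:

```shell
# zone_state NAME: read "zoneadm list -cp"-style lines on stdin and print
# the state of zone NAME. Prints nothing if no line matches.
# Assumption: fields are colon-separated as zoneid:zonename:state:...
zone_state() {
    awk -F: -v name="$1" '$2 == name { print $3 }'
}

# Usage on a live system (not run here):
#   zoneadm list -cp | zone_state jcp-mail-zn-mn-colo1
```

A result of "down" together with an empty `ps -fz` listing would match the situation shown in this message.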
Re: [zones-discuss] zoneadm confusion
Martin Englund writes:
> I get over 10,000 lines of output from "::threadlist -v" any hints
> on how to find the needle in the haystack? :)

I usually do "::threadlist -v ! less" and then search around in the output. The first suspects are zoneadmd and any longish-looking stacks.

(There are more systematic ways to search for the offender, including locating the zone_t and finding out what it's blocked on, but looking at the stacks is often effective and quick.)

--
James Carlson, Solaris Networking <[EMAIL PROTECTED]>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
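Another way to narrow a large dump without paging through it by hand is to save it to a file and print only the entries that mention a name of interest. A rough sketch, assuming the saved text separates entries with blank lines (check this against your actual output before relying on it); the function name blocks_matching and the file path are made up for illustration:

```shell
# blocks_matching PATTERN: read text on stdin, treat each blank-line-separated
# run of lines as one entry, and print every entry that contains PATTERN
# (an awk extended regular expression).
blocks_matching() {
    awk -v pat="$1" 'BEGIN { RS = ""; ORS = "\n\n" } $0 ~ pat'
}

# Usage on a live system (not run here):
#   echo "::threadlist -v" | mdb -k > /tmp/threads.txt
#   blocks_matching zoneadmd < /tmp/threads.txt | less
```

This only automates the "search around in the output" step; judging which stack looks stuck is still manual.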
Re: [zones-discuss] zoneadm confusion
Martin Englund wrote:
> I get over 10,000 lines of output from "::threadlist -v" any hints on how to
> find the needle in the haystack? :)
>
> cheers,
> /Martin
>
> This message posted from opensolaris.org

Hi Martin,

please provide the following:

zoneadm list -cv
ps -fz jcp-mail-zn-mn-colo1
df -hZ |grep

Konstantin
Re: [zones-discuss] zoneadm confusion
I get over 10,000 lines of output from "::threadlist -v". Any hints on how to find the needle in the haystack? :)

cheers,
/Martin
Re: [zones-discuss] zoneadm confusion
On 9 jul 2008, at 09:32, James Carlson wrote:
> Is there any chance that it's stuck trying to shut down? I'd first
> look for threads that appear to be stuck in mdb's "::threadlist -v"
> output.
>
Argh! I missed one word ("already") in the message when I tried to boot the zone after killing the two zoneadmd processes:

[EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
zone 'jcp-mail-zn-mn-colo1': WARNING: zone is in state 'down', but zoneadmd does not appear to be available; restarted zoneadmd to recover.
zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted

So I'm back to square one :(

I'll dig through the ::threadlist again...

cheers,
/Martin
Re: [zones-discuss] zoneadm confusion
James,

On 9 jul 2008, at 09:32, James Carlson wrote:
> Martin Englund writes:
>> But at the same time, zoneadm boot thinks it is up:
>> [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
>> zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted
>>
>> If I try to halt the zone, it just hangs:
>> [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 halt
>> ^C (after 10 minutes)
>>
>> Any clues to what is going on?
>
> Is there any chance that it's stuck trying to shut down? I'd first
> look for threads that appear to be stuck in mdb's "::threadlist -v"
> output.
>
It turned out to be two zoneadmd processes for that zone, and after removing them I could get things back to normal with "zoneadm -z zone boot".

Thanks for the pointer!

cheers,
/Martin
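A quick way to spot more than one zoneadmd for the same zone is to count matching lines in a process listing. A minimal sketch, assuming the listing comes from something like `ps -eo pid,args` and that the daemon shows up as "zoneadmd -z ZONE" (as in the pstack header posted elsewhere in this thread); the function name count_zoneadmd is made up for illustration:

```shell
# count_zoneadmd ZONE: read "ps -eo pid,args"-style lines on stdin and print
# how many of them are a zoneadmd started with "-z ZONE".
count_zoneadmd() {
    awk -v zone="$1" '
        # $1 is the pid, $2 the command, $3 and $4 its first two arguments
        $2 ~ /(^|\/)zoneadmd$/ && $3 == "-z" && $4 == zone { n++ }
        END { print n + 0 }
    '
}

# Usage on a live system (not run here):
#   ps -eo pid,args | count_zoneadmd jcp-mail-zn-mn-colo1
```

Anything above 1 would be worth a closer look before trying halt or boot again.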
Re: [zones-discuss] zoneadm confusion
Martin Englund writes:
> But at the same time, zoneadm boot thinks it is up:
> [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 boot
> zoneadm: zone 'jcp-mail-zn-mn-colo1': zone is already booted
>
> If I try to halt the zone, it just hangs:
> [EMAIL PROTECTED] # zoneadm -z jcp-mail-zn-mn-colo1 halt
> ^C (after 10 minutes)
>
> Any clues to what is going on?

Is there any chance that it's stuck trying to shut down? I'd first look for threads that appear to be stuck in mdb's "::threadlist -v" output.

--
James Carlson, Solaris Networking <[EMAIL PROTECTED]>
Re: [zones-discuss] zoneadm confusion
Eh, it is not Solaris 10; it should say build 85 of Nevada :)

cheers,
/Martin