Re: [zones-discuss] Zone Stuck in a shutting_down state in os2008.11

2009-06-21 Thread solarg

Thanks for all the replies.
Unfortunately, I was forced to reboot the machine without understanding
what was happening. It is working now.


gerard
___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] Zone Stuck in a shutting_down state in os2008.11

2009-06-21 Thread Derek McEachern
I guess my wording was a little confusing. What I meant by "nfs mounts in
the ngz from the gz" is that, since you can't log into the zone, you need to
check for hung NFS mounts from the global zone. The NFS mounts that hung on
us were from filers.

Derek

On Sun, Jun 21, 2009 at 3:05 PM, Craig Cory  wrote:

> For this reason and others, it is recommended to NOT mount non-global zone
> clients to their own global zone servers. Use lofs for these local mounts.

Re: [zones-discuss] Zone Stuck in a shutting_down state in os2008.11

2009-06-21 Thread Derek McEachern
I have had the same problem with two zones. Using the following two steps,
I was able to get one of the zones to shut down but not the other.

First, check from the gz for hung NFS mounts belonging to the ngz:
mount | grep www2.  If you see any, umount them from the gz.

Next, check for any processes that might be accessing files in the ngz. From
the gz you can do something like:

#  pwdx /proc/* | grep zone1
17459:  /export/zone/zone1/root
17731:  /export/zone/zone1/root/tmp
18022:  /export/zone/zone1/root/tmp

# ps -ef | egrep "18022|17731"
root 18022 17731   0 13:15:33 pts/2   0:00 sleep 50
root 18064 17745   0 13:17:40 pts/1   0:00 egrep 18022|17731
root 17731 17727   0 13:14:18 pts/2   0:00 sh


If you find any such processes, you can try to kill them.


After each step, try to halt the zone and see if it comes down.

If neither of these works, the only solution I've heard of is rebooting the
host.
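
A minimal sketch of the second step, assuming pwdx output in the format
shown above. The helper name pids_under_zoneroot is hypothetical (it is not
a Solaris command); it only filters text on stdin and prints the PIDs whose
working directory is at or below the given zone root.

```shell
# Hypothetical helper: read `pwdx` output on stdin and print the PIDs
# whose current directory is the zone root or somewhere beneath it.
pids_under_zoneroot() {
    awk -v root="$1" '
        {
            pid = $1
            sub(/:$/, "", pid)      # "17731:" -> "17731"
            cwd = $2
            if (cwd == root || index(cwd, root "/") == 1)
                print pid
        }'
}

# Sample input copied from the pwdx output in this message.
sample='17459:  /export/zone/zone1/root
17731:  /export/zone/zone1/root/tmp
18022:  /export/zone/zone1/root/tmp'

printf '%s\n' "$sample" | pids_under_zoneroot /export/zone/zone1/root
```

On a live system the input would presumably come from "pwdx /proc/*" run in
the gz, as above. Solaris's legacy /usr/bin/awk may not accept -v, so nawk
or /usr/xpg4/bin/awk may be needed there.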

On Sun, Jun 21, 2009 at 6:25 AM, solarg  wrote:

> hello all,
> after trying to reboot a zone, it still hangs:
> he...@antigone:~# zoneadm -z www2 reboot;zlogin -C www2
>
> on another terminal, I tried to kill the process:
> he...@antigone:~# ps -ef|grep www2
>root 16432 1   0   Mar 18 ?   0:03 zoneadmd -z www2
>root 18864 18860   0 12:18:14 pts/5   0:00 grep www2
>root 18809 11676   0 12:09:53 pts/3   0:00 zoneadm -z www2 reboot
> he...@antigone:~# kill -9 16432
>
> and:
> he...@antigone:~# zoneadm -z www2 reboot;zlogin -C www2
> door_call failed: Interrupted system call
> zone 'www2': WARNING: zone is in state 'down', but zoneadmd does not appear
> to be available; restarted zoneadmd to recover.
> [Connected to zone 'www2' console]
> ~^D
>
> he...@global:~# zoneadm list -cv
>   ID NAME   STATUS   PATH          BRAND   IP
>   73 www2   down     /zones/www2   ipkg    shared
>
> I also have:
> he...@antigone:~# mdb -k
> > ::walk zone | ::print zone_t zone_name zone_ref
> ...
> zone_name = 0xff044b049c80 "www2"
> zone_ref = 0x2
>
> a previous thread said:
> "If the refcount is greater than 0x1, it could be:
>6272846 User orders zone death; NFS client thumbs nose
> "
>
> he...@antigone:~# ps -ef|grep www2
>root 19091 18860   0 13:19:32 pts/5   0:00 zoneadm -z www2 halt
>root 19093 1   0 13:19:32 ?   0:00 zoneadmd -z www2
>root 19113 11676   0 13:24:30 pts/3   0:00 grep www2
> he...@antigone:~# truss -p 19093
> /4: door_return(0x, 0, 0x, 0xFE4F0E00, 1007360) (sleeping...)
> /3: zone_destroy(73)(sleeping...)
> /1: pollsys(0x08046AD0, 4, 0x, 0x) (sleeping...)
> /2: door_unref()(sleeping...)
> he...@antigone:~# truss -p 19091
> door_call(6, 0x08047590)(sleeping...)
>
>
> Any idea?
>
> thanks for help,
>
> gerard

Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-07 Thread Derek McEachern
Steve,

Thanks for this information.

I ran through the commands and this is what I see.

> ::kmem_cache ! grep rnode
a6438008 rnode_cache   0200 00  65670506
a643c008 rnode4_cache  0200 00  9680

When I run the following:
a6438008::walk kmem | ::print rnode_t r_vnode | ::vnode2path

I can see two paths listed:
/zone/zonetest-new/root/opt/xxx//logs.tar
/zone/zonetest-new/root/var/xxx/

But when I run the ::fsinfo command there are no nfs mounted filesystems in
the zonetest-new fs. Both of the fs's listed were nfs mounted when the zone
was up and running.

It really looks like someone was doing something with the logs.tar file at
the time the zone was coming down which probably started all my problems.

I really appreciate all this info.

Thanks,
Derek



Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-06 Thread Steve Lawrence
Related comments from bug below (X'ed out some paths):

The zone in question clearly has too many references

> 030004a09680::print zone_t zone_ref  
zone_ref = 0t11

Ten too many, to be precise.  So what's holding onto the zone?  Well the rnode
cache has 5 entries

> ::kmem_cache ! grep rnode
030003a1e988 rnode_cache    00  640   572988
030003a20988 rnode4_cache   00  9840
> 030003a1e988::walk kmem | ::print rnode_t r_vnode | ::vnode2path
/opt/zones/z1/root/
/opt/zones/z1/root/
/opt/zones/z1/root/
/opt/zones/z1/root/
/opt/zones/z1/root/

even though no nfs filesystems are mounted

> ::fsinfo
VFSP         FS      MOUNT
0187f420     ufs     /
0187f508     devfs   /devices
03315780     ctfs    /system/contract
033156c0     proc    /proc
03315600     mntfs   /etc/mnttab
03315480     tmpfs   /etc/svc/volatile
033153c0     objfs   /system/object
0300039987c0 namefs  /etc/svc/volatile/repository_door
0300039984c0 fd      /dev/fd
030003a99e00 ufs     /var
030003998400 tmpfs   /tmp
030003a99680 tmpfs   /var/run
030003a98f00 namefs  /var/run/name_service_door
030003a98b40 namefs  /var/run/sysevent_channels/syseventd_channel...
030003a989c0 namefs  /etc/sysevent/sysevent_door
030003a98780 namefs  /etc/sysevent/devfsadm_event_channel/1
030003a98540 namefs  /dev/.zone_reg_door
030003a983c0 namefs  /dev/.devfsadm_synch_door
030003a99380 namefs  /etc/sysevent/piclevent_door
0300044b1d80 namefs  /var/run/picld_door
030003a99200 ufs     /opt
0300044b0700 namefs  /var/run/zones/z1.zoneadmd_door

And as apparent from the path, all of those rnodes refer to zone z1 through
their mntinfo structure

> 030003a1e988::walk kmem | ::print rnode_t r_vnode->v_vfsp->vfs_data | ::print mntinfo_t mi_zone | ::zone
ADDR ID NAME PATH
030004a09680  1 z1   /opt/zones/z1/root/
030004a09680  1 z1   /opt/zones/z1/root/
030004a09680  1 z1   /opt/zones/z1/root/
030004a09680  1 z1   /opt/zones/z1/root/
030004a09680  1 z1   /opt/zones/z1/root/

So if each of those rnodes has two holds on the zone, then that accounts for
all of the extra holds exactly.
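
The arithmetic behind that conclusion can be written out as a small sketch.
The numbers are taken from the mdb output above; the baseline of one hold is
inferred from "ten too many", and two holds per rnode is the hypothesis
being checked rather than something this sketch establishes.

```shell
# Values read from the mdb session quoted in this message.
zone_ref=11          # zone_ref = 0t11 (0t marks decimal in mdb)
baseline=1           # inferred: 11 references is "ten too many"
rnodes=5             # entries walked from rnode_cache
holds_per_rnode=2    # hypothesis: each leaked rnode holds the zone twice

extra=$((zone_ref - baseline))
accounted=$((rnodes * holds_per_rnode))
echo "extra holds: $extra, accounted for by rnodes: $accounted"
```

If the two numbers differ on another system, something other than leaked
rnodes would presumably be holding the zone as well.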



Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-06 Thread Derek McEachern
I don't believe that I can see the comments since they are not public.

Is that something you can pass along?


Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-06 Thread Steve Lawrence
>I already tried killing the zoneadmd process and issuing the halt and all
>it does is start back up the zoneadmd process and hang. I can't force a
>crashdump on the system since I can't take the box down.
> 
>Bug 6272846 makes reference to nfs version 3, (which is the version we are
>using), and the client apparently leaking rnodes. Is there any way to
>verify this other then a forced crashdump? I might take a live core of the
>system and open a case to see if that yields anything.

A zone_ref > 1 means that something in the kernel is holding the zone.
You should be able to use "mdb -k" on the live system and issue dcmds similar
to those in the comments of 6272846.  No need to force a crashdump or take a
live crashdump.

-Steve L.



Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-06 Thread Derek McEachern
The zone is in a shutting_down state.

The mdb command for this zone returns 0x1a, which is greater than 1.

zone_name = 0xfe86c83d61c0 "zonetest-new"
zone_ref = 0x1a

This is new to me. What is the refcount counting? What should this value be
for the zone to shut down?

There is a zoneadmd process running for the zone.  Trussing it while issuing
the halt command, I can see it look up the zone, get some attributes, and then
call zone_shutdown(), which is where it hangs.

5364:   psargs: zoneadmd -z zonetest-new
5364/2: door_return(0xFE6CD870, 4096, 0x, 0) (sleeping...)
5364/4: door_return(0x, 0, 0x, 0) (sleeping...)
5364/3: door_unref()(sleeping...)
5364/1: pollsys(0x08046C50, 4, 0x, 0x) (sleeping...)
5364/2: 32.4128 door_return(0xFE6CD870, 4096, 0x, 0) = 0
5364/2: 32.4642 door_ucred(0x08077150) = 0
5364/2: 32.4769 zone_lookup("zonetest-new") = 64
5364/2: 32.4770 zone_getattr(64, ZONE_ATTR_STATUS, 0xFE6CD85C, 4) = 4
5364/2: 32.4843 zone_lookup("zonetest-new") = 64
5364/2: zone_shutdown(64) (sleeping...)

I already tried killing the zoneadmd process and issuing the halt, and all it
does is start the zoneadmd process back up and hang.  I can't force a
crashdump on the system since I can't take the box down.

Bug 6272846 makes reference to NFS version 3 (which is the version we are
using) and to the client apparently leaking rnodes. Is there any way to verify
this other than a forced crashdump? I might take a live core of the system
and open a case to see if that yields anything.

Derek



Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-06 Thread Steve Lawrence
zsched is always unkillable.  It will only exit when instructed to by
zoneadmd.

Is the remaining zone "shutting down", or "down"?  (zoneadm list -v).

What is the ref_count on the zone?

# mdb -k
> ::walk zone | ::print zone_t zone_name zone_ref

If the refcount is greater than 0x1, it could be:
6272846 User orders zone death; NFS client thumbs nose

No workaround for this one.  A crashdump would help investigate a zone_ref
greater than 1.
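
As a rough sketch of reading that dcmd's output: the function below (a
hypothetical helper, not part of mdb) pairs each zone_name line with the
zone_ref line that follows it and reports zones whose count exceeds 1. It
assumes the hexadecimal "zone_ref = 0x..." form reported in this thread and
skips any other notation. The sample lines combine the two reports from
this thread (www2 and zonetest-new), which came from different hosts.

```shell
# Hypothetical helper: read "::print zone_t zone_name zone_ref" output on
# stdin and report zones whose reference count is greater than 1.
zones_with_extra_refs() {
    name=
    while read -r key eq val rest; do
        case $key in
            zone_name)
                name=${rest#\"}         # strip the surrounding quotes
                name=${name%\"}
                ;;
            zone_ref)
                case $val in
                    0x*) ref=$(($val)) ;;   # hex -> decimal
                    *)   continue ;;        # other notations not handled here
                esac
                if [ "$ref" -gt 1 ]; then
                    echo "$name zone_ref=$ref extra=$((ref - 1))"
                fi
                ;;
        esac
    done
}

sample='zone_name = 0xff044b049c80 "www2"
zone_ref = 0x2
zone_name = 0xfe86c83d61c0 "zonetest-new"
zone_ref = 0x1a'

printf '%s\n' "$sample" | zones_with_extra_refs
```

On a live system the input could presumably be produced by piping the dcmd
into "mdb -k". A count above 1 is only a symptom for a zone that is already
shutting down or down; running zones may legitimately show larger values.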

Is there a zoneadmd process for the given zone?
# pgrep -lf zoneadmd

If so, please provide the "truss -p" output for this process.  You may also
attempt killing this zoneadmd process (which lives in the global zone), and
then re-attempting "zoneadm -z  halt".

Thanks,

-Steve L.



Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-05 Thread Derek McEachern
Just following up on the progress and resolution of my stuck zones problem.  I
had two zones stuck in the shutting_down state.

Based on initial feedback, I looked in /etc/mnttab in the gz for NFS mounts
that were mounted in the ngz. There were a couple, and I umount'ed them. Then
I was able to find two processes that indicated they were accessing the ngz
filesystem: zsched and svc.configd. I tried trussing svc.configd but was
unable to, due to "unanticipated system error".  I killed svc.configd, ran
the zoneadm halt, and the zone successfully shut down.

On the second stuck zone, I umount'ed the NFS file systems and checked for
processes accessing the ngz filesystem, and the only one reported is zsched.
Trying to halt the zone doesn't do anything, and from the looks of it zsched
is unkillable. It looks like this zone is here to stay until I can reboot
the box.

Derek


Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-04-28 Thread Derek McEachern
There were a bunch of NFS mounts listed in the /etc/mnttab of the global
zone. I was able to umount them, but the zone is still hung.

I tried killing the zoneadmd process and ran zoneadm halt again. It
started zoneadmd back up, but nothing else happened.

Thanks to everyone for their suggestions. It looks like I'm going to have to
wait until I can take the box down for a reboot.

Regards,
Derek


Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-04-28 Thread Alexander J. Maidak
If it's a hung NFS mount, you should be able to see it still mounted in
the /etc/mnttab file in the global zone: grep nfs /etc/mnttab. It will be
mounted under the zonepath.  You should then be able to do a umount
-f / from the global zone and, if you're really lucky, the
zone will finish shutting down.
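
A rough sketch of that check, for what it's worth. It assumes the usual
whitespace-separated mnttab layout (resource, mount point, fstype, ...);
the function name zone_nfs_mounts and the optional file argument are just
made up for illustration, and on Solaris 10 you may need nawk or
/usr/xpg4/bin/awk rather than plain awk for -v to work.

```shell
# List NFS mount points that live under a zone's zonepath.
# Usage: zone_nfs_mounts <zonepath> [mnttab-file]
# (hypothetical helper; the file defaults to /etc/mnttab)
zone_nfs_mounts() {
    zonepath=$1
    mnttab=${2:-/etc/mnttab}
    # Field 2 is the mount point, field 3 the filesystem type.
    awk -v zp="$zonepath" '
        $3 ~ /^nfs/ && index($2, zp "/") == 1 { print $2 }
    ' "$mnttab"
}
```

Run from the global zone, e.g. zone_nfs_mounts /zone/zonetest-new, anything
it prints is a candidate for a forced umount.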

-Alex

On Tue, 2009-04-28 at 16:19 -0500, Derek McEachern wrote:
> It's possible that it could be nfs mount related since the zone did
> have nfs mounted fs's but they should have been umounted prior to
> shutting down the zone.  In any event I can no longer get into the
> zone to check using zlogin and zlogin -C.
> 
> I tried Bryan's suggestion on looking for processes that might have
> open filehandles to files under the zone's filesystem tree but I don't
> see that there are any.
> 
> On Tue, Apr 28, 2009 at 3:40 PM, Bryan Allen 
> wrote:
> 
> +--
> | On 2009-04-28 15:37:22, Derek McEachern wrote:
> |
> | We were trying to bring down a zone on a S10 U4 system and it ended up stuck
> | in the shutting_down state.
> |
> |   ID NAME          STATUS         PATH                BRAND    IP
> |   74 zonetest-new  shutting_down  /zone/zonetest-new  native   shared
> |
> |
> | The only process I see running is the zoneadmd process
> |
> | dlet15:/home/derekm/ ps -efZ | grep zonetest-new
> |   global     root 12680     1   0   Apr 24 ?   0:02 zoneadmd -z zonetest-new
> 
> 
> 
> Do any processes (notably shells in the global zones) have an open filehandle
> somewhere under the zone's filesystem tree? This can (at least on Sol10) cause
> zones to not shut down, since it can't close the FH (I assume, anyway).
> --
> bda
> cyberpunk is dead. long live cyberpunk.



Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-04-28 Thread Ben Rockwood
Derek McEachern wrote:
> It's possible that it could be nfs mount related since the zone did
> have nfs mounted fs's but they should have been umounted prior to
> shutting down the zone.  In any event I can no longer get into the
> zone to check using zlogin and zlogin -C.
>
> I tried Bryan's suggestion on looking for processes that might have
> open filehandles to files under the zone's filesystem tree but I don't
> see that there are any.

When you get in that situation all you can do is try to manually kill
everything (kill -9) and hope that it then shuts down.  Typically you'll
find one process you can't kill or do anything with, in which case you
have to reboot the global zone.
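
Something like the following can help find what is left to kill. It is only
a sketch: it assumes a listing with the zone name in the first column and
the PID in the second (which is what I'd expect from
"ps -e -o zone,pid,args" on Solaris), and zone_pids is a made-up helper
name. With plain awk on Solaris 10 you may need nawk for -v.

```shell
# Print the PIDs that belong to one zone.
# Reads "ZONE PID ARGS..." lines on stdin and skips the header line.
# (hypothetical helper, not part of any Solaris tool)
zone_pids() {
    zone=$1
    awk -v z="$zone" 'NR > 1 && $1 == z { print $2 }'
}
```

Usage would be along the lines of
ps -e -o zone,pid,args | zone_pids zonetest-new
and then kill -9 on whatever comes back, after looking at the list first.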

It sucks, been there many times.

benr.


Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-04-28 Thread Derek McEachern
It's possible that it could be nfs mount related since the zone did have nfs
mounted fs's but they should have been umounted prior to shutting down the
zone.  In any event I can no longer get into the zone to check using zlogin
and zlogin -C.

I tried Bryan's suggestion on looking for processes that might have open
filehandles to files under the zone's filesystem tree but I don't see that
there are any.

On Tue, Apr 28, 2009 at 3:40 PM, Bryan Allen  wrote:

>
> +--
> | On 2009-04-28 15:37:22, Derek McEachern wrote:
> |
> | We were trying to bring down a zone on a S10 U4 system and it ended up stuck
> | in the shutting_down state.
> |
> |   ID NAME          STATUS         PATH                BRAND    IP
> |   74 zonetest-new  shutting_down  /zone/zonetest-new  native   shared
> |
> |
> | The only process I see running is the zoneadmd process
> |
> | dlet15:/home/derekm/ ps -efZ | grep zonetest-new
> |   global     root 12680     1   0   Apr 24 ?   0:02 zoneadmd -z zonetest-new
>
>
> Do any processes (notably shells in the global zones) have an open filehandle
> somewhere under the zone's filesystem tree? This can (at least on Sol10) cause
> zones to not shut down, since it can't close the FH (I assume, anyway).
> --
> bda
> cyberpunk is dead. long live cyberpunk.
>

Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-04-28 Thread Bryan Allen
+--
| On 2009-04-28 15:37:22, Derek McEachern wrote:
| 
| We were trying to bring down a zone on a S10 U4 system and it ended up stuck
| in the shutting_down state.
| 
|   ID NAME          STATUS         PATH                BRAND    IP
|   74 zonetest-new  shutting_down  /zone/zonetest-new  native   shared
| 
| 
| The only process I see running is the zoneadmd process
| 
| dlet15:/home/derekm/ ps -efZ | grep zonetest-new
|   global     root 12680     1   0   Apr 24 ?   0:02 zoneadmd -z zonetest-new


Do any processes (notably shells in the global zones) have an open filehandle
somewhere under the zone's filesystem tree? This can (at least on Sol10) cause
zones to not shut down, since it can't close the FH (I assume, anyway).
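
One way to look, as a sketch only: filter pwdx output for working
directories under the zonepath. This assumes pwdx prints lines of the form
"PID:  /path", and pids_under is a made-up helper name. It only catches
current working directories, not every open file, so treat an empty result
as a hint rather than proof. With plain awk on Solaris 10 you may need nawk
for -v.

```shell
# Print PIDs whose working directory is the zonepath or lies beneath it.
# Reads pwdx-style lines ("PID:  /path") on stdin.
# (hypothetical helper; paths containing spaces are not handled)
pids_under() {
    zonepath=$1
    awk -v zp="$zonepath" '
        { pid = $1; sub(/:$/, "", pid) }        # strip the trailing colon
        $2 == zp || index($2, zp "/") == 1 { print pid }
    '
}
```

From the global zone that would be something like
pwdx /proc/* 2>/dev/null | pids_under /zone/zonetest-new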
-- 
bda
cyberpunk is dead. long live cyberpunk.