Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone

2007-04-12 Thread James Carlson
Jeff Victor writes:
> I wonder what would happen if you killed all of the zone's processes with
> "pkill -9 -z " and then forced the zone to re-boot with zoneadm. The
> process in biowait would still be there, but if the zone can re-boot, that
> may not matter.

That wouldn't help.

As long as there's a process around holding the zone active, the
shutdown won't complete and you won't be able to reboot the zone.  In
this case, if you're stuck unkillably somewhere in the kernel, then
the stuck thread will hold a cred_t that holds a reference to the
zone_t -- preventing it from finishing.
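
The reference chain described above can be pictured with a small toy model. This is only a sketch in Python, not kernel code: the Zone and Cred classes and their methods are hypothetical stand-ins for zone_t and cred_t, and the real hold/release logic is more involved.

```python
# Toy model only: these classes and method names are hypothetical and are
# not the Solaris kernel's data structures or functions.

class Zone:
    """Stands in for zone_t: it can be torn down only at zero references."""

    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.destroyed = False

    def hold(self):
        self.refs += 1

    def release(self):
        self.refs -= 1

    def try_finish_shutdown(self):
        # Shutdown completes only when nothing references the zone any more.
        if self.refs == 0:
            self.destroyed = True
        return self.destroyed


class Cred:
    """Stands in for cred_t: each credential holds a reference to its zone."""

    def __init__(self, zone):
        self.zone = zone
        zone.hold()

    def free(self):
        self.zone.release()


zone = Zone("zoneX")
killable = Cred(zone)  # credential of a process that responds to SIGKILL
stuck = Cred(zone)     # credential of a thread blocked inside the kernel

killable.free()                          # killing the process drops this hold
after_kill = zone.try_finish_shutdown()  # still held by the stuck thread

stuck.free()                             # only happens once the I/O returns
after_io = zone.try_finish_shutdown()    # now the shutdown can finish
```

In this model, killing every killable process still leaves one hold on the zone, which is why the shutdown cannot complete until the stuck thread returns.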

That's a driver bug that has to be fixed, not something that can be
worked around.

-- 
James Carlson, Solaris Networking  <[EMAIL PROTECTED]>
Sun Microsystems / 1 Network Drive 71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
___
zones-discuss mailing list
[EMAIL PROTECTED]


Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone

2007-04-11 Thread Jeff Victor
I wonder what would happen if you killed all of the zone's processes with
"pkill -9 -z " and then forced the zone to re-boot with zoneadm. The
process in biowait would still be there, but if the zone can re-boot, that
may not matter.



--
Jeff VICTOR              Sun Microsystems            jeff.victor @ sun.com
OS Ambassador            Sr. Technical Specialist
Solaris 10 Zones FAQ:    http://www.opensolaris.org/os/community/zones/faq
--


Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone

2007-04-11 Thread Victor

All,
Zone halt, reboot, init 0, etc. did not work from the global zone. We
did capture a live core from the system and saw that the biowait had
been in progress for one day. My question is the following. When a
process within a zone hangs in biowait or something similar (i.e. while
writing to disk), the zone, or any process in the zone, hangs. We were
not able to bring the zone down or halt it, and so we had to reboot the
global zone to fix the issue. What can be done to prevent this? Would
unmounting the disk assigned to the local zone help? All suggestions
are welcome.


Thanks,

Morillo, Carlos [NCSUS Non-J&J] wrote:


Since there is only a single kernel image, the thread that the zone
created is stuck doing a read()/write() in biowait(), waiting for the
I/O to complete.

Sometime in the future, the underlying device handling that I/O should
respond and trigger an interrupt that calls biodone().
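
The wait/wake relationship described above can be sketched with a small toy model. This is Python, not the kernel routines themselves: Buf, biowait and biodone below are simplified stand-ins named after the functions mentioned, and the timeout parameter is an assumption added only so that the demonstration cannot hang.

```python
import threading

# Toy model only: simplified stand-ins named after the kernel routines
# mentioned above, not their real implementation.


class Buf:
    """Stands in for an I/O buffer with a completion flag."""

    def __init__(self):
        self.done = threading.Event()


def biowait(buf, timeout=None):
    # The waiter sleeps until the I/O is marked complete. The timeout is a
    # demo-only assumption; without one, this call blocks indefinitely.
    return buf.done.wait(timeout)


def biodone(buf):
    # Called from the completion path (the device interrupt in a real
    # system); wakes every thread waiting on this buffer.
    buf.done.set()


buf = Buf()
results = []

waiter = threading.Thread(
    target=lambda: results.append(biowait(buf, timeout=5))
)
waiter.start()

# While the "device" stays silent, a short wait on the same buffer times out.
timed_out = not biowait(buf, timeout=0.1)

biodone(buf)   # the "interrupt" finally arrives
waiter.join()  # the waiting thread can now return
```

If biodone() is never called, the waiter never returns, which corresponds to the hung process seen in the zone.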
 
I would look in /var/adm/messages and examine the kernel crash dump, if
one was collected.
 
Keep in mind there is one single kernel image.
 
 
HTH,
 


Carlos A. Morillo   908 655 3378
NA UNIX Deployment  [EMAIL PROTECTED]
J&J NCS Raritan, NJ

-----Original Message-----
*From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of *Russ Petruzzelli*
*Sent:* Wednesday, April 11, 2007 12:30 PM
*To:* [EMAIL PROTECTED]
*Cc:* zones-discuss@opensolaris.org
*Subject:* Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone

This will re-boot the zone from the global zone,
zoneadm -z hostnamezoneX boot

Russ

Victor Restrepo wrote On 04/11/07 07:40 AM,:


Hello All,

I had a case where a hung process in a local zone (i.e. blocked in
biowait) caused the local zone to become unresponsive. I couldn't shut
the zone down, kill the process, etc. The only way to make the zone
responsive again was to reboot the global zone. Is there anything that
can be done to prevent this (i.e. to avoid rebooting the global zone
and reboot only the local zone)?


Regards,
 







--
Victor Restrepo

Technical Specialist
Sun Microsystems




Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone

2007-04-11 Thread Russ Petruzzelli
This will re-boot the zone from the global zone,
zoneadm -z hostnamezoneX boot

Russ

Victor Restrepo wrote On 04/11/07 07:40 AM,:

>Hello All,
>
>I had a case where a hung process in a local zone (i.e. blocked in
>biowait) caused the local zone to become unresponsive. I couldn't shut
>the zone down, kill the process, etc. The only way to make the zone
>responsive again was to reboot the global zone. Is there anything that
>can be done to prevent this (i.e. to avoid rebooting the global zone
>and reboot only the local zone)?
>
>Regards,
>  
>