Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone
Jeff Victor writes:
> I wonder what would happen if you killed all of the zone's processes with
> "pkill -9 -z " and then force the zone to re-boot with zoneadm. The
> process in biowait would still be there, but if the zone can re-boot, that
> may not matter.

That wouldn't help. As long as there's a process around holding the zone active, the shutdown won't complete and you won't be able to reboot the zone.

In this case, if you're stuck unkillably somewhere in the kernel, then the stuck thread will hold a cred_t that holds a reference to the zone_t -- preventing it from finishing.

That's a driver bug that has to be fixed, not something that can be worked around.

--
James Carlson, Solaris Networking <[EMAIL PROTECTED]>
Sun Microsystems / 1 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
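For anyone who wants to see this for themselves, here is a minimal sketch of the sequence Jeff proposed, followed by two checks of what is left behind. Per the reply above, it is not expected to free a zone whose thread is stuck in the kernel; it only makes the attempt and shows the result. The names `run`, `try_zone_halt` and `ZONE_DRY_RUN` are hypothetical and not from the thread. The `ps` and `zoneadm` options are assumed to match your Solaris release, so check the man pages first. By default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread. Dry run by default:
# set ZONE_DRY_RUN=0 in the global zone to actually run the commands.
ZONE_DRY_RUN=${ZONE_DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$ZONE_DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

try_zone_halt() {
    zone=${1:-}
    if [ -z "$zone" ]; then
        echo "usage: try_zone_halt zonename" >&2
        return 2
    fi
    # Kill everything in the zone that can be killed.
    run pkill -9 -z "$zone"
    # Ask for a halt; this cannot complete while a stuck kernel
    # thread still holds a reference to the zone.
    run zoneadm -z "$zone" halt
    # Show any survivors and the wait channel they are blocked on.
    run ps -z "$zone" -o pid,s,wchan,args
    # Show the zone state; a zone that cannot finish halting is
    # expected to stay in a transitional state such as shutting_down.
    run zoneadm list -cv
}
```

Calling `try_zone_halt hostnamezoneX` in dry-run mode lists the four commands without touching the system. If a process survives the `pkill` and the zone never reaches the installed state, that is the situation described above, and the fix has to come from the driver.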
Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone
I wonder what would happen if you killed all of the zone's processes with "pkill -9 -z " and then force the zone to re-boot with zoneadm. The process in biowait would still be there, but if the zone can re-boot, that may not matter.

Victor wrote:
> All,
>
> The zone halt, reboot, init 0, etc. did not work from the global zone. We
> did capture a live core from the system and saw that the biowait had been
> going on for 1 day.
>
> My question is the following. We saw that when a process within a zone
> hangs in biowait or something similar (i.e. writing to disk), the zone or
> any process in the zone hangs. We were not able to bring the zone down,
> halt it, etc., and thus we had to reboot the global zone to fix the issue.
>
> What can be done to prevent this? Is it to unmount the disk assigned to
> the local zone? All suggestions are welcome.
>
> Thanks,
>
> Morillo, Carlos [NCSUS Non-J&J] wrote:
>> Since you only have one single kernel image, the thread the zone created
>> is stuck doing a read()/write() in biowait(), waiting for the I/O to
>> complete. Sometime in the future the underlying device handling that I/O
>> should respond and trigger an interrupt calling biodone().
>>
>> I would look in /var/adm/messages and at the kernel crash dump, if one
>> was collected. Keep in mind there is one single kernel image.
>>
>> HTH,
>>
>> Carlos A. Morillo
>> 908 655 3378
>> NA UNIX Deployment
>> [EMAIL PROTECTED]
>> J&J NCS Raritan, NJ
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] On Behalf Of Russ Petruzzelli
>> Sent: Wednesday, April 11, 2007 12:30 PM
>> To: [EMAIL PROTECTED]
>> Cc: zones-discuss@opensolaris.org
>> Subject: Re: [zones-discuss] Preventing a Local zone hang from having
>> to reboot the global zone
>>
>> This will re-boot the zone from the global zone,
>> zoneadm -z hostnamezoneX boot
>>
>> Russ
>>
>> Victor Restrepo wrote On 04/11/07 07:40 AM,:
>>> Hello All,
>>>
>>> I had a case where a hung process in a local zone (i.e. blocked on
>>> biowait) caused the local zone to become unresponsive. Couldn't shut
>>> the zone down, kill the process, etc. The only way to have the zone
>>> become responsive was to reboot the global zone. Is there anything that
>>> can be done to prevent this (i.e. not having to reboot the global zone
>>> and only reboot the local zone)?
>>>
>>> Regards,

--
Jeff VICTOR              Sun Microsystems            jeff.victor @ sun.com
OS Ambassador            Sr. Technical Specialist
Solaris 10 Zones FAQ:    http://www.opensolaris.org/os/community/zones/faq
Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone
All,

The zone halt, reboot, init 0, etc. did not work from the global zone. We did capture a live core from the system and saw that the biowait had been going on for 1 day.

My question is the following. We saw that when a process within a zone hangs in biowait or something similar (i.e. writing to disk), the zone or any process in the zone hangs. We were not able to bring the zone down, halt it, etc., and thus we had to reboot the global zone to fix the issue.

What can be done to prevent this? Is it to unmount the disk assigned to the local zone? All suggestions are welcome.

Thanks,

Morillo, Carlos [NCSUS Non-J&J] wrote:
> Since you only have one single kernel image, the thread the zone created
> is stuck doing a read()/write() in biowait(), waiting for the I/O to
> complete. Sometime in the future the underlying device handling that I/O
> should respond and trigger an interrupt calling biodone().
>
> I would look in /var/adm/messages and at the kernel crash dump, if one
> was collected. Keep in mind there is one single kernel image.
>
> HTH,
>
> Carlos A. Morillo
> 908 655 3378
> NA UNIX Deployment
> [EMAIL PROTECTED]
> J&J NCS Raritan, NJ
>
> -----Original Message-----
> From: [EMAIL PROTECTED] On Behalf Of Russ Petruzzelli
> Sent: Wednesday, April 11, 2007 12:30 PM
> To: [EMAIL PROTECTED]
> Cc: zones-discuss@opensolaris.org
> Subject: Re: [zones-discuss] Preventing a Local zone hang from having
> to reboot the global zone
>
> This will re-boot the zone from the global zone,
> zoneadm -z hostnamezoneX boot
>
> Russ
>
> Victor Restrepo wrote On 04/11/07 07:40 AM,:
>> Hello All,
>>
>> I had a case where a hung process in a local zone (i.e. blocked on
>> biowait) caused the local zone to become unresponsive. Couldn't shut
>> the zone down, kill the process, etc. The only way to have the zone
>> become responsive was to reboot the global zone. Is there anything that
>> can be done to prevent this (i.e. not having to reboot the global zone
>> and only reboot the local zone)?
>>
>> Regards,

--
Victor Restrepo
Technical Specialist
Sun Microsystems
Re: [zones-discuss] Preventing a Local zone hang from having to reboot the global zone
This will re-boot the zone from the global zone,

zoneadm -z hostnamezoneX boot

Russ

Victor Restrepo wrote On 04/11/07 07:40 AM,:
>Hello All,
>
>I had a case where a hung process in a local zone (ie blocked on
>biowait), caused the local zone to become unresponsive. Couldn't shut
>the zone down, kill the process, etc. The only way to have zone become
>responsive was to reboot the global zone. Is there any thing that can
>be done to prevent this. ie (not having to reboot the global zone and
>only reboot the local zone)
>
>Regards,
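A side note on the command in Russ's message: zoneadm has separate `boot` and `reboot` subcommands, and which one applies depends on the zone's current state. The sketch below maps a state name to the subcommand to try. The function names `zone_restart_action` and `zone_state` are hypothetical. The position of the state field in `zoneadm list -p` output is an assumption to verify against the zoneadm man page on your release.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: choose a zoneadm subcommand
# from the zone state that zoneadm reports.
zone_restart_action() {
    case "${1:-}" in
        running)            echo reboot ;;   # zone is up: restart it
        installed|ready)    echo boot ;;     # zone is down: just boot it
        shutting_down|down) echo none ;;     # still tearing down: neither applies
        *)                  echo unknown ;;
    esac
}

# Assumption: "zoneadm -z <zone> list -p" prints colon-separated fields
# with the state in the third field.
zone_state() {
    zoneadm -z "$1" list -p | cut -d: -f3
}

# Example use from the global zone (left commented out on purpose):
# action=$(zone_restart_action "$(zone_state hostnamezoneX)")
```

A `none` result corresponds to the case in this thread: the zone is stuck partway through shutdown, so neither `boot` nor `reboot` gets it back.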