[zones-discuss] All zones continuously core dump after upgrade to Solaris Express

2010-11-18 Thread Ian Collins
I ran through the upgrade process on a system with half a dozen zones,
and on restart they all got locked into a core dump/restart loop:


Nov 19 07:57:50 i7 genunix: [ID 729207 kern.warning] WARNING: init(1M) for zone webhost (pid 3094) core dumped on signal 12: restarting automatically


They all run through this cycle in tight loops.

Oops.

--
Ian.

___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] All zones continuously core dump after upgrade to Solaris Express

2010-11-18 Thread John D Groenveld
In message 4ce57afe.9070...@ianshome.com, Ian Collins writes:
> I ran through the upgrade process on a system with half a dozen zones,
> and on restart they all got locked into a core dump/restart loop:
>
> Nov 19 07:57:50 i7 genunix: [ID 729207 kern.warning] WARNING: init(1M)
> for zone webhost (pid 3094) core dumped on signal 12: restarting
> automatically
>
> They all run through this cycle in tight loops.

I saw this on one Express upgrade.

I usually halt, detach, image-update, and attach -u, but on my failed
update I neglected to detach the zone. Whoops.

I halted the zone, detached it, and after some failed attempts to attach
with zoneadm discovered that there was a ZFS clone of the zone's zbe.

I performed a zfs send -R of the source snapshot, destroyed the
source ZFS and the dependent clone, and restored the original zbe.

After that, attach -u worked.

Also, I sacrificed a chicken, but I'm not sure whether that helped.
John
groenv...@acm.org
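
The failed attach above was traced to a ZFS clone that depended on the zone's zbe. As a minimal sketch (not John's actual procedure), the Python below scans tab-separated text of the kind `zfs list -H -o name,origin -r <dataset>` prints and reports datasets whose origin is a snapshot of a given dataset. The dataset names, the snapshot name, and the `find_clones` helper are hypothetical and exist only for illustration.

```python
def find_clones(zfs_list_output, dataset):
    """Return names of datasets whose origin is a snapshot of `dataset`.

    `zfs_list_output` is expected to hold one "name<TAB>origin" pair per
    line, which is the shape of `zfs list -H -o name,origin` output.
    """
    clones = []
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue
        name, origin = line.split("\t")
        # origin is "-" for ordinary datasets, "<dataset>@<snapshot>" for clones
        if origin.partition("@")[0] == dataset:
            clones.append(name)
    return clones


# Hypothetical sample standing in for:
#   zfs list -H -o name,origin -r rpool/zones/webhost
sample = (
    "rpool/zones/webhost\t-\n"
    "rpool/zones/webhost/ROOT\t-\n"
    "rpool/zones/webhost/ROOT/zbe\t-\n"
    "rpool/zones/webhost/ROOT/zbe-1\trpool/zones/webhost/ROOT/zbe@snap1\n"
)

print(find_clones(sample, "rpool/zones/webhost/ROOT/zbe"))
```

Any dataset this reports would have to be dealt with before the source dataset can be destroyed, which is presumably why the clone had to go along with the source in the recovery described above.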


Re: [zones-discuss] All zones continuously core dump after upgrade to Solaris Express

2010-11-18 Thread Ian Collins

On 11/19/10 08:26 AM, John D Groenveld wrote:

> I saw this on one Express upgrade.
>
> I usually halt, detach, image-update, and attach -u, but on my failed
> update I neglected to detach the zone. Whoops.
>
> I halted the zone, detached it, and after some failed attempts to attach
> with zoneadm discovered that there was a ZFS clone of the zone's zbe.
>
> I performed a zfs send -R of the source snapshot, destroyed the
> source ZFS and the dependent clone, and restored the original zbe.
>
> After that, attach -u worked.
>
> Also, I sacrificed a chicken, but I'm not sure whether that helped.

Well that's me buggered, I don't have any on hand!

I'm guessing this is a manifestation of the issue "Zones Cloned by Using
zoneadm clone Can Cause a Snapshot Name Collision When You Activate a
Boot Environment" (10990) mentioned in the release notes.


I had assumed I wouldn't have this problem because I wasn't upgrading 
from 2009.06 and I haven't (consciously) used zoneadm clone.


--
Ian.



Re: [zones-discuss] All zones continuously core dump after upgrade to Solaris Express

2010-11-18 Thread Ian Collins

On 11/19/10 09:12 AM, Steve Lawrence wrote:

> What build are you upgrading from?

134 through 134b as recommended in the release notes.

> Is this during the attach -u portion of the upgrade for each zone?

It happens after rebooting into the new BE.  I didn't detach the zones
before upgrading.

> Can you gather any core files (or pstacks of core files)?  These might
> be at zonepath/root/

pstack is short:

core '/tmp/xx/zoneRoot/webhost/root/core' of 3094:/sbin/init
 feef3c97 _fxstat  (0, 8047560, 180, 8058927) + 7
 08058973 st_init  (fee201a8, 38, 0, fefccc54, 0, feffb804) + 8f
 080543dc main (1, 8047f6c, 8047f74, feffb804) + 150
 0805418d _start   (1, 8047fe0, 0, 0, 7d8, 8047feb) + 7d

I can send the core (it's only 2MB) if that helps.

> My guess is that init (in the zone) is starting using a downrev libc
> (aka libc not upgraded yet), and is making a system call that has
> changed.  12 is SIGSYS.
>
> -Steve

--
Ian.
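
The diagnosis above rests on reading signal 12 as SIGSYS, which is the Solaris numbering. The sketch below pulls the zone name, pid, and signal out of the kernel warning and maps the number through a small hand-written Solaris table. The `parse_init_core` helper and the table are illustrative assumptions, not part of any Solaris tool. Python's own `signal` module follows the host's numbering, so on Linux x86 the number 12 would come back as SIGUSR2, which is why the table is hard-coded here.

```python
import re

# Subset of Solaris signal numbers; hand-written for this sketch.
SOLARIS_SIGNALS = {6: "SIGABRT", 10: "SIGBUS", 11: "SIGSEGV", 12: "SIGSYS"}

PATTERN = re.compile(
    r"init\(1M\) for zone (?P<zone>\S+) \(pid (?P<pid>\d+)\) "
    r"core dumped on signal (?P<sig>\d+)"
)


def parse_init_core(line):
    """Return (zone, pid, signal name) from the warning, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    sig = int(m.group("sig"))
    # Fall back to the raw number for signals missing from the table.
    name = SOLARIS_SIGNALS.get(sig, f"signal {sig}")
    return m.group("zone"), int(m.group("pid")), name


line = (
    "Nov 19 07:57:50 i7 genunix: [ID 729207 kern.warning] WARNING: init(1M) "
    "for zone webhost (pid 3094) core dumped on signal 12: restarting automatically"
)
print(parse_init_core(line))  # ('webhost', 3094, 'SIGSYS')
```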



Re: [zones-discuss] All zones continuously core dump after upgrade to Solaris Express

2010-11-18 Thread Steve Lawrence



On 11/18/10 12:38 PM, Ian Collins wrote:

> On 11/19/10 09:12 AM, Steve Lawrence wrote:
>
>> What build are you upgrading from?
>
> 134 through 134b as recommended in the release notes.
>
>> Is this during the attach -u portion of the upgrade for each zone?
>
> It happens after rebooting into the new BE.  I didn't detach the zones
> before upgrading.


Oh.  In that case, your zones are still downrev at build 134.  You need
to detach them, and attach them again with -u.


I'm not sure if you'll be able to detach them successfully with zoneadm 
detach.  If not, you'll need to boot back to the 134 BE, detach them, 
and upgrade again.


-Steve
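
The ordering that matters in this thread (halt and detach each zone, run the image update, reboot into the new BE, then attach with -u) can be sketched as a dry-run plan. The `upgrade_plan` helper and the second zone name `mailhost` are hypothetical; the code only builds and prints command lines, it does not run them, and it is not a substitute for the release notes.

```python
def upgrade_plan(zones):
    """Build the command lists for before and after the reboot into the new BE."""
    before_reboot = []
    for zone in zones:
        # Each zone is halted and detached before the global zone is updated.
        before_reboot.append(["zoneadm", "-z", zone, "halt"])
        before_reboot.append(["zoneadm", "-z", zone, "detach"])
    before_reboot.append(["pkg", "image-update"])

    # After booting the new BE, each zone is attached with -u so that its
    # packages are brought up to the level of the global zone.
    after_reboot = [["zoneadm", "-z", zone, "attach", "-u"] for zone in zones]
    return before_reboot, after_reboot


before, after = upgrade_plan(["webhost", "mailhost"])
for cmd in before:
    print(" ".join(cmd))
print("# reboot into the new BE")
for cmd in after:
    print(" ".join(cmd))
```

Skipping the detach step leaves the zones at the old build while the kernel is new, which matches the failure reported at the top of the thread.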



Re: [zones-discuss] All zones continuously core dump after upgrade to Solaris Express

2010-11-18 Thread Ian Collins

On 11/19/10 10:11 AM, Steve Lawrence wrote:

> On 11/18/10 12:38 PM, Ian Collins wrote:
>
>> It happens after rebooting into the new BE.  I didn't detach the
>> zones before upgrading.
>
> Oh.  In that case, your zones are still downrev at build 134.  You
> need to detach them, and attach them again with -u.
>
> I'm not sure if you'll be able to detach them successfully with
> zoneadm detach.  If not, you'll need to boot back to the 134 BE,
> detach them, and upgrade again.




OK, thanks Steve.

This should be made clear in the release notes.  The current note isn't 
strong enough.


--
Ian.
