On Wed, 23 Dec 2009 15:26:17 +0100, Glenn Brunette <glenn.brune...@sun.com> 
wrote:

> Just verified that something is still wrong in b129, but the problem is
> _not_ with a vanilla configuration.  This time around boot/halt #102,
> the system apparently shutdown/panic'ed?  I was running it overnight
> and came in to a system that had been rebooted.  I did not see any
> problem in the audit log nor in /var/adm/messages.  Any pointers?
>
> I am running an Immutable Service Container configuration, based upon
> the installation steps at:
>
> http://kenai.com/projects/isc/pages/OpenSolaris
>
> Specifically:
>
> pfexec pkg install SUNWmercurial
> hg clone https://kenai.com/hg/isc~source  isc
> pfexec isc/bin/iscadm.ksh -N 0
> pfexec bootadm update-archive
> pfexec shutdown -g 0 -i 0 -y
> [after reboot]
> zlogin -C isc1
> [wait for zone isc1 to fully complete boot process]
>
> then run the script that I provided that stops and starts the zone.
>
> Apparently, there must be something wrong with the interaction of
> components.  In this configuration, we have things like resource
> controls, auditing, IP Filter/IP NAT, and zones all enabled.
>
> Would it be possible for you to try the steps above on a fresh
> install of 2009.06 or later (b129 is where I am right now).  Also,
> if you have other debugging methods, please let me know.

Hey Glenn, the good news is that I have an OSOL_130 system with ISC installed
as described above that reliably reproduces _something_.

That something is a complete system hang when I run your script:

batsc...@osol:~# while : ; do echo "`date`:ZONE BOOT"; pfexec zoneadm -z isc1 boot; sleep 10; echo "`date`: ZONE HALT"; pfexec zoneadm -z isc1 halt; sleep 10; done

Note that sleep 30 didn't trigger it: the loop ran for 17 hours without an
issue. After changing this to sleep 10, however, I can reliably hang the
system, usually within 5 hours.
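
Since the sleep interval seems to matter, here is a rough sketch of a
parameterized variant of the loop above (bash or ksh93). It numbers each
iteration and appends the timestamps to a log file, so that after a hang or
an unexpected reboot the log gives a hint about which iteration was in
flight (the last lines may not have reached disk, though). The names
cycle_zone, ZONEADM, DELAY and LOG, as well as the log path, are made up for
illustration and are not part of ISC or of the original script.

```shell
# Sketch only: cycle_zone, ZONEADM, DELAY and LOG are hypothetical names.
ZONEADM=${ZONEADM:-"pfexec zoneadm"}   # command used to drive the zone
DELAY=${DELAY:-10}                     # seconds to sleep after each boot/halt
LOG=${LOG:-/var/tmp/zone-cycle.log}    # assumed log location

# cycle_zone ZONE [COUNT]: boot and halt ZONE COUNT times (0 = forever),
# appending the iteration number and a timestamp to $LOG before each step.
cycle_zone() {
    zone=$1
    count=${2:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        echo "`date`: #$i ZONE BOOT" >> "$LOG"
        $ZONEADM -z "$zone" boot
        sleep "$DELAY"
        echo "`date`: #$i ZONE HALT" >> "$LOG"
        $ZONEADM -z "$zone" halt
        sleep "$DELAY"
    done
}
```

With the defaults, "cycle_zone isc1" behaves like the loop above; setting
DELAY=30 first should give the variant that did not hang here.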

Remote access is no longer possible, and even the local console no longer
responds.

Breaking in with F1-A and taking a dump does work, however, when the system
is booted into kmdb.

The bad news is that I'm not getting the dumps, sigh. This is due to bug:

6911155 kernel dump fails if panic happens in interrupt service routine

which is fixed in build 131.

So I will pursue this further once OSOL_131 has been released and this
system has been upgraded. I should finally have dumps by then.

I'll also contact you offline about how you can set up your systems
to capture crash dumps and anything else we might need.

cheers
frankB
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
