On Wed, 23 Dec 2009 15:26:17 +0100, Glenn Brunette <glenn.brune...@sun.com> wrote:
> Just verified that something is still wrong in b129, but the problem is
> _not_ with a vanilla configuration. This time around boot/halt #102,
> the system apparently shutdown/panic'ed? I was running it overnight
> and came in to a system that had been rebooted. I did not see any
> problem in the audit log nor in /var/adm/messages. Any pointers?
>
> I am running an Immutable Service Container configuration, based upon
> the installation steps at:
>
> http://kenai.com/projects/isc/pages/OpenSolaris
>
> Specifically:
>
> pfexec pkg install SUNWmercurial
> hg clone https://kenai.com/hg/isc~source isc
> pfexec isc/bin/iscadm.ksh -N 0
> pfexec bootadm update-archive
> pfexec shutdown -g 0 -i 0 -y
> [after reboot]
> zlogin -C isc1
> [wait for zone isc1 to fully complete boot process]
>
> then run the script that I provided that stops and starts the zone.
>
> Apparently, there must be something wrong with the interaction of
> components. In this configuration, we have things like resource
> controls, auditing, IP Filter/IP NAT, and zones all enabled.
>
> Would it be possible for you to try the steps above on a fresh
> install of 2009.06 or later (b129 is where I am right now). Also,
> if you have other debugging methods, please let me know.

hey Glenn,

the good news is that I have an OSOL_130 system, with ISC installed as
described above, that reliably reproduces _something_. That something is
a complete system hang when running your script:

batsc...@osol:~# while : ; do echo "`date`:ZONE BOOT"; pfexec zoneadm -z isc1 boot; sleep 10; echo "`date`: ZONE HALT"; pfexec zoneadm -z isc1 halt; sleep 10; done

Note that sleep 30 didn't do it: the loop ran for 17 hours without an
issue. After changing this to sleep 10, however, I can reliably hang the
system, usually within 5 hours. Remote access is no longer possible at
that point, and even the local console no longer responds. Using F1-A to
take a dump when booted into kmdb does work, however.

the bad news is, I'm not getting the dumps, sigh.
This is due to bug:

6911155 kernel dump fails if panic happens in interrupt service routine

which is fixed in build 131. So I will pursue this further once OSOL_131
has been released and this system has been upgraded. I will finally have
dumps by then. I'll also contact you offline about how you can set up
your systems to capture crash dumps and anything else we might need.

cheers
frankB
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org
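
For anyone trying to reproduce the hang, the boot/halt loop in the message can be written as a small script that numbers each iteration and appends to a log file, so the last step started before a hang may still be readable afterwards (assuming the write reaches disk in time). This is only a sketch: `ZONEADM`, `LOGFILE` and `cycle_zone` are hypothetical names introduced here for illustration and are not part of the original script.

```shell
#!/bin/sh
# Sketch of the zone boot/halt stress loop with a bounded iteration count.
# ZONEADM, LOGFILE and cycle_zone are hypothetical names, not from the thread.

ZONEADM="${ZONEADM:-pfexec zoneadm}"            # set ZONEADM=echo for a dry run
LOGFILE="${LOGFILE:-/var/tmp/zone-cycle.log}"   # assumed log location

# cycle_zone ZONE DELAY COUNT
# Boots and halts ZONE COUNT times, sleeping DELAY seconds after each step.
# Each step is logged before it runs, so the log shows what was in flight.
cycle_zone() {
    zone=$1
    delay=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$(date): iteration $i: ZONE BOOT" >> "$LOGFILE"
        $ZONEADM -z "$zone" boot || return 1
        sleep "$delay"
        echo "$(date): iteration $i: ZONE HALT" >> "$LOGFILE"
        $ZONEADM -z "$zone" halt || return 1
        sleep "$delay"
        i=$((i + 1))
    done
}
```

With the delay that triggered the hang in the message, a run would look like `cycle_zone isc1 10 2000`; the iteration count of 2000 is an arbitrary upper bound chosen for this sketch.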