(I was directed here from the ZFS forum, because the problem was identified as lying not in ZFS but in the boot archive: http://www.opensolaris.org/jive/thread.jspa?threadID=98092&tstart=0)
We have around one outage per week on average, and the machine(s) don't boot up as one might expect. Just today: a reboot, and then rebooting in circles, with no chance for me to read the 30-40 lines of hex output before the boot process starts over again. That's already bad.

So, let's try failsafe (everything is on nv_110). No better:

    Configuring /dev
    relocation error: R_AMD64_PC32: file /kernel/dev/amd64/zfs: symbol down_object_opo_relocate failed [not noted down fully correctly on my side]
    zfs error doing relocations
    Searching for installed OS instances ...
    /sbin/install-recovery[7]: 72 segmentation Fault
    no installed OS instance found.
    Starting shell.

"init 6" brought back the failsafe, and this time a damaged boot archive was reported. It could be repaired, and the machine came up after another "init 6". During earlier boot failures after a power outage the behaviour was different, but the boot archive was reported as inconsistent a handful of times.

This bugs me. Otherwise the machines run without trouble, and with ZFS the chance of a damaged boot archive should be zero. Here it is approaching a two-digit percentage. It was pointed out to me that the problem was corruption of the boot archive by a third-party driver.

My questions/suggestions are:

Shouldn't the boot archive be maintained by an independent process that creates a proper backup whenever it is modified, to guard against careless handling?

Shouldn't a reboot loop at least be recorded, even if only as a flag (provided a drive is writable), with the boot messages redirected into a file?

Shouldn't we keep track of a known-good rollback point that can be offered for booting when boots fail or loop? Maybe something like a 'last successful boot'?

Uwe
