[on-discuss] Reliability at power failure?

Uwe Dippel Wed, 25 Mar 2009 15:18:19 +0800

(I was directed here from ZFS, because the problem was identified
as stemming not from ZFS but from the boot archive:
http://www.opensolaris.org/jive/thread.jspa?threadID=98092&tstart=0)


We have around one power outage per week, on average, and the
machine(s) don't boot up as one might expect.
Just today: the machine rebooted in circles, with no chance on my side
to read the 30-40 lines of hex output before the boot process recycled.
That's already bad.
So, let's try failsafe (all on nv_110). No better:
"Configuring /dev
relocation error: R_AMD64_PC32: file /kernel/dev/amd64/zfs: symbol
down_object_opo_relocate failed [not fully correctly noted on my side]
zfs error doing relocations
Searching for installed OS instances ...
/sbin/install-recovery[7]: 72 segmentation Fault
no installed OS instance found.
Starting shell."
init 6 brought failsafe back up; there, a boot archive was reported as
damaged and could be repaired, and the machine restarted after another
init 6.
In earlier boot failures after a power outage, the behaviour was
different, but the boot archive was recognized as inconsistent a handful
of times. This bugs me. Otherwise, the machines run without
trouble, and with ZFS, the chance of a damaged boot archive should be
zero. Here it approaches a two-digit percentage.

It was pointed out to me that the problem was corruption of the
boot archive by a third-party driver.


My questions/suggestions are:

Should maintaining the boot archive not be an independent process, one
that creates a proper backup whenever the archive is modified, to guard
against any stupid handling?
Should a recycling reboot not be recorded, if only by a flag (in case we
have r/w access to a drive), including redirection of the boot messages
into a file?
Should we not keep track of a proper roll-back point to offer to boot
to in case of failing/recycling boots? Maybe something like 'last
successful boot'?
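
The last two suggestions could be sketched roughly as follows. This is
only an illustration of the idea, not how OpenSolaris or its boot loader
works: the state file, the threshold MAX_FAILED_BOOTS, the function names
and the target names are all hypothetical. A counter is bumped at every
boot attempt and reset once a boot is known to have succeeded; when the
counter exceeds the threshold, the last target that booted successfully
is offered instead of the default.

```python
import json
import os
import tempfile

# Hypothetical threshold: after this many boot attempts in a row without
# a confirmed success, fall back to the last known-good target.
MAX_FAILED_BOOTS = 3


def load_state(path):
    """Read the boot state; start fresh if the file is missing or damaged."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"attempts": 0, "last_good": None}


def save_state(path, state):
    """Write to a temporary file, sync it, then rename it over the old one,
    so a power cut leaves either the old or the new state, never half of one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def choose_boot_target(path, default_target):
    """Called early at every boot: count the attempt, detect recycling."""
    state = load_state(path)
    state["attempts"] += 1
    save_state(path, state)
    if state["attempts"] > MAX_FAILED_BOOTS and state["last_good"]:
        return state["last_good"]
    return default_target


def mark_boot_successful(path, target):
    """Called once the system is fully up: reset counter, remember target."""
    save_state(path, {"attempts": 0, "last_good": target})
```

For example, after mark_boot_successful(path, "be-old"), three calls to
choose_boot_target(path, "be-new") that are never followed by a success
mark keep returning "be-new"; the fourth returns "be-old".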

Uwe
