2011-10-12 11:56, Frank Van Damme wrote:

The root of the problem seems to be that that process never completes.

9 /lib/svc/bin/svc.startd
332 /sbin/sh /lib/svc/method/boot-archive-update
347 /sbin/bootadm update-archive

Can't kill it and run from the cmdline either, it simply ignores SIGKILL. (Which shouldn't even be possible).


I guess it is possible when things lock up in kernel calls, waiting for them to complete. It has happened to me a number of times, usually related to a ZFS pool being too busy working or repairing to do anything else, and this per se often led to the system crashing (see e.g. my adventures this spring reported on the forums). I had hit a number of problems generally leading to the whole zfs subsystem "running away to a happy place".

As an indication of this you can try running something as simple as "zpool list" in the background (otherwise your shell locks up too) and see if it ever completes:

# zpool list &
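
If you want the check to report back by itself rather than watching the background job, something like the following might do. This is only a rough sketch: completes_within is a made-up helper name, and it assumes a POSIX-style shell (ksh or bash rather than the old /sbin/sh) and a mktemp that supports -d.

```shell
#!/bin/sh
# Sketch only: completes_within is a hypothetical helper, not a Solaris tool.
# Usage: completes_within SECONDS COMMAND [ARGS...]
# Returns 0 if COMMAND exits within SECONDS, 1 if it is still running.
completes_within() {
    limit=$1
    shift
    # Private scratch directory; the marker file appears when COMMAND exits.
    dir=$(mktemp -d "${TMPDIR:-/tmp}/cw.XXXXXX") || return 2
    marker=$dir/done
    # Run in the background so that a hung command cannot block this shell.
    ( "$@"; : > "$marker" ) >/dev/null 2>&1 &
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        if [ -e "$marker" ]; then
            rm -rf "$dir"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    if [ -e "$marker" ]; then
        rm -rf "$dir"
        return 0
    fi
    # Still running: leave it alone, since a process stuck in a kernel call
    # will not react to signals anyway.
    return 1
}
```

For example, "completes_within 30 zpool list || echo zpool seems hung" would print the warning if zpool list has not returned after 30 seconds (the 30 is an arbitrary choice). The helper never tries to kill the command, so a hung process stays around.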

Earlier there were bugs related to inaccessible snapshots (marked for deletion, but not actually deletable until you mount and unmount the parent dataset) - these mostly fired in zfs-auto-snap auto-deletions, but also happened to influence bootadm.

I am not sure in what way bootadm relies on zfs/zpool, but empirically - it does.
You might work around the problem by:
* Exporting the "data" zfs pools before updating the boot archive (bootadm update-archive); if you're rebooting the system anyway, stop the zones and services manually and give this a try.
* Booting from other media, like a Failsafe Boot (SXCE, Sol10) or a LiveCD (Indiana), importing your rootpool to "/a", and then running:
# bootadm update-archive -R /a
* Booting into single-user mode, making the root RW if needed, and updating the archive.
** You're likely to go this way anyway if your boot is interrupted due to an outdated boot archive (SMF failure - requires a repair shell interaction). When the archive is updated, you need to clear the service (svcadm clear boot-archive) and exit the repair shell in order to continue booting the OS.
* Brute force: updating the boot archive (/platform/i86pc/boot_archive and /platform/i86pc/amd64/boot_archive) manually as an FS image, with the files listed in /boot/solaris/filelist.ramdisk. Usually failure on boot is related to updating of some config files in /etc...
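
For the first workaround, a dry-run sketch like the one below might help to see which pools would get exported before anything is touched. The function names are made up for illustration, and the root pool name (often "rpool") is an assumption you pass in yourself. It only prints the commands and does not run them.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers for illustration.

# Read pool names (one per line) on stdin and print all except the root pool.
# $1 is the root pool name, which is an assumption supplied by the caller.
data_pools() {
    while read -r pool; do
        if [ -n "$pool" ] && [ "$pool" != "$1" ]; then
            echo "$pool"
        fi
    done
}

# Print (not execute) the commands for the "export data pools first" route.
print_export_plan() {
    data_pools "$1" | while read -r pool; do
        echo "zpool export $pool"
    done
    echo "bootadm update-archive"
}
```

You could feed it the real pool list with "zpool list -H -o name | print_export_plan rpool", review the output, and then run the printed commands by hand once the zones and services are stopped.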

//Jim

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
