Hi,

I've already had some fruitful exchanges offline.
To share our product requirements, here is a summary.

One important point is that the design has to take into account HA
embedded systems based on diskless nodes. The general constraints are:

- These nodes are installed completely from the server; they are booted
  but never shut down. They restart only after a crash.
- There is no initial reboot, even after installation (the system is
  installed together with the application and everything starts without
  an intermediate reboot, meaning that the boot archive will never be
  synced).

This means that checking is a must-have for normal systems, but embedded
systems need a way to prevent the check from blocking the boot: there is
no operator and sometimes no console. An additional property, for
example one that bypasses the check, could be a good solution (or a way
to disable the service without blocking the boot).

We have other means to detect inconsistencies in our product. As there
is no operator/administrator, all of these checks send remote
notifications.

In some cases, our customers prioritize availability over integrity.
The risk of files breaking is minor because these nodes are in a fully
controlled environment and nothing is supposed to change (no new drivers
or system file modifications).

And thanks again for this fruitful discussion.

Nicolas Williams wrote On 02/15/06 19:23,:
> On Wed, Feb 15, 2006 at 09:16:02AM -0800, Shudong Zhou wrote:
> 
>>The basic idea of booting from an archive solves more problems than
>>it introduces.  It simplifies Xen, GRID deployment, HoneyComb, etc.
>>As far as resyncing the archive on shutdown, it's the minimal
>>approach.  We can always add more syncing points where it makes
>>sense.
>>
>>I do think the boot-archive is overly aggressive in reporting a fatal
>>failure if *any* file is out of date.  The original intention was
>>to prevent potential data corruption related to mismatched kernel
>>modules.
>>We should relax the failure mode such that fatal failure
>>is only reported if a selected subset of files are out of date.
> 
> 
> I filed
> 
> 	6342731 /lib/svc/method/boot-archive could try harder
> 
> a while back.
> 
> Possible approaches to fixing this problem:
> 
>  - make the boot-archive smarter (see 6342731)
> 
>    By the time the boot-archive method starts the root filesystem is
>    clearly mounted, so the boot-archive could figure out what actually
>    differs between / and the boot archive and act accordingly (e.g.,
>    load/unload drivers, whatever).
> 
>    The only time that user input should be needed, if then, would be
>    when drivers needed for mounting / have changed.  And even then, why
>    not just update the archive and reboot?  / did mount, after all...
> 
>    This assumes, of course, that the / device isn't changed without
>    updating the archive.  Mostly, I think, a safe assumption.
> 
>    In any case, whenever user input is needed it'd be nice if the
>    service could interact directly rather than offer some advice and
>    then dump the user on sulogin (ouch!).  If sulogin took a command as
>    an argument then at least the method could dump the user on a menu
>    automatically as soon as the root password is typed in.
> 
> 
>  - minimize the boot archive -- it should have only enough to mount the
>    root, so if I add a sound driver or what have you, the boot archive
>    should need no updates.
> 
>    I really like this one, because how often do boot devices and drivers
>    change?  Not often.  But new devices are added rather often.
> 
>    This would require that the kernel be able to process /etc/system,
>    /etc/path_to_inst and /etc/driver_* twice: once from the boot
>    archive, once when the root is mounted.
> 
> 
> Nico

-- 
Laurent FAIPOT
International Center for Network Computing
Sun Microsystems
180, Avenue de l'Europe - Zirst de Montbonnot
38334 SAINT-ISMIER CEDEX
Phone : +33 (0)4 76 18 80 81 or x38081
Fax   : +33 (0)4 76 18 88 88
Email : Laurent.Faipot at Sun.COM