Re: Boot process robustness

2001-01-07 Thread Daniel C. Sobral

John Baldwin wrote:
 
 /boot/loader.conf perhaps, but how does the loader know that the previous boot
 failed so that it knows to fall back?  This is much harder, as a failed kernel
 boot usually results in a hang or an instant CPU reset.

Loader sets a flag before booting, and the boot process resets it at the
end. Of course, loader doesn't have write capability.
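
A minimal sketch of that handshake, written in Python rather than loader code. It assumes some one-bit store that the loader can both read and set, which (as noted) the loader cannot currently do on disk. The names BootFlagStore, loader_pick_kernel and boot_finished are hypothetical.

```python
class BootFlagStore:
    """Stand-in for whatever persistent bit the loader could write."""

    def __init__(self):
        self.boot_in_progress = False


def loader_pick_kernel(store, new_kernel="/kernel", old_kernel="/kernel.old"):
    """Return the kernel to load and mark a boot attempt as started.

    If the flag is still set, the previous attempt never reached the end
    of the boot process, so fall back to the old kernel.
    """
    kernel = old_kernel if store.boot_in_progress else new_kernel
    store.boot_in_progress = True  # set before handing control to the kernel
    return kernel


def boot_finished(store):
    """Called at the end of a successful boot (for example from the rc scripts)."""
    store.boot_in_progress = False


store = BootFlagStore()
first = loader_pick_kernel(store)    # flag clear: try the new kernel
second = loader_pick_kernel(store)   # hang or reset: flag still set, fall back
boot_finished(store)                 # the old kernel came up and cleared the flag
third = loader_pick_kernel(store)    # flag clear again: the new kernel is retried
```

With a single bit, the boot after a successful fallback tries the new kernel again; remembering which kernel failed would need more state than one flag.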

-- 
Daniel C. Sobral(8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"There is no spoon." -- Kiki


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Boot process robustness

2001-01-02 Thread James Halstead


- Original Message -
From: "James Halstead" [EMAIL PROTECTED]
To: "Poul-Henning Kamp" [EMAIL PROTECTED]
Sent: Tuesday, January 02, 2001 11:30 PM
Subject: Re: Boot process robustness



 - Original Message -
 From: "Poul-Henning Kamp" [EMAIL PROTECTED]
 To: "Walter W. Hop" [EMAIL PROTECTED]
 Cc: "FreeBSD hackers" [EMAIL PROTECTED]
 Sent: Thursday, December 28, 2000 9:31 AM
 Subject: Re: Boot process robustness


  In message [EMAIL PROTECTED], "Walter W. Hop" writes:
  Hi all,
  
  I was wondering how to increase the robustness of the booting process,
  so that a box would be able to keep itself on its feet without
  intervention of the console. I think this would be of great value to
  the many people who administer colocated boxes.
  
  I'm not much of a coder so all I can do is mail this (at the risk of
  wasting your time with total useless crap of course, in which case I
  apologize.)
  
  1. Old kernel recovery
 When 'make install'ing a new kernel, a flag is raised (say,
 'revert_on_fail') which is only cleared after a successful system
 initialisation. When the new kernel boots, a panic in this state or
 an unexpected reboot (reset after a system hang) would cause
 /kernel.old to be loaded on the next boot instead (maybe the same
 could work for /etc/rc.conf.old)
 
  This is actually more a question of where to store the flag than
  anything else.
 

 Couldn't you just modify the shutdown command to have an option for revert
 on fail, which would create a file on the root filesystem with a timestamp
 of when the reboot started? Then at boot time, if that timestamp is still
 there and has been around for too long, boot kernel.old instead of kernel.
 Then the question is what amount of time is reasonable for the wait period.
 This may have the machine boot the new kernel and panic a few times, but at
 least you can be assured that after x minutes it would boot the old kernel
 instead. Once a boot was successful the timestamp file could be removed.

 Just a thought.

 ~James
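
The timestamp idea above could be sketched roughly as follows. This is only an illustration in Python: it assumes the boot code can read a clock and the timestamp file, and the function name pick_kernel, the numbers, and the 600-second wait period are made up for the example.

```python
def pick_kernel(stamp_time, now, max_wait_seconds):
    """Choose a kernel from the age of the revert-on-fail timestamp.

    stamp_time is None when no timestamp file exists (an ordinary boot).
    Times are plain seconds; only their difference matters.
    """
    if stamp_time is None:
        return "/kernel"
    if now - stamp_time > max_wait_seconds:
        # The stamp has been around too long: the new kernel never got far
        # enough to remove it, so give up on it.
        return "/kernel.old"
    return "/kernel"


STAMP = 1000  # hypothetical time at which "shutdown" wrote the timestamp
results = [pick_kernel(STAMP, now, 600) for now in (1100, 1500, 1700)]
no_stamp = pick_kernel(None, 1700, 600)
```

The first two boots here still try the new kernel, which matches the remark that the machine may panic a few times before the wait period runs out.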

  Julian made a rather hackish thing for Whistle, but I think we lost
  that with the advent of the new bootblocks.
 
  2. Automatic file system checks
 In case of a powercycle or crash, it could be that a filesystem needs
 fixing. Now I don't know much about fs internals, but I guess that in
 most cases just answering 'Y' to fsck's questions will fix things. I
 would appreciate an option where an inconsistency would start up fsck
 in an "automatic" repair mode, with all actions logged and "undo"
 data being saved (in case manual review is needed).
 
  Alternatively it might be worth considering adding a "remote-single-user"
  capability:
 
  If an fsck fails, ifconfig the interfaces and start an sshd so people
  can get in remotely and fsck...
 
  --
  Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
  [EMAIL PROTECTED] | TCP/IP since RFC 956
  FreeBSD committer   | BSD since 4.3-tahoe
  Never attribute to malice what can adequately be explained by incompetence.
 
 









Re: Boot process robustness

2001-01-01 Thread Leif Neland

   If an fsck fails, ifconfig the interfaces and start an sshd so
   people can get in remotely and fsck...
 
  What if an fsck on /usr fails?  Other than that, I love the idea!

 Force-mount it read-only if necessary, or simply copy a static sshd
 into /sbin.  Running fsck -y is the wrong solution, since if fsck
 can't fix an error automatically, something pretty bad has happened
 (physical media error, someone dd'ing onto the raw disk, etc.).

Even so, unless the machine contains invaluable data, I guess 99% of people
still run fsck -y if fsck fails.
I'd rather have my remote boxes do that by themselves, and perhaps email me,
than either have to drive there or give somebody the root password and
remote-control that person into just running fsck -y.

In almost all cases, when a machine can't fsck itself after a power failure,
fsck -y fixes it.
But then, most of the disk is either squid's cache, or unused stuff like
termcaps, kernel source, man pages etc. Most stuff is there just because it
could be handy one day, and it is not worth the trouble pruning it.
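
As a rough sketch of "do it by themselves, and perhaps email me", in Python: try the normal preen pass first, escalate to fsck -y only when that fails, and report each step. The function name repair_filesystem, the run and notify callbacks, and the device name are assumptions for illustration; run stands in for actually executing the command and notify for sending mail.

```python
def repair_filesystem(device, run, notify):
    """Try a preen pass first, then fall back to answering yes to everything.

    run(argv) returns the command's exit status (0 means success);
    notify(message) reports to the administrator.
    Returns which step succeeded: "preen", "forced", or "failed".
    """
    if run(["fsck", "-p", device]) == 0:
        return "preen"
    notify(f"fsck -p failed on {device}; retrying with fsck -y")
    if run(["fsck", "-y", device]) == 0:
        notify(f"fsck -y repaired {device}")
        return "forced"
    notify(f"fsck -y could not repair {device}; manual attention needed")
    return "failed"


def fake_run(argv):
    """Pretend the preen pass fails and the -y pass succeeds."""
    return 0 if "-y" in argv else 1


log = []
result = repair_filesystem("/dev/da0s1e", fake_run, log.append)  # hypothetical device
```

On a real machine run could be something like subprocess.call, and notify would have to wait until the network and a mailer are available.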

Leif








Re: Boot process robustness

2000-12-30 Thread Julian Elischer

John Baldwin wrote:
 
 On 28-Dec-00 Poul-Henning Kamp wrote:
  In message [EMAIL PROTECTED], "Walter W. Hop" writes:
 Hi all,
 
 I was wondering how to increase the robustness of the booting process,
 so that a box would be able to keep itself on its feet without
 intervention of the console. I think this would be of great value to the
 many people who administer colocated boxes.

The old 'nextboot(8)' system used to do this if you had the 'writeback'
option enabled.

It wrote a sequence of boot strings into block '1' (not 0)
and zeroed them out as it used them up.
/etc/rc would then be used to write an appropriate set of strings back
in the case of a successful boot.

This is used successfully in the thousands of InterJets out there in
the field. Unfortunately the new bootblock writers never considered
this an important enough feature to emulate,
but it gives you a place to look.

We wrote a list of ever more conservative boot strings, eventually
even moving to an alternate root partition.
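
To illustrate the mechanism described above (not the actual nextboot code), here is a small Python model: a list stands in for the on-disk block, the boot side consumes and zeroes one entry per attempt, and the rc side writes the whole list back after a good boot. The function names and the placeholder boot strings are hypothetical.

```python
def take_next_boot_string(block):
    """Boot side: use the first non-empty entry and zero it out."""
    for i, entry in enumerate(block):
        if entry:
            block[i] = ""  # consumed, so a failed boot moves on next time
            return entry
    return None  # every entry has been used up


def restore_boot_strings(block, strings):
    """rc side: after a successful boot, write the full list back."""
    block[:] = list(strings)


# Placeholder strings, ordered from least to most conservative.
STRINGS = ["new-kernel", "old-kernel", "alternate-root"]
block = list(STRINGS)

attempt1 = take_next_boot_string(block)  # new kernel hangs
attempt2 = take_next_boot_string(block)  # old kernel boots
after_two = list(block)
restore_boot_strings(block, STRINGS)     # /etc/rc puts the list back
```

Because each entry is erased before the kernel is started, a hang or reset needs no cooperation from the failed kernel to make the next attempt more conservative.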

 
 I'm not much of a coder so all I can do is mail this (at the risk of
 wasting your time with total useless crap of course, in which case I
 apologize.)
 
 1. Old kernel recovery
When 'make install'ing a new kernel, a flag is raised (say,
'revert_on_fail') which is only cleared after a successful system
initialisation. When the new kernel boots, a panic in this state or
an unexpected reboot (reset after a system hang) would cause
/kernel.old to be loaded on the next boot instead (maybe the same
could work for /etc/rc.conf.old)
 
  This is actually more a question of where to store the flag than
  anything else.
 
 /boot/loader.conf perhaps, but how does the loader know that the previous boot
 failed so that it knows to fall back?  This is much harder, as a failed kernel
 boot usually results in a hang or an instant CPU reset.
 
 --
 
 John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
 PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
 "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
 

-- 
  __--_|\  Julian Elischer
 /   \ [EMAIL PROTECTED]
(   OZ) World tour 2000
--- X_.---._/  from Perth, presently in:  Budapest
v





Re: Boot process robustness

2000-12-30 Thread Mike Smith

 This is used successfully in the thousands of InterJets out there in
 the field. Unfortunately the new bootblock writers never considered
 this an important enough feature to emulate,
 but it gives you a place to look.

One of the "new bootblock authors" actually commented on this thread, 
including some of the reasons why this approach wasn't taken...

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E







Re: Boot process robustness

2000-12-30 Thread Mike Smith

 the current ideas all fail badly in the face of 'a' partition filesystem
 corruption.

Very true.  I also looked at the CMOS scratchpad registers...

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E







Re: Boot process robustness

2000-12-29 Thread John Baldwin


On 28-Dec-00 Poul-Henning Kamp wrote:
 In message [EMAIL PROTECTED], "Walter W. Hop" writes:
Hi all,

I was wondering how to increase the robustness of the booting process,
so that a box would be able to keep itself on its feet without
intervention of the console. I think this would be of great value to the
many people who administer colocated boxes.

I'm not much of a coder so all I can do is mail this (at the risk of
wasting your time with total useless crap of course, in which case I
apologize.)

1. Old kernel recovery
   When 'make install'ing a new kernel, a flag is raised (say,
   'revert_on_fail') which is only cleared after a successful system
   initialisation. When the new kernel boots, a panic in this state or
   an unexpected reboot (reset after a system hang) would cause
   /kernel.old to be loaded on the next boot instead (maybe the same
   could work for /etc/rc.conf.old)
 
 This is actually more a question of where to store the flag than
 anything else.

/boot/loader.conf perhaps, but how does the loader know that the previous boot
failed so that it knows to fall back?  This is much harder, as a failed kernel
boot usually results in a hang or an instant CPU reset.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/





Re: Boot process robustness

2000-12-29 Thread Mike Smith

  This is actually more a question of where to store the flag than
  anything else.
 
 /boot/loader.conf perhaps, but how does the loader know that the previous boot
 failed so that it knows to fall back?  This is much harder, as a failed kernel
 boot usually results in a hang or an instant CPU reset.

I had always planned to write a fixed-size file to disk (probably 512 
bytes) and then implement "overwrite only" write support in the various 
filesystems to allow us to use it as a "persistent" environment store, 
e.g. have a 'save foo' keyword which would update the persistent store 
with the 'foo' variable.  This would avoid the bloat that block 
allocation, directory creation etc. would entail with "real" write 
support, whilst allowing us most of the desirable features.

All of the primary boot filesystems (ffs, nfs, tftp, fat, ext2) could 
handle this with trivial modifications.
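
A sketch of what such an "overwrite only" store might look like, modelled in Python on an ordinary file. The 512-byte size comes from the message above; the name=value encoding, the function names, and the file name loader.env are assumptions made for the example.

```python
import os
import tempfile

STORE_SIZE = 512  # fixed size suggested in the message


def encode_env(env):
    """Pack name=value pairs into exactly STORE_SIZE bytes, NUL padded."""
    data = b"".join(f"{k}={v}\n".encode("ascii") for k, v in sorted(env.items()))
    if len(data) > STORE_SIZE:
        raise ValueError("environment does not fit in the store")
    return data.ljust(STORE_SIZE, b"\0")


def decode_env(raw):
    """Inverse of encode_env; ignores the NUL padding."""
    text = raw.rstrip(b"\0").decode("ascii")
    return dict(line.split("=", 1) for line in text.splitlines() if line)


def save_variable(path, name, value):
    """Overwrite the existing file in place; never grow or recreate it."""
    with open(path, "r+b") as f:  # "r+b" fails if the file does not exist
        env = decode_env(f.read(STORE_SIZE))
        env[name] = value
        f.seek(0)
        f.write(encode_env(env))


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "loader.env")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"\0" * STORE_SIZE)        # preallocated by the running system
    save_variable(path, "boot_attempted", "1")
    save_variable(path, "kernel", "/kernel.old")
    size_after = os.path.getsize(path)
    with open(path, "rb") as f:
        stored = decode_env(f.read())
```

The file is created once at full size and afterwards only rewritten in place, so the writer never has to allocate blocks or touch a directory, which is the bloat the message wants to avoid.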

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E







Boot process robustness

2000-12-28 Thread Walter W. Hop

Hi all,

I was wondering how to increase the robustness of the booting process,
so that a box would be able to keep itself on its feet without
intervention of the console. I think this would be of great value to the
many people who administer colocated boxes.

I'm not much of a coder so all I can do is mail this (at the risk of
wasting your time with total useless crap of course, in which case I
apologize.)

1. Old kernel recovery
   When 'make install'ing a new kernel, a flag is raised (say,
   'revert_on_fail') which is only cleared after a successful system
   initialisation. When the new kernel boots, a panic in this state or
   an unexpected reboot (reset after a system hang) would cause
   /kernel.old to be loaded on the next boot instead (maybe the same
   could work for /etc/rc.conf.old)

2. Automatic file system checks
   In case of a powercycle or crash, it could be that a filesystem needs
   fixing. Now I don't know much about fs internals, but I guess that in
   most cases just answering 'Y' to fsck's questions will fix things. I
   would appreciate an option where an inconsistency would start up fsck
   in an "automatic" repair mode, with all actions logged and "undo"
   data being saved (in case manual review is needed).

There!
(Merry etc etc, by the way!)

walter

-- 
 Walter W. Hop [EMAIL PROTECTED] | +31 6 24290808 | PGP key: 0xD4DD8DEB







Re: Boot process robustness

2000-12-28 Thread Poul-Henning Kamp

In message [EMAIL PROTECTED], "Walter W. Hop" writes:
Hi all,

I was wondering how to increase the robustness of the booting process,
so that a box would be able to keep itself on its feet without
intervention of the console. I think this would be of great value to the
many people who administer colocated boxes.

I'm not much of a coder so all I can do is mail this (at the risk of
wasting your time with total useless crap of course, in which case I
apologize.)

1. Old kernel recovery
   When 'make install'ing a new kernel, a flag is raised (say,
   'revert_on_fail') which is only cleared after a successful system
   initialisation. When the new kernel boots, a panic in this state or
   an unexpected reboot (reset after a system hang) would cause
   /kernel.old to be loaded on the next boot instead (maybe the same
   could work for /etc/rc.conf.old)

This is actually more a question of where to store the flag than
anything else.

Julian made a rather hackish thing for Whistle, but I think we lost
that with the advent of the new bootblocks.

2. Automatic file system checks
   In case of a powercycle or crash, it could be that a filesystem needs
   fixing. Now I don't know much about fs internals, but I guess that in
   most cases just answering 'Y' to fsck's questions will fix things. I
   would appreciate an option where an inconsistency would start up fsck
   in an "automatic" repair mode, with all actions logged and "undo"
   data being saved (in case manual review is needed).

Alternatively it might be worth considering adding a "remote-single-user"
capability:

If an fsck fails, ifconfig the interfaces and start an sshd so people
can get in remotely and fsck...

--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.





Re: Boot process robustness

2000-12-28 Thread Peter Pentchev

On Thu, Dec 28, 2000 at 03:31:55PM +0100, Poul-Henning Kamp wrote:
 In message [EMAIL PROTECTED], "Walter W. Hop" writes:
[snip]
 
 2. Automatic file system checks
In case of a powercycle or crash, it could be that a filesystem needs
fixing. Now I don't know much about fs internals, but I guess that in
most cases just answering 'Y' to fsck's questions will fix things. I
would appreciate an option where an inconsistency would start up fsck
in an "automatic" repair mode, with all actions logged and "undo"
data being saved (in case manual review is needed).
 
 Alternatively it might be worth considering adding a "remote-single-user"
 capability:
 
 If an fsck fails, ifconfig the interfaces and start an sshd so people
 can get in remotely and fsck...

What if an fsck on /usr fails?  Other than that, I love the idea!

G'luck,
Peter

-- 
I am not the subject of this sentence.





Re: Boot process robustness

2000-12-28 Thread Dan Nelson

In the last episode (Dec 28), Peter Pentchev said:
 On Thu, Dec 28, 2000 at 03:31:55PM +0100, Poul-Henning Kamp wrote:
  In message [EMAIL PROTECTED], "Walter W. Hop" writes:
   2. Automatic file system checks
   In case of a powercycle or crash, it could be that a filesystem
   needs fixing. Now I don't know much about fs internals, but I
   guess that in most cases just answering 'Y' to fsck's questions
   will fix things. I would appreciate an option where an
   inconsistency would start up fsck in an "automatic" repair mode,
   with all actions logged and "undo" data being saved (in case
   manual review is needed).
  
  Alternatively it might be worth considering adding a
  "remote-single-user" capability:
  
  If an fsck fails, ifconfig the interfaces and start an sshd so
  people can get in remotely and fsck...
 
 What if an fsck on /usr fails?  Other than that, I love the idea!

Force-mount it read-only if necessary, or simply copy a static sshd
into /sbin.  Running fsck -y is the wrong solution, since if fsck
can't fix an error automatically, something pretty bad has happened
(physical media error, someone dd'ing onto the raw disk, etc.).
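
A rough sketch of the decision this implies for a "remote-single-user" fallback, in Python. It only plans the steps as text and runs nothing. The function name is hypothetical, and the paths assume the usual sshd lives under /usr/sbin with a static copy placed in /sbin as suggested above.

```python
def plan_remote_recovery(failed_mounts,
                         static_sshd="/sbin/sshd",
                         usr_sshd="/usr/sbin/sshd"):
    """Return the ordered steps for a 'remote single-user' fallback.

    failed_mounts is the set of mount points whose fsck did not succeed.
    """
    steps = ["configure network interfaces"]
    if "/usr" in failed_mounts:
        # /usr cannot be trusted, so rely on a static sshd on the root fs
        steps.append(f"start {static_sshd}")
    else:
        steps.append(f"start {usr_sshd}")
    steps.append("wait for administrator to run fsck by hand")
    return steps


usr_broken = plan_remote_recovery({"/usr"})
var_broken = plan_remote_recovery({"/var"})
```

Force-mounting /usr read-only, the other option mentioned, would let the second branch be used even when /usr failed its check, at the cost of running binaries from a damaged filesystem.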

-- 
Dan Nelson
[EMAIL PROTECTED]

