-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Adam Nielsen wrote:
|>Then it locks.  Once it printed the message about INIT starting after
|>this, so I guess it's something around this point that's causing the
|>problem.  However since it seems to be a *really* bad lock, I don't
|>know how to proceed :-(
|
|
| Ok, I've done some more fiddling and I think I've narrowed the problem
| down to the SATA driver.  Since the kernel used to boot, then it
| started locking up during booting, I was trying to work out what could
| cause this.  I tried wiggling the SATA cable again, and that fixed
| it - I could boot Linux again.
|
| Wiggling the SATA cable while Linux was loaded didn't seem to cause a
| problem, so I started dd reading off a ton of data and started fiddling
| with the cable.  I couldn't seem to fault anything, so I tried
| unplugging the cable altogether (as SATA is hot swappable.) Obviously
| everything accessing the drive paused, and when I plugged the cable back
| into the drive it seemed as though everything resumed again (because the
| drive started reading off a ton of data due to dd running.)  However,
| after a few seconds the drive stopped reading, and dd wouldn't
| terminate.  I tried switching to another VT and logging in, but after I
| typed in my login name it froze too.  It seems that if the drive gets
| disconnected and reconnected, everything accessing it stops (I'm
| guessing due to a driver problem.)
|
| I'm still not sure why it would lock up during boot, but perhaps when
| the SATA hardware is initialised the drive disconnects and reconnects,
| and if it hasn't reconnected by the time something (e.g. ReiserFS) wants
| to access the disk, that causes the lockup - only since the lock happens
| in the kernel, it completely locks everything up.  Of course, this is
| just a wild guess, but given the behaviour it does seem like a
| possibility.  Especially if hotswapping is a feature not yet implemented
| in the driver (which is quite likely, as it is still only in the testing
| stage.)
|
| Anyway, thanks for all your help, and I might try to find the SI3512
| driver people and see what they think.  Wish me luck ;-)

It may be a driver problem, but it could also be a shortcoming in the
way ReiserFS currently handles write errors in the journal.

Even when all you're doing is reading from the disk, you're still
writing a little bit (unless you've mounted with -o noatime,nodiratime).
ReiserFS can't deal at all with the device going away while it's
performing journal operations. Updating a file's atime means altering
its metadata, so that update goes through the journal. If I had to
guess, I'd say you'd see a ReiserFS panic in your logs, with something
along the lines of "journal-###: buffer write failed". A panic will make
all further access to the filesystem hang.

I'm in the final stages of a patch that will allow ReiserFS to handle
journal I/O errors more gracefully. Rather than panicking the system
when a journal write fails, the filesystem will be forced read-only and
all active transactions will be aborted and released. The filesystem
can then be unmounted, and on remount it will look much as it would
after a power failure. However, since it aborted because of an I/O
error, I'd recommend running reiserfsck on the affected partition.
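The error-handling policy described above can be modeled as a small
state machine. On a journal write error, the filesystem aborts instead
of panicking, goes read-only, and releases its transactions. The C
sketch below is a hypothetical toy model for illustration only. It is
not the actual patch. struct toy_fs and the toy_* functions are
invented names, and the real kernel code paths are not shown.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of a journaled filesystem's error state. */
struct toy_fs {
    bool read_only;   /* forced read-only after a journal abort */
    bool aborted;     /* journal has been aborted */
    int  open_txns;   /* number of active transactions */
};

/* Called when a journal write fails: instead of panicking, abort and
 * release all active transactions and force the filesystem read-only. */
void toy_journal_abort(struct toy_fs *fs)
{
    fs->aborted = true;
    fs->read_only = true;
    fs->open_txns = 0;
}

/* Attempt a metadata update (e.g. an atime change) through the journal.
 * `journal_io_ok` simulates whether the device write succeeds. */
int toy_journaled_update(struct toy_fs *fs, bool journal_io_ok)
{
    if (fs->read_only)
        return -EROFS;            /* no new transactions after an abort */

    fs->open_txns++;              /* begin transaction */
    if (!journal_io_ok) {
        toy_journal_abort(fs);    /* I/O error: abort rather than panic */
        return -EIO;
    }
    fs->open_txns--;              /* commit and release the transaction */
    return 0;
}

/* In this model, unmounting is possible once no transactions are held. */
bool toy_can_unmount(const struct toy_fs *fs)
{
    return fs->open_txns == 0;
}
```

The point of the model is that a failed journal write leaves the
filesystem in a well-defined state. Later writes get an error back
instead of hanging, and nothing is left holding a transaction that
would block the unmount.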

- -Jeff

- --
Jeff Mahoney
SuSE Labs
[EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAZFBgLPWxlyuTD7IRAgOYAJ4m9a2QRsUH7BuB7igHOWZf3P3j4ACfWml/
0aYNPCreOG3UQbI4/YNJTSw=
=Bqo5
-----END PGP SIGNATURE-----
