On 28-Mar-06, at 10:34 AM, Joachim Feise wrote:
Toby Thain wrote on 03/27/06 22:34:
On 27-Mar-06, at 11:39 PM, [EMAIL PROTECTED] wrote:
On Mon, 27 Mar 2006 14:32:14 PST, Joe Feise said:
Thanks for the suggestion. I haven't run a memtest, but I don't really
think that the memory is bad. The machine most likely would have had
other issues if that was the case.
You'd be *amazed*. Intermittently weak memory (especially if it's just
one bad bit) can manifest in the most odd ways.
I tend to agree. I spent weeks/months chasing down what I thought was
a chipset bug, when it was bad RAM. Disk reads (and probably writes)
were being corrupted and the kernel did not know about it. Was very
frustrating ... until I figured out the real problem.
Joe, did you soak the test at least overnight? Have you done any
heavy compiles (like building X11, or gcc) lately? Compilers are
often the canaries in the mine, when it comes to RAM. I'm not saying
this is your problem but it would be good to rule out first.
This is a production machine that I can't take offline for too long.
But yes, I have compiled the kernel on another reiser4 partition
overnight, without problems.
If this was a memory problem, it would indeed manifest itself in other
areas with more or less random errors. The fact that it does not
indicates to me that this is a fs problem. So, at this point I am
ruling out a memory issue.
I agree with all of the above, except the "random errors" part.
Depending on the type of fault, it may manifest only under very
specific but repeatable conditions, and never be seen in ordinary
workload. Other faults, of course, will manifest in many contexts and
apparently randomly. But I guess this is OT by now. :-)
--Toby
-Joe
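
The point above, that a RAM fault may show up only under very specific
but repeatable conditions, can be illustrated with a toy simulation.
This is only a sketch under stated assumptions: the FaultyMemory class,
the count_corrupted helper, and the chosen address and bit are all
hypothetical, and real faults (weak cells, address-line or timing
problems) are far more varied than a single stuck-at-0 bit.

```python
class FaultyMemory:
    """Toy RAM model with one stuck-at-0 bit (hypothetical, illustrative)."""

    def __init__(self, size, bad_addr, bad_bit):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.bad_mask = 1 << bad_bit

    def write(self, addr, value):
        if addr == self.bad_addr:
            # The faulty bit can never hold a 1; all other bits are fine.
            value &= ~self.bad_mask & 0xFF
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def count_corrupted(mem, base, data):
    """Write data starting at base, read it back, count mismatched bytes."""
    for i, b in enumerate(data):
        mem.write(base + i, b)
    return sum(mem.read(base + i) != b for i, b in enumerate(data))


mem = FaultyMemory(size=1024, bad_addr=700, bad_bit=5)

# "Ordinary workload": the data covers address 700, but none of these
# byte values has bit 5 set, so the fault is never exposed.
ordinary = bytes([0x01, 0x02, 0x0F] * 100)
print(count_corrupted(mem, 500, ordinary))   # 0

# Specific but repeatable condition: a byte with bit 5 set lands on the
# bad address, and the same single byte is corrupted on every run.
trigger = bytes([0xFF] * 300)
print(count_corrupted(mem, 500, trigger))    # 1
```

In this model an overnight compile could pass cleanly if it never
happened to store a 1 in that bit, which is why a clean run of one
workload does not rule the memory out.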