On 28-Mar-06, at 10:34 AM, Joachim Feise wrote:

Toby Thain wrote on 03/27/06 22:34:

On 27-Mar-06, at 11:39 PM, [EMAIL PROTECTED] wrote:

On Mon, 27 Mar 2006 14:32:14 PST, Joe Feise said:

Thanks for the suggestion. I haven't run a memtest, but I don't really think
that the memory is bad. The machine most likely would have had other issues
if that was the case.

You'd be *amazed*.  Intermittently weak memory (especially if it's just one
bad bit) can manifest in the most odd ways.


I tend to agree. I spent weeks/months chasing down what I thought was
a chipset bug, when it was bad RAM. Disk reads (and probably writes)
were being corrupted and the kernel did not know about it. Was very
frustrating ... until I figured out the real problem.

Joe, did you soak the test at least overnight? Have you done any
heavy compiles (like building X11, or gcc) lately? Compilers are
often the canaries in the mine, when it comes to RAM. I'm not saying
this is your problem but it would be good to rule out first.


This is a production machine that I can't take offline for too long.
But yes, I have compiled the kernel on another reiser4 partition overnight,
without problems.
If this was a memory problem, it would indeed manifest itself in other areas with more or less random errors. The fact that it does not indicates to me that this is a fs problem. So, at this point I am ruling out a memory issue.


I agree with all of the above, except the "random errors" part. Depending on the type of fault, it may manifest only under very specific but repeatable conditions, and never be seen in ordinary workload. Other faults, of course, will manifest in many contexts and apparently randomly. But I guess this is OT by now. :-)

--Toby


-Joe
