On 17-Jul-06, at 2:14 PM, Brad Dameron wrote:

On Mon, 2006-07-17 at 21:55 +0400, Vladimir V. Saveliev wrote:
Hello

On Mon, 2006-07-17 at 10:53 +0200, Francisco Javier Cabello wrote:
Hello Vladimir,
such corruptions used to be considered hardware problems: memory failure,
for instance. Did you ever run memtest on your systems?

Yes, we have run memtest on our systems. It's very rare to find a running system with a hardware memory problem; when we do have a memory problem, the kernel doesn't boot. I am going to run memtest on some of the systems with the reiserfs
corruption problem.


This is not true. There are certain memory issues that can still allow
the system to boot and appear to run OK. I had a system that didn't show a memory error until the 4th pass of memtest; I just happened to let it
run over the weekend. I have also seen other issues with my larger systems
that have 64GB of RAM, where memtest ran for a week without detecting
anything, but the kernel's mcelog reported strange ECC memory errors. I
replaced several DIMMs and the issue went away. But who knows what
could have occurred had I not replaced the memory.

I agree with Brad. Memory problems can certainly manifest in obvious or obscure ways that don't prevent the system from booting. I spent months chasing down what I thought was an IDE controller chipset problem (corrupt disk I/O invisible to the kernel, hence corrupt filesystems, etc.) that turned out to be simply bad RAM.

--T


Brad Dameron
SeaTab Software
www.seatab.com

