On 17-Jul-06, at 2:14 PM, Brad Dameron wrote:
On Mon, 2006-07-17 at 21:55 +0400, Vladimir V. Saveliev wrote:
Hello
On Mon, 2006-07-17 at 10:53 +0200, Francisco Javier Cabello wrote:
Hello Vladimir,
Such corruption is usually considered a hardware problem: memory failure,
for instance. Did you ever run memtest on your systems?
Yes, we have run memtest on our systems. It is very rare to find a running
system with a hardware memory problem. When we do have a memory problem, the
kernel doesn't boot. I am going to run memtest on some of the systems with the
reiserfs corruption problem.
This is not true. There are certain memory issues that can still allow
the system to boot and appear to run ok. I had a system that didn't show
a memory error until the 4th pass of memtest. I just happened to let it
run over the weekend. I have seen other issues with my larger systems
that have 64GB of RAM, where memtest didn't detect anything after a week
but the kernel mcelog reported weird ECC memory issues. I replaced
several DIMMs and the issue went away. But who knows what could have
occurred had I not replaced the memory.
I agree with Brad. Memory problems can certainly manifest in obvious
or obscure ways that don't prevent boot. I spent months chasing down
what I thought was an IDE controller chipset problem (corrupt disk I/O
invisible to the kernel, hence corrupt filesystems, etc.) that was
simply bad RAM.
--T
Brad Dameron
SeaTab Software
www.seatab.com