On Mon, 2006-07-17 at 21:55 +0400, Vladimir V. Saveliev wrote:
> Hello
>
> On Mon, 2006-07-17 at 10:53 +0200, Francisco Javier Cabello wrote:
> > Hello Vladimir,
> > > such corruptions used to be considered as hardware bugs. Memory failure,
> > > for instance. Did you ever run memtest on your systems?
> >
> > Yes, We have run memtest in our system. It's very seldom to find a system
> > with a hardware memory problem running. When we find a memory problem the
> > kernel doesn't boot. I am going to pass memtest in some of the system with
> > reiserfs corruption problem.
> >
This is not true. Certain memory faults still allow the system to boot and appear to run OK. I had a system that didn't show a memory error until the 4th pass of memtest; I only caught it because I happened to let it run over the weekend. I have seen other issues on my larger systems that have 64GB of RAM, where memtest detected nothing after a week but the kernel mcelog reported weird ECC memory issues. I replaced several DIMMs and the issue went away. But who knows what could have occurred had I not replaced the memory.

Brad Dameron
SeaTab Software
www.seatab.com
