On Sun, Apr 04, 2010 at 11:46:16PM -0700, Willard Korfhage wrote:
> Looks like it was RAM. I ran memtest+ 4.00, and it found no problems.

Then why do you suspect the RAM?

Especially with 12 disks, another likely candidate is an overloaded
power supply.  While there may be problems showing up in RAM, they
may only be happening under the combined load of disk, CPU and memory
activity that brings the system into marginal power conditions.
Sometimes just one rail is out of bounds, and other devices are
unaffected.

If memtest didn't find any problems without the disk and CPU load,
that tends to support this hypothesis.
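
As a rough illustration of the reasoning above, here is a minimal
sketch of a per-rail power budget.  Every figure in it (per-device
draw, rail limits, the two load scenarios) is a made-up assumption
chosen for the example, not a measurement of this system; check the
PSU label and the drive datasheets for real numbers.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    ma_12v: int  # assumed draw on the 12 V rail, in milliamps
    ma_5v: int   # assumed draw on the 5 V rail, in milliamps


def rail_totals(loads):
    """Sum the assumed current draw on each rail, in milliamps."""
    return (sum(l.ma_12v for l in loads), sum(l.ma_5v for l in loads))


def over_budget(loads, limit_12v_ma, limit_5v_ma):
    """Report, per rail, whether the summed draw exceeds the rail limit."""
    total_12v, total_5v = rail_totals(loads)
    return {"12V": total_12v > limit_12v_ma, "5V": total_5v > limit_5v_ma}


# Hypothetical rail limits for an undersized PSU (assumptions).
LIMIT_12V_MA = 18000
LIMIT_5V_MA = 20000

# Scenario 1: memtest only. Disks spinning but idle, CPU moderately busy.
memtest_only = [Load(f"disk{i}", 400, 500) for i in range(12)]
memtest_only.append(Load("cpu+board", 6000, 3000))

# Scenario 2: backup running. All 12 disks seeking, CPU heavily loaded.
backup_running = [Load(f"disk{i}", 800, 700) for i in range(12)]
backup_running.append(Load("cpu+board", 9000, 3000))

print("memtest only:  ", over_budget(memtest_only, LIMIT_12V_MA, LIMIT_5V_MA))
print("backup running:", over_budget(backup_running, LIMIT_12V_MA, LIMIT_5V_MA))
```

With these assumed numbers, the memtest-only case stays inside both
limits, while the backup case pushes only the 12 V rail over its
limit and leaves the 5 V rail within bounds.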

So, the memory may not be "bad" per se, though it's still not ECC and
therefore not "good" either :-)   Perhaps you can still find a good
use for it elsewhere.

> I removed 2 of the 3 sticks of RAM, ran a backup, and had no
> errors. I'm running more extensive tests, but it looks like that was
> it. A new motherboard, CPU and ECC RAM are on the way to me now. 

Switching to ECC is a good thing, but be prepared for possible
continued issues (with different detection thanks to ECC) if the root
cause is the PSU.  In fact, ECC memory may draw marginally more power
and maybe make the problem worse (the new CPU and motherboard could go
either way, depending on your choices).

--
Dan.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
