On 04/24/2013 11:03 AM, Joseph Areeda wrote:
Thanks for the tips Konstantin,

I assume that your recommendation for 24 hours of memtest is cumulative,
so I can probably get the same results by starting it each night when I
quit for the day.

When I mentioned SMART I was talking about the self-tests, not the status
summary that comes up. I've also copied large files around and checked their
md5sums.
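Checking copies with md5sum amounts to hashing the source and the copy and comparing the digests. A minimal Python sketch of the same check (the function names are illustrative, not from any tool). One caveat: on Linux a freshly written copy is usually still in the page cache, so re-hashing it immediately may never read it back from the disk; dropping caches or rebooting before the second hash makes the check actually exercise the drive.

```python
import hashlib
import shutil

def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src, dst):
    """Copy src to dst, then hash both files and report whether they match."""
    shutil.copyfile(src, dst)
    return md5sum(src) == md5sum(dst)
```

Reading in chunks keeps memory use flat even for multi-gigabyte test files.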

I played with the LiveCD for 4 or 5 hours today, much of it spent trying to
install it on a different spinning hard drive.

Once I did see the SSD show up in the disk utility, but all of its
partitions were zero length. That's where my root directory used to be.

I recently discovered that a flaky disk can really mess a system up. I had an old CentOS5 machine that I reinstalled as SL6 because it was hanging frequently and eventually, after a reboot from a frozen state, had so many fsck errors that it would not boot.

After the move to SL the hangs continued. There was nothing in the logs, and whenever I got to the machine after a hang, the monitor was asleep and the system was otherwise entirely unresponsive.

Ran memtest for 24+ hours, no errors. But recently it threw these errors on the console while the monitor was _not_ asleep:

kernel: ata4: exception Emask 0x10 SAct 0x0 SErr 0x90200 action 0xe frozen
kernel: ata4: irq_stat 0x00400000, PHY RDY changed
kernel: ata4: SError: { Persist PHYRdyChg 10B8B }
kernel: ata4: hard resetting link
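The SErr value in the first line is a bitmask, and decoding 0x90200 bit by bit reproduces the { Persist PHYRdyChg 10B8B } summary on the third line. Those bits describe problems on the physical SATA link (PHY-ready state changes, 10b/8b decode errors), which is consistent with a drive swap fixing things. A small sketch of the decoding; the bit-to-name table is transcribed for illustration from libata's SError reporting and should be checked against the kernel source before relying on it:

```python
# SATA SError bit positions and the short names libata prints for them
# (transcribed for illustration; verify against the kernel's libata headers).
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}

def decode_serr(value):
    """Return the names of the set SError bits, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]

print(decode_serr(0x90200))  # ['Persist', 'PHYRdyChg', '10B8B']
```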

Swapped out the drive and now everything runs smoothly.

When running pvmove with the disk installed in another machine, I found a number of similar errors in that machine's logs. Because the disk was not the root/swap drive on that machine, the kernel could reset the link and continue moving data.
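To check whether a particular port keeps resetting its link, one could count the "hard resetting link" messages per ataN port in the system log. A minimal sketch, assuming the messages look like the ones quoted above; the function name is hypothetical:

```python
import re
from collections import Counter

# Matches libata reset messages such as "kernel: ata4: hard resetting link";
# an optional ".NN" link suffix (e.g. "ata2.01") is folded into the port name.
RESET_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: hard resetting link")

def count_link_resets(lines):
    """Count 'hard resetting link' messages per ataN port."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

On SL6 the kernel messages normally land in /var/log/messages (an assumption; adjust for other setups), so something like `count_link_resets(open("/var/log/messages", errors="replace")).most_common()` lists the noisiest ports first.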

Jeff
