Thank you, Marian, for the IDE to AHCI tip. Doing an export and then an import did the trick. And it was much quicker than having to copy everything off and then back.
The zfs scrub using AHCI went no better than using IDE mode. It resulted in about the same number of checksum errors. It did seem to be a tiny bit faster, but not by much: 1hr 20min versus 1hr 30min.

I next tried switching out the memory and ran a test using the DDR2 Kingston HyperX memory. This memory has been heavily tested in another machine, overclocked and running two instances of Folding@home in a virtualized environment plus 1 GPU client, for over a year with no problems. The memory also passed a 24-hour Prime95 torture test in that Folding@home machine. The memory is rated for 1066MHz, but I forced it to 800MHz for the test in my media server. Finally, this made a difference! Instead of roughly 10-12 checksum errors, the scrub came back with only one.

So I double-checked all of the settings and changed several memory-related settings that were marked auto. I also forced the memory to 2.2V, since that is what it needs at 1066MHz. I have no idea what the default voltage was or whether 2.2V is even required at 800MHz. After manually setting the timings and voltage, I re-ran the scrub. No dice. It still came back with one zfs checksum error.

I now believe that the motherboard is bad or ridiculously picky about memory. The question is whether it is some controller instability or some other problem. Reluctantly, I blew away my Solaris drive and installed Windows on it. (I don't have a spare drive.) I decided to REALLY test the memory subsystem by running Prime95. I had very good luck using it as a memory tester during my overclocking foray last year. It easily catches memory problems that escape memtest86. Sure enough, within 15 minutes the first Prime95 self-test failed. I stopped it and tried again, and it failed again within about the same amount of time. So it seems that it is *almost* stable. I have never seen a failure once a machine has made it through 50 minutes of Prime95.
Anyway, I plan on RMA'ing the motherboard and buying 8GB of memory from Asus's qualified vendor list for the new motherboard. You can be sure I will test the new motherboard and memory with Prime95 before I even think about getting OpenSolaris up and going again.

I guess the moral of the story is that you REALLY need to extensively test your memory/motherboard/CPU even if you don't plan on overclocking anything. This mistake has cost me a lot of time and aggravation. Honestly, coupled with the other problems I have had with this motherboard (one of them a misunderstanding on my part), this is easily the worst motherboard I have ever owned!

Now that I have had quite a number of checksum errors caught by zfs, I really wish that zfs would give a bit more information on how it automatically fixed the problem. Since this is a memory error, zfs should generally have read the data and checksum from disk without trouble and only occasionally hit a mismatch. I hope it tries to re-read the data and checksum from disk into a different memory location. If the re-read passes, the error should still be reported, but with an indication that nothing was written back to the disk. This could help someone infer whether the problem was memory/controller related or something with the disk.

-- 
This message posted from opensolaris.org
_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss
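
To make the re-read idea from my last paragraph concrete, here is a rough Python sketch of the logic I have in mind. It is only an illustration and not how zfs actually behaves. The names `classify_checksum_error` and `make_reader` are made up for this example, CRC32 stands in for whatever checksum zfs really uses, and a fresh Python buffer is only an approximation of "a different memory location".

```python
import zlib


def classify_checksum_error(read_block, stored_checksum, retries=1):
    """Classify a block read as 'ok', 'transient', or 'persistent'.

    read_block is a hypothetical callable that reads the block from disk
    into a fresh buffer and returns it as bytes.
    """
    if zlib.crc32(read_block()) == stored_checksum:
        return "ok"
    # First read failed verification: read again into a new buffer.
    for _ in range(retries):
        if zlib.crc32(read_block()) == stored_checksum:
            # The on-disk data is good, so nothing needs to be rewritten.
            # Suspect memory or the controller rather than the disk.
            return "transient"
    # Every re-read failed too: the on-disk copy itself looks bad.
    return "persistent"


def make_reader(good_data, corrupt_reads):
    """Simulated disk read (an assumption for testing, not a real API).

    Flips one bit in the returned data on the read numbers listed in
    corrupt_reads (0 is the first read).
    """
    state = {"reads": 0}

    def read_block():
        index = state["reads"]
        state["reads"] += 1
        data = bytearray(good_data)
        if index in corrupt_reads:
            data[0] ^= 0x01  # single-bit flip, as flaky memory might cause
        return bytes(data)

    return read_block


if __name__ == "__main__":
    block = b"media server block"
    checksum = zlib.crc32(block)
    print(classify_checksum_error(make_reader(block, set()), checksum))
    print(classify_checksum_error(make_reader(block, {0}), checksum))
    print(classify_checksum_error(make_reader(block, {0, 1}), checksum))
```

A "transient" result is the case I care about: the error gets reported, but the report can also say that the disk copy verified on re-read and nothing was written back.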
