On Tuesday, April 1, 2014 12:07:29 AM UTC-4, Gregg wrote:
>
> The long and the short of it is that most likely you have a failing
> disk or controller/connector more than anything. I used to run an
> 8-disk pool of 4 mirrored pairs on a small box without good airflow
> and slow SATA-150 controllers that were supported by Solaris 10. I
> ended up replacing the whole system with a new large box with 140mm
> fans as well as SATA-300 controllers to get better cooling. Over time,
> every disk has failed because of heat issues. Many of my SATA cables
> failed too. They were cheap junk.
>
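A quick aside for anyone else weighing the failing-disk/cable theory: SMART counters can separate the two. A minimal sketch, assuming smartmontools is installed, the disks are directly visible, and hypothetical FreeBSD-style ada0..ada3 device names (with my file-backed VM setup this would have to run on the host instead):

    # Attribute 199 (UDMA_CRC_Error_Count) climbs with bad cables/connectors,
    # while Reallocated/Pending sector counts implicate the disk itself.
    for d in ada0 ada1 ada2 ada3; do
        echo "== /dev/$d =="
        smartctl -A /dev/$d | grep -E 'CRC|Reallocated|Pending'
    done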
I have my HDDs at a steady 40 degrees C or below. I thought about replacing the SATA cables, but two of the drives are on new cables and the rest are on old ones, and given the checksum errors I'm seeing across all of them, it would mean every cable needs replacing, which I don't believe could be the case in this build. A failing disk controller on all four drives that were barely used? I have higher confidence in HDD production than that. I feel certain it's something else, but thank you for your input; I'll keep it as a consideration if all else fails. I'm running this all through a VM, which is where I believe the issue could be, but we need to figure out why, and how to work around it if that's the case.

> Equipment has to be selected carefully. I have not seen any failing bits
> for the 3+ years now that I have been running on the new hardware, with
> all of the disks replaced 2 years ago, so I have made no changes for the
> past 2 years. All is good for me with ZFS and non-ECC RAM.

That's very good to hear. I'm still trying to gather more data, but I'm getting closer to finding an answer. It seems to point somewhere in the memory realm.

> If I build another system, I will build it with ECC RAM and will get new
> controllers and new cables, just because.
>
> My current choice is to use ZFS on Linux, because I haven't had a disk
> array/container that I could hook up to the Macs in the house.
>
> My new ZFS array might end up being Mac Pro based, with some of the
> Thunderbolt-based disk carriers.
>
> I have about 8TB of stuff that I need to be able to keep safe.
>
> Amazon Glacier is on my radar. At some point I may just get a 4TB USB 3.0
> drive to copy stuff to and ship off to Glacier.
>
> Gregg
>
> On 3/31/2014 9:41 PM, Eric Jaw wrote:
>>
>> On Monday, March 31, 2014 5:55:21 PM UTC-4, Daniel Becker wrote:
>>>
>>> On Mar 31, 2014, at 2:23 PM, Eric Jaw <nais...@gmail.com> wrote:
>>>>
>>>> Doing a scrub is just obliterating my pool.
>>>
>>> Is it? I don't think so:
>>
>> Thanks for the response! Here's some more details on the setup:
>> https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
>>
>> I started using ZFS a few weeks ago, so a lot of it is still new to me.
>> I'm actually not completely certain about the "proper procedure" for
>> repairing a pool. I'm not sure if I'm supposed to clear the errors before
>> or after the scrub (little things), or if it even matters. When I
>> restarted the VM, the checksum counts cleared on their own.
>>
>> I wasn't expecting to run into any issues, but I drew part of my
>> conclusion from the high number of checksum errors, which never appeared
>> until I started reading from the dataset; the count climbed into the tens
>> when I scrubbed the pool, almost doubling when I scrubbed a second time.
>>
>>>> scan: scrub in progress since Mon Mar 31 10:14:52 2014
>>>>     1.83T scanned out of 2.43T at 75.2M/s, 2h17m to go
>>>>     *0 repaired*, 75.55% done
>>>
>>> Note the "0 repaired."
>>
>> On the first scrub it repaired roughly 1.65MB. None on the second scrub.
>> Even after the scrub there were still 43 data errors; I was expecting
>> them to go away.
>>
>>     errors: 43 data errors, use '-v' for a list
>>
>>>> I'm also running ZFS on FreeBSD 10.0 (RELEASE) in VirtualBox on
>>>> Windows 7 Ultimate.
>>>
>>> Are the disks that the VM sees file-backed or passed-through raw disks?
>>
>> This is an excellent question. They're in 'Normal' mode. I remember
>> looking into this before and deciding that Normal mode should be fine. I
>> might be wrong, though, so thanks for bringing this up; I'll have to
>> check it out again.
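Stepping out of the quote for a second: if 'Normal' mode (a file-backed image) turns out to be the problem, the passed-through raw-disk setup Daniel is asking about would look something like this on the Windows host. The VM name, controller name, .vmdk path, and PhysicalDrive number below are all made up for illustration:

    VBoxManage internalcommands createrawvmdk -filename C:\VMs\zfs-disk1.vmdk -rawdisk \\.\PhysicalDrive1
    VBoxManage storageattach "FreeBSD10" --storagectl "SATA" --port 1 --device 0 --type hdd --medium C:\VMs\zfs-disk1.vmdk

With raw disks the guest reads and writes the physical drive directly, which would take the image-file layer out of the checksum-error equation.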
>>>> Things seem to be pointing to non-ECC RAM causing checksum errors. It
>>>> looks like I'll have to swap out my memory for ECC RAM if I want to
>>>> continue this project; otherwise the data is pretty much hosed right
>>>> now.
>>>
>>> Did you actually run a memory tester (e.g., memtest86), or is this just
>>> based on gut feeling? Lots of things can manifest as checksum errors.
>>> If you import the pool read-only, do successive scrubs find errors in
>>> different files (use "zpool status -v") every time, or are they always
>>> in the same files? The former would indeed point to some kind of memory
>>> corruption issue, while in the latter case it's much more likely that
>>> your on-disk data somehow got corrupted.
>>
>> memtest86 and memtest86+ ran for 18 hours and came out okay. I'm on my
>> third scrub, and the number of errors has remained at 43. Checksum errors
>> continue to pile up as the pool is getting scrubbed.
>>
>> I'm just as flustered about this. Thanks again for the input.
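One more thing, mostly as a note to self while I work out the "proper procedure" question above. The basic scrub/repair cycle, as I understand it so far (pool name "tank" is a stand-in for mine):

    # Start a scrub and watch it run; -v lists the files affected by data errors.
    zpool scrub tank
    zpool status -v tank

    # Daniel's read-only import, to rule out anything writing to the pool
    # while testing:
    zpool export tank
    zpool import -o readonly=on tank

    # Once the real cause is fixed (RAM, cables, controller), reset the error
    # counters. Clearing only zeroes the counts; it doesn't repair anything.
    zpool clear tank

If I've got any of that backwards, corrections welcome.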