I was able to destroy ZFS pools by trying to access them from inside VirtualBox, until I read the detailed documentation and set the disk buffer options correctly. I will dig into my notes and post the key setting to this thread when I find it.
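If memory serves, the key was VirtualBox's per-disk IgnoreFlush setting, which makes the emulated controller actually pass the guest's cache-flush commands through to the host disk instead of acknowledging them while the data is still sitting in a host buffer, plus keeping the host I/O cache off on the storage controller. Roughly something like the following (the VM name "MyZFSVM", the controller name "SATA" and the LUN numbers are placeholders; please check the current VirtualBox manual rather than taking my recollection as gospel):

    # Make VirtualBox honour flush commands from the guest
    # (0 = do NOT ignore flushes). Repeat for each attached disk/LUN.
    VBoxManage setextradata "MyZFSVM" \
      "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0

    # For disks attached to the emulated IDE controller instead:
    VBoxManage setextradata "MyZFSVM" \
      "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/IgnoreFlush" 0

    # And keep the host I/O cache off for the controller:
    VBoxManage storagectl "MyZFSVM" --name "SATA" --hostiocache off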
But I've used ZFS for many years without ECC RAM with no trouble. It isn't the best way to go, but it isn't the lack of ECC that's killing a ZFS pool. It's the hypervisor hardware emulation and buffering.

Sent from my iPad

> On Apr 1, 2014, at 5:24 PM, Jason Belec <jasonbe...@belecmartin.com> wrote:
>
> I think Bayard has hit on some very interesting points, part of what I was alluding to, but very well presented here.
>
> Jason
> Sent from my iPhone 5S
>
>> On Apr 1, 2014, at 7:14 PM, Bayard Bell <buffer.g.overf...@gmail.com> wrote:
>>
>> Could you explain how you're using VirtualBox and why you'd use a type 2 hypervisor in this context?
>>
>> Here's a scenario where you really have to mind with hypervisors: ZFS tells a virtualised controller that it needs to sync a buffer, and the controller tells ZFS that all's well while perhaps requesting an async flush. ZFS thinks it's done all the I/Os to roll a TXG to stable storage, but in the meantime something else crashes, and whoosh go your buffers.
>>
>> I'm not sure it's come across particularly well in this thread, but ZFS doesn't and can't cope with hardware that's so unreliable that it tells lies about basic things, like whether your writes have made it to stable storage, or that doesn't mind the shop, as is the case with non-ECC memory. It's one thing when you have a device reading back something that doesn't match the checksum, but it gets uglier when you've got a single I/O path and a controller that seems to write the wrong bits in stride (I've seen this), or when the problems are even closer to home (and again I emphasise RAM). You may not have problems right away. You may have problems where you can't tell the difference, like flipped bits in data buffers that have no other integrity checks. But you can run into complex failure scenarios where ZFS has to cash in on guarantees that were rather more approximate than what it was told, and then it may not be a case of having some bits flipped in photos or MP3s, but of no longer being able to import your pool, or of needing someone who knows how to operate zdb to do some additional TXG rollback to get your data back after losing some updates.
>>
>> I don't know if you're running ZFS in a VM or running VMs on top of ZFS, but either way, you probably want to Google for "data loss", "VirtualBox" and whatever device you're emulating and see whether there are known issues. You can find issue reports out there on VirtualBox data loss, but working through bug reports can be challenging.
>>
>> Cheers,
>> Bayard
>>
>>> On 1 April 2014 16:34, Eric Jaw <naisa...@gmail.com> wrote:
>>>
>>>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>>>>
>>>> ZFS is lots of parts, in most cases lots of cheap unreliable parts, refurbished parts, yadda yadda. As posted on this thread and many, many others, any issues are probably not ZFS but the parts of the whole. Yes, it could be ZFS, after you confirm that all the parts are pristine, maybe.
>>>
>>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm trying to figure out why VirtualBox is creating these issues. I'm pretty sure that's the root cause, but I don't know why yet, so I'm just speculating at this point. Of course, I want to get my ZFS up and running so I can move on to what I really need to do, so it's easy to jump to a conclusion about something that I haven't thought of in my position.
>>> Hope you can understand.
>>>
>>>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM (not ECC); it is the home server for music, TV shows, movies, and some interim backups. The mini has been modded for eSATA and has 6 drives connected. The pool is 2 RAID-Z of 3, mirrored, with copies set at 2. It has been running since ZFS was released in the Apple builds. Lost 3 drives, eventually traced to a new cable that cracked at the connector, which, when hot enough, expanded and lifted 2 pins free of their connector counterparts, resulting in errors. Visually almost impossible to see. I replaced port multipliers, eSATA cards, RAM, minis, the power supply, reinstalled the OS, reinstalled ZFS, restored the ZFS data from backup, and finally found the bad connector end, only because it was hot and felt 'funny'.
>>>>
>>>> Frustrating, yes, but educational also. The happy news is, all the data was fine; the wife would have torn me to shreds if photos were missing, music was corrupt, etc., etc. And this was on the old, out-of-date but stable ZFS version we Mac users have been hugging onto for dear life. YMMV.
>>>>
>>>> Never had RAM as the issue, here in the mad science lab across 10 rotating systems or in any client location, pick your decade. However, I don't use cheap RAM either, and I only have 2 systems requiring ECC currently, which don't even connect to ZFS as they are both Xserves with other lives.
>>>>
>>>> --
>>>> Jason Belec
>>>> Sent from my iPad
>>>>
>>>>> On Apr 1, 2014, at 12:13 AM, Daniel Becker <razz...@gmail.com> wrote:
>>>>>
>>>>>> On Mar 31, 2014, at 7:41 PM, Eric Jaw <nais...@gmail.com> wrote:
>>>>>>
>>>>>> I started using ZFS a few weeks ago, so a lot of it is still new to me. I'm actually not completely certain about the "proper procedure" for repairing a pool. I'm not sure if I'm supposed to clear the errors before or after the scrub (little things). I'm not sure if it even matters. When I restarted the VM, the checksum counts cleared on their own.
>>>>>
>>>>> The counts are not maintained across reboots.
>>>>>
>>>>>> On the first scrub it repaired roughly 1.65MB. None on the second scrub. Even after the scrub there were still 43 data errors. I was expecting them to go away.
>>>>>>
>>>>>>> errors: 43 data errors, use '-v' for a list
>>>>>
>>>>> What this means is that in these 43 cases, the system was not able to correct the error (i.e., both drives in a mirror returned bad data).
>>>>>
>>>>>> This is an excellent question. They're in 'Normal' mode. I remember looking into this before and decided normal mode should be fine. I might be wrong, so thanks for bringing this up. I'll have to check it out again.
>>>>>
>>>>> The reason I was asking is that these symptoms would also be consistent with something outside the VM writing to the disks behind the VM's back; that's unlikely to happen accidentally with disk images, but raw disks are visible to the host OS as such, so it may be as simple as Windows deciding that it should initialize the "unformatted" (really, formatted with an unknown filesystem) devices. Or it could be a RAID controller that stores its array metadata in the last sector of the array's disks.
>>>>>
>>>>>> memtest86 and memtest86+ ran for 18 hours and came out okay. I'm on my third scrub and the number of errors has remained at 43.
>>>>>> Checksum errors continue to pile up as the pool is getting scrubbed.
>>>>>>
>>>>>> I'm just as flustered about this. Thanks again for the input.
>>>>>
>>>>> Given that you're seeing a fairly large number of errors in your scrubs, the fact that memtest86 doesn't find anything at all very strongly suggests that this is not actually a memory issue.
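On Eric's question above about whether to clear errors before or after a scrub: as far as I understand it, 'zpool clear' only resets the error counters and doesn't repair anything, so the ordering barely matters. The sequence I use looks roughly like this (the pool name "tank" is just a placeholder, and I'm going from memory, so double-check against the man pages for your ZFS version):

    zpool scrub tank        # read and verify every block, repairing from redundancy where possible
    zpool status -v tank    # once the scrub finishes, list any files with unrecoverable errors
    # restore or delete the files listed under "errors:", then
    zpool clear tank        # reset the per-device error counters
    zpool scrub tank        # re-scrub; it should now come back clean

If errors keep reappearing on re-scrubs of an otherwise healthy pool, as in Eric's case, then something underneath ZFS (the emulated controller, the host cache, another writer touching the raw disks) is still corrupting data, and no amount of scrubbing will get ahead of it.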