I think Bayard has hit on some very interesting points, part of what I was
alluding to, but very well presented here.
Sent from my iPhone 5S
> On Apr 1, 2014, at 7:14 PM, Bayard Bell <buffer.g.overf...@gmail.com> wrote:
> Could you explain how you're using VirtualBox and why you'd use a type 2
> hypervisor in this context?
> Here's a scenario where you really have to mind with hypervisors: ZFS tells a
> virtualised controller that it needs to sync a buffer, and the controller
> tells ZFS that all's well while perhaps requesting an async flush. ZFS thinks
> it's done all the I/Os needed to roll a TXG to stable storage, but in the
> meantime something else crashes and whoosh go your buffers.
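For what it's worth, VirtualBox's manual documents an extradata key that controls whether guest flush requests are honored or ignored, per controller and port. A minimal sketch (the VM name "MyVM" and the ahci/LUN values are placeholders; adjust for your VM and controller type):

```shell
# Tell VirtualBox to pass the guest's flush requests through to the host
# rather than ignoring them. A value of 0 means "do not ignore flushes".
# "MyVM" is a placeholder VM name; "ahci/0/LUN#0" assumes a SATA controller,
# first port -- IDE uses a piix3ide device path instead.
VBoxManage setextradata "MyVM" \
  "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0
```

That only narrows one failure mode, of course; it doesn't make a type 2 hypervisor's storage stack equivalent to bare metal.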
> I'm not sure it's come across particularly well in this thread, but ZFS
> doesn't and can't cope with hardware that's so unreliable that it tells lies
> about basic things, like whether your writes have made it to stable storage,
> or doesn't mind the shop, as is the case with non-ECC memory. It's one thing
> when you have a device reading back something that doesn't match the
> checksum, but it gets uglier when you've got a single I/O path and a
> controller that seems to write the wrong bits in stride (I've seen this) or
> when the problems are even closer to home (and again I emphasise RAM). You
> may not have problems right away. You may have problems where you can't tell
> the difference, like flipping bits in data buffers that have no other
> integrity checks. But you can also run into complex failure scenarios where
> ZFS has to cash in on guarantees that were rather more approximate than it
> was told, and then it's not a case of some flipped bits in photos or MP3s
> but of no longer being able to import your pool, or of needing someone who
> knows how to operate zdb to do an additional TXG rollback to get your data
> back after losing some updates.
> I don't know if you're running ZFS in a VM or running VMs on top of ZFS, but
> either way, you probably want to Google for "data loss" "VirtualBox" and
> whatever device you're emulating and see whether there are known issues. You
> can find issue reports out there on VirtualBox data loss, but working through
> bug reports can be challenging.
>> On 1 April 2014 16:34, Eric Jaw <naisa...@gmail.com> wrote:
>>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>>> ZFS is lots of parts, in most cases lots of cheap, unreliable, refurbished
>>> parts, yadda yadda; as posted on this thread and many, many others, any
>>> issues are probably not ZFS but the parts of the whole. Yes, it could be
>>> ZFS, but only after you confirm that all the parts are pristine, maybe.
>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm trying
>> to figure out why VirtualBox is creating these issues. I'm pretty sure
>> that's the root cause, but I don't know why yet, so I'm just speculating at
>> this point. Of course, I want to get my ZFS up and running so I can move on
>> to what I really need to do, so it's easy to jump to a conclusion about
>> something I haven't fully thought through. Hope you can understand.
>>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM
>>> (not ECC); it is the home server for music, TV shows, movies, and some
>>> interim backups. The mini has been modded for eSATA and has 6 drives
>>> connected. The pool is 2 RAID-Z of 3, mirrored, with copies set to 2. It's
>>> been running since ZFS was first released in Apple builds. I lost 3 drives,
>>> eventually traced to a new cable that had cracked at the connector; when it
>>> got hot enough, the crack expanded, lifting 2 pins free of their connector
>>> counterparts and causing errors. Visually it was almost impossible to see.
>>> I replaced port multipliers, eSATA cards, RAM, minis, and the power supply,
>>> reinstalled the OS, reinstalled ZFS, and restored the ZFS data from backup,
>>> only to finally find the bad connector end because it was hot and felt
>>> 'funny'.
>>> Frustrating, yes, but educational too. The happy news is that all the data
>>> was fine; my wife would have torn me to shreds if photos were missing,
>>> music was corrupt, etc. And this was on the old, out-of-date but stable ZFS
>>> version we Mac users have been hugging onto for dear life. YMMV.
>>> Never had RAM as the issue, here in the mad science lab across 10 rotating
>>> systems or in any client location - pick your decade. However, I don't use
>>> cheap RAM either, and the only 2 systems I currently have that require ECC
>>> don't even connect to ZFS, as they are both Xserves with other lives.
>>> Jason Belec
>>> Sent from my iPad
>>>> On Apr 1, 2014, at 12:13 AM, Daniel Becker <razz...@gmail.com> wrote:
>>>>> On Mar 31, 2014, at 7:41 PM, Eric Jaw <nais...@gmail.com> wrote:
>>>>> I started using ZFS a few weeks ago, so a lot of it is still new to me.
>>>>> I'm actually not completely certain about the proper procedure for
>>>>> repairing a pool. I'm not sure whether I'm supposed to clear the errors
>>>>> before or after the scrub (little things), or if it even matters. When I
>>>>> restarted the VM, the checksum counts cleared on their own.
>>>> The counts are not maintained across reboots.
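To make the procedure being discussed concrete, the usual sequence is: scrub, inspect the results, act on any listed files, and only then clear the error counters. A minimal sketch, assuming a pool named "tank" (a placeholder):

```shell
# Start a scrub; it runs in the background against all pool data.
zpool scrub tank

# Check progress and results; -v lists files with unrecoverable errors.
zpool status -v tank

# After restoring or deleting the affected files, reset the error counters.
zpool clear tank
```

Clearing before acting on the listed files just discards the evidence; the underlying corruption is still there until the files are restored from backup or removed.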
>>>>> On the first scrub it repaired roughly 1.65MB. None on the second scrub.
>>>>> Even after the scrub there were still 43 data errors. I was expecting
>>>>> them to go away.
>>>>>> errors: 43 data errors, use '-v' for a list
>>>> What this means is that in these 43 cases, the system was not able to
>>>> correct the error (i.e., both drives in a mirror returned bad data).
>>>>> This is an excellent question. They're in 'Normal' mode. I remember
>>>>> looking in to this before and decided normal mode should be fine. I might
>>>>> be wrong. So thanks for bringing this up. I'll have to check it out again.
>>>> The reason I was asking is that these symptoms would also be consistent
>>>> with something outside the VM writing to the disks behind the VM’s back;
>>>> that’s unlikely to happen accidentally with disk images, but raw disks are
>>>> visible to the host OS as such, so it may be as simple as Windows deciding
>>>> that it should initialize the “unformatted” (really, formatted with an
>>>> unknown filesystem) devices. Or it could be a raid controller that stores
>>>> its array metadata in the last sector of the array’s disks.
>>>>> memtest86 and memtest86+ ran for 18 hours and came out okay. I'm on my
>>>>> third scrub and the number of errors has remained at 43. Checksum errors
>>>>> continue to pile up as the pool is getting scrubbed.
>>>>> I'm just as flustered about this. Thanks again for the input.
>>>> Given that you’re seeing a fairly large number of errors in your scrubs,
>>>> the fact that memtest86 doesn’t find anything at all very strongly
>>>> suggests that this is not actually a memory issue.
>> You received this message because you are subscribed to the Google Groups
>> "zfs-macos" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to zfs-macos+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.