Hmmm, oh, I get it: you have zero knowledge of the platform this list 
represents. No worries; I appreciate you taking the time to clear that up.


--
Jason Belec
Sent from my iPad

> On Apr 12, 2014, at 6:26 AM, Bayard Bell <buffer.g.overf...@gmail.com> wrote:
> 
> Jason,
> 
> If you think I've said anything about the sky falling or referenced a wiki, 
> you're responding to something other than what I wrote. I see no need for 
> further reply.
> 
> Cheers,
> Bayard
> 
> 
>> On 11 April 2014 22:36, Jason Belec <jasonbe...@belecmartin.com> wrote:
>> Excellent. If you feel this is necessary, go for it. By your point of view, 
>> those with systems that don't have ECC should just run like the sky is 
>> falling. That said, I can guarantee none of the systems I have under my 
>> care have issues. How do I know? Well, the data is tested and compared at 
>> regular intervals. Maybe I'm the luckiest guy ever; where's that lottery 
>> ticket? Is ECC better? Possibly, probably in heavy-load environments, but 
>> no data has been provided to back this up, especially nothing in the 
>> context of what most users' needs are, at least here in the Mac space. 
>> Which ECC? Be specific. They are not all the same, just like regular RAM 
>> is not all the same, just like HDDs are not all the same. Fear mongering 
>> is wonderful and easy. Putting forth a solution guaranteed to be better is 
>> what's needed now. Did you actually reference a wiki? Seriously? A 
>> document anyone can edit to suit their view? I guess I come from a 
>> different era. 
>> 
>> 
>> Jason
>> Sent from my iPhone 5S
>> 
>>> On Apr 11, 2014, at 5:09 PM, Bayard Bell <buffer.g.overf...@gmail.com> 
>>> wrote:
>>> 
>>> If you want more of a smoking gun report on data corruption without ECC, 
>>> try:
>>> 
>>> https://blogs.oracle.com/vlad/entry/zfs_likes_to_have_ecc
>>> 
>>> This view isn't isolated in terms of what people at Sun thought or what 
>>> people at Oracle now think. Try googling for "zfs ecc 
>>> site:blogs.oracle.com", and you'll find a recurring statement that ECC 
>>> should be used even in home deployments, with maybe one odd exception.
>>> 
>>> The Wikipedia article, correctly summarising the Google study, is plain in 
>>> saying not that extremely high error rates are common but that error rates 
>>> are highly variable in large-sample studies, with some systems seeing 
>>> extremely high error rates. ECC gives significant assurance at an 
>>> incremental cost, so what's your data worth? You're not guaranteed to be 
>>> screwed by not using ECC (and the Google paper doesn't say this either), 
>>> but you are assuming risks that ECC mitigates. Look at the blog above, 
>>> however: even DIMMs that are high-quality but non-ECC can go wrong and 
>>> result in nasty system corruption.
>>> 
>>> What generally protects you in terms of pool integrity is metadata 
>>> redundancy on top of integrity checks, but if you flip bits on metadata 
>>> in-core before writing redundant copies, well, that's a risk to pool 
>>> integrity.
>>>  
>>> I also think it's mistaken to say this is distinctly a problem with ZFS. 
>>> Any "next-generation" filesystem that provides protection against on-disk 
>>> corruption via checksums ends up with a residual risk focused on making 
>>> sure that in-core data integrity is robust. You could well have those 
>>> problems on the pools you've deployed; there are a lot of situations in 
>>> which you'd never know and quite a lot (such as most of the bits in a 
>>> photo or MP3) where you'd never notice low rates of bit-flipping. The fact 
>>> that you haven't noticed doesn't equate to there being no problems in a 
>>> strict sense; it's far more likely that you've been able to tolerate the 
>>> flipping that's happened. The guy at Sun with the blog above got lucky: he 
>>> was running high-quality non-ECC RAM, and it went pear-shaped, at least as 
>>> far as metadata cancer goes, quickly enough to let him recover by rolling 
>>> back snapshots.
>>> 
>>> Take a look out there, and you'll find people who are very confused about 
>>> the risks and available mitigations. I found someone saying that there's 
>>> no problem with more traditional RAID technologies because disks have 
>>> CRCs. By contrast, you can find Bonwick, educated as a statistician, 
>>> comparing SHA256 collision rates to undetected ECC error rates and 
>>> introducing ZFS data integrity safeguards by way of analogy to ECC. That's 
>>> why the large-sample studies are interesting and useful: none of this 
>>> technology makes data corruption impossible; it just goes to extreme 
>>> lengths to marginalise the chances of those events by addressing known 
>>> sources of error and fundamental error scenarios. In-core is so central 
>>> that if you tolerate errors there, they become systematic behaviour in a 
>>> place where better outcomes are reasonably available (and that's 
>>> **reasonably** available, I would suggest, in a way that the Madison 
>>> paper's recommendation to make ZFS buffers magical isn't). CRC-32 does a 
>>> great job of detecting bad sectors and preventing them from being read 
>>> back, but SHA256 in the right place in a system detects errors that a 
>>> well-conceived vdev topology will generally make recoverable. That 
>>> includes catching cases where an error isn't caught by CRC-32, which may 
>>> be rare, but when you've got the kind of data densities that ZFS allows, 
>>> you're rolling the dice often enough that those results become 
>>> interesting.
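>>>
>>> A quick back-of-the-envelope sketch in Python (my own illustration, 
>>> nothing ZFS-specific): a single flipped bit is caught by either checksum, 
>>> but the width of the checksum bounds how often arbitrary corruption can 
>>> slip past it, and at the block counts ZFS makes practical that difference 
>>> stops being academic.
>>>
>>> import hashlib, zlib
>>>
>>> block = bytes(4096)              # a 4 KiB block of zeros
>>> flipped = bytearray(block)
>>> flipped[100] ^= 0x40             # flip one bit
>>>
>>> # Both checks detect this particular corruption...
>>> assert zlib.crc32(bytes(flipped)) != zlib.crc32(block)
>>> assert hashlib.sha256(bytes(flipped)).digest() != hashlib.sha256(block).digest()
>>>
>>> # ...but random corruption evades a 32-bit CRC with probability ~2**-32,
>>> # versus ~2**-256 for SHA-256, and a big pool rolls those dice constantly.
>>> blocks_per_pib = 2**50 // 4096   # 4 KiB blocks in a pebibyte
>>> print(f"{blocks_per_pib:,} blocks per PiB; CRC-32 miss odds ~1 in {2**32:,}")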
>>> 
>>> ECC is one of the most basic steps to take, and if you look at the 
>>> architectural literature, that's how it's treated. If you really want to be 
>>> in on the joke, find the opensolaris zfs list thread from 2009 where 
>>> someone asks about ECC, and someone else jumps in to remark on how 
>>> VirtualBox can be poison for pool integrity for reasons rehearsed in my 
>>> last post.
>>> 
>>> Cheers,
>>> Bayard
>>> 
>>>> On 1 April 2014 12:04, Jason Belec <jasonbe...@belecmartin.com> wrote:
>>>> ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
>>>> refurbished parts, yadda yadda. As posted in this thread and many, many 
>>>> others, any issues are probably not ZFS but the parts of the whole. Yes, 
>>>> it could be ZFS, after you confirm that all the parts are pristine, maybe. 
>>>> 
>>>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM 
>>>> (not ECC); it is the home server for music, TV shows, movies, and some 
>>>> interim backups. The mini has been modded for eSATA and has 6 drives 
>>>> connected. The pool is 2 RAIDZ of 3, mirrored, with copies set to 2. It 
>>>> has been running since ZFS was released in the Apple builds. I lost 3 
>>>> drives, eventually traced to a new cable that had cracked at the 
>>>> connector; when it got hot enough it expanded, lifting 2 pins free of 
>>>> their counterparts and causing errors. It was visually almost impossible 
>>>> to see. I replaced port multipliers, eSATA cards, RAM, minis, the power 
>>>> supply, reinstalled the OS, reinstalled ZFS, restored the ZFS data from 
>>>> backup, and finally found the bad connector end only because it was hot 
>>>> and felt 'funny'. 
>>>> 
>>>> Frustrating, yes, but educational too. The happy news is that all the 
>>>> data was fine; my wife would have torn me to shreds if photos had gone 
>>>> missing, music had been corrupted, etc. And this was on the old, 
>>>> out-of-date but stable ZFS version we Mac users have been clinging to 
>>>> for dear life. YMMV.
>>>> 
>>>> I've never had RAM as the issue, here in the mad science lab across 10 
>>>> rotating systems or in any client location - pick your decade. However, 
>>>> I don't use cheap RAM either, and the only 2 systems I currently have 
>>>> that require ECC don't even connect to ZFS, as they are both Xserves 
>>>> with other lives.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jason Belec
>>>> Sent from my iPad
>>>> 
>>>>> On Apr 1, 2014, at 12:13 AM, Daniel Becker <razzf...@gmail.com> wrote:
>>>>> 
>>>>>> On Mar 31, 2014, at 7:41 PM, Eric Jaw <naisa...@gmail.com> wrote:
>>>>>> 
>>>>>> I started using ZFS about a few weeks ago, so a lot of it is still new 
>>>>>> to me. I'm actually not completely certain about "proper procedure" for 
>>>>>> repairing a pool. I'm not sure if I'm supposed to clear the errors after 
>>>>>> the scrub, before or after (little things). I'm not sure if it even 
>>>>>> matters. When I restarted the VM, the checksum counts cleared on its own.
>>>>> 
>>>>> The counts are not maintained across reboots.
>>>>> 
>>>>> 
>>>>>> On the first scrub it repaired roughly 1.65MB. None on the second 
>>>>>> scrub. Even after the scrub there were still 43 data errors. I was 
>>>>>> expecting them to go away.
>>>>>> 
>>>>>>> errors: 43 data errors, use '-v' for a list
>>>>> 
>>>>> What this means is that in these 43 cases, the system was not able to 
>>>>> correct the error (i.e., both drives in a mirror returned bad data).
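>>>>>
>>>>> On the "proper procedure" point, for whatever it's worth, this is the 
>>>>> rough sequence I'd follow; it's sketched in Python purely to keep the 
>>>>> steps explicit, "tank" is a placeholder pool name, and the subcommands 
>>>>> are the standard zpool ones:
>>>>>
>>>>> import subprocess
>>>>>
>>>>> def zpool(*args):
>>>>>     # Run a zpool subcommand and show it; assumes zpool is on PATH.
>>>>>     print("# zpool", " ".join(args))
>>>>>     subprocess.run(("zpool",) + args, check=True)
>>>>>
>>>>> POOL = "tank"                  # placeholder pool name
>>>>>
>>>>> zpool("scrub", POOL)           # start a scrub (runs in the background)
>>>>> zpool("status", "-v", POOL)    # after it finishes: list files with errors
>>>>> # ...restore or delete the files named in the error list, then:
>>>>> zpool("clear", POOL)           # reset the error/checksum counters
>>>>> zpool("scrub", POOL)           # scrub again to confirm the pool is clean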
>>>>> 
>>>>> 
>>>>>> This is an excellent question. They're in 'Normal' mode. I remember 
>>>>>> looking into this before and deciding that normal mode should be fine. 
>>>>>> I might be wrong, so thanks for bringing this up. I'll have to check 
>>>>>> it out again.
>>>>> 
>>>>> The reason I was asking is that these symptoms would also be consistent 
>>>>> with something outside the VM writing to the disks behind the VM’s back; 
>>>>> that’s unlikely to happen accidentally with disk images, but raw disks 
>>>>> are visible to the host OS as such, so it may be as simple as Windows 
>>>>> deciding that it should initialize the “unformatted” (really, formatted 
>>>>> with an unknown filesystem) devices. Or it could be a raid controller 
>>>>> that stores its array metadata in the last sector of the array’s disks.
>>>>> 
>>>>> 
>>>>>> memtest86 and memtest86+ ran for 18 hours and came out okay. I'm on my 
>>>>>> third scrub and the number of errors has remained at 43. Checksum 
>>>>>> errors continue to pile up as the pool is getting scrubbed.
>>>>>> 
>>>>>> I'm just as flustered about this. Thanks again for the input.
>>>>> 
>>>>> Given that you’re seeing a fairly large number of errors in your scrubs, 
>>>>> the fact that memtest86 doesn’t find anything at all very strongly 
>>>>> suggests that this is not actually a memory issue.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"zfs-macos" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to zfs-macos+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
