Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2016-03-03 Thread Bu Jin
Chris:

I'm pretty sure that the X86 architecture has had ALU error correction for 
a while now.  I know my AMD X2s had L1 and L2 ECC, and I think the ALU was 
protected (though I wouldn't swear to that).  However, looking at an Intel 
white paper on the Xeon E7 family's reliability features, it says: "E7 family 
provides internal on-die error protection to protect processor registers 
from transient faults, and enables dynamic processor sparing and migration 
in the case of a failing processor."  In fact, the overall architecture 
looks like robustness was a top-priority concern.  If you'd like to read 
the paper, you can find it here: 
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-e7-family-ras-server-paper.pdf

On Saturday, April 19, 2014 at 10:15:24 AM UTC-6, Chris wrote:



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-05-02 Thread Eric
@ChrisInacio

THAT'S THE COOLEST THING I LEARNED TODAY :D


On Sat, Apr 19, 2014 at 12:15 PM, Chris Inacio  wrote:




Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-05-01 Thread Daniel Jozsef
You know, you remind me of my Computer Architectures lecturer. Considered a 
weird guy university-wide, he had these funny maxims like "a PC is not a 
computer" and "Windows is not an Operating System".

Back then, I kind of saw what he meant, but the funny part is that 
nowadays, it's as if his school of thought is being obsoleted by the 
reality around us. It's kind of valid to say that x86 chips are not 
"proper", but the reality is that 95% of the Internet runs on the bloody 
things. Back twenty years ago, there were things like SPARC servers, 
Silicon Graphics workstations, and all. Now? It's all just PCs. PCs that 
fit in your handbag, PCs that fit under your desk, PCs that fit in a server 
rack. It's still just PCs.

Your credit card transactions are run on PCs. Your bank uses PCs to handle 
accounting. Investment banks use PCs to run billion-dollar IPOs. Facebook 
runs on PCs. Google runs on PCs.

Apparently, PCs ARE good enough for the "powers that be", regardless of 
whether the ALU is protected or not. (Though I think it should be, with all 
the innovation Intel has been doing.)

On Saturday, April 19, 2014 12:15:24 PM UTC-4, Chris wrote:


Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-19 Thread Chris Inacio
This has been quite the interesting thread.  Way back long ago when I was
doing graduate work in microarchitecture (aka processor design), there were
folks who wanted to put an x86 processor in a satellite.  x86, especially
at the time, was totally NOT qualified for use in space.  The Pentium chip
(way back) had this really cool feature: a single bit flip (e.g. a
transient fault from an alpha particle strike) would deadlock the processor
cold, if the correct bit in the reservation queue got toggled.

So why the little story?  Because people who really care about their
computation, for the longest time, didn't use x86 processors.  They used
IBM mainframe processors, SPARC chips, etc.  Why?  Because, at least 10
years ago, the ALUs in x86 chips had *zero* protection.  So while there
may have been memory protection, the results of the ALU were completely
unprotected.  PowerRISC, SPARC, PA-RISC, etc. at least all had
parity-protected ALUs.  Parity can't correct the calculation, but it can
detect a single-bit fault.
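
To make that concrete, here's a toy sketch in Python (illustrative only,
not any real CPU's circuitry): a single even-parity bit over a word flags a
single-bit flip but gives no way to locate or repair it, which is exactly
why parity only detects while ECC can also correct.

# Illustrative sketch only, not any real CPU's circuit: an even-parity bit
# over a word detects a single flipped bit but cannot say which bit flipped,
# so it can flag the fault without being able to repair it.

def parity(word: int) -> int:
    """Return the even-parity bit (XOR of all bits) of a word."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p

stored = 0xDEADBEEF
check = parity(stored)            # computed when the value is produced

faulty = stored ^ (1 << 13)       # simulate a single transient bit flip
print(parity(faulty) != check)    # True: the flip is detected...
print(faulty == stored)           # False: ...but parity alone can't restore it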

If you really want to protect your data end-to-end, you likely still need
to buy a better class of machine.  It might now be included in x86-class
processors, but I can't find anything that says the ALUs are protected.
The old adage, "you get what you pay for," still applies.  If you're
interested, you can read about Fujitsu's SPARC64 data protection:
http://www.fujitsu.com/global/services/computing/server/sparcenterprise/technology/availability/processor.html.
And I know this type of technology is in things like PowerRISC chips; IBM's
mainframe line has had ECC-protected ALUs for a long time (though I've
never spent the time to figure out how they work).



On Sun, Apr 13, 2014 at 12:34 AM, Michael Newbery  wrote:




Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-12 Thread Michael Newbery
On 13/04/2014, at 12:47 pm, Rob Lewis  wrote:
> I have no dog in this fight, but I wonder if possibly the late discovery of 
> the need for ECC was a factor in Apple's abandoning the ZFS project. Unlikely 
> they'd want to reengineer all their machines for it. 
> 


I do not know, and am therefore free to speculate :)

However, rumour hath it that Apple considered the patent/licence situation 
around ZFS to be problematic. Given the current litigious landscape, this was 
not a fight that they were willing to buy into. Note that the patent problem 
also threatens btrfs.
You might discount the magnitude of the threat, but on a cost/benefit analysis 
it looks like they walked away.

Likewise, some of the benefits and a lot of the emphasis of ZFS lies in server 
space, which is not a market that Apple is playing in to any great extent. It's 
not that ZFS doesn't have lots of benefits for client space as well, but the 
SUN emphasis was very much on the server side (which Oracle has only emphasised 
further).

Now, with the OpenZFS model and in particular the options ("We'll support a,b 
and t, but not c or e") it's possible they might revisit it sometime (why yes, 
I am an incurable optimist. Why do you ask?) but I suspect they are more 
interested in distributed file systems a.k.a. the cloud.
--
Michael Newbery
"I have a soft spot for politicians---it's a bog in the west of Ireland!"
Dave Allen





Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-12 Thread Richard Elling
On Apr 12, 2014, at 5:47 PM, Rob Lewis  wrote:

> I have no dog in this fight, but I wonder if possibly the late discovery of 
> the need for ECC was a factor in Apple's abandoning the ZFS project. Unlikely 
> they'd want to reengineer all their machines for it. 

I believe the answer is a resounding NO. If they truly cared about desktop data 
corruption they would have punted HFS+.  Desktop is as desktop does, and Apple 
is out of the server business.

FYI, Microsoft requires ECC for Windows server certification.
 -- richard

--

richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-12 Thread Peter Lai
It sounds like people are missing the forest for the trees. Some of us
have been successfully RAIDing/deploying storage for years on
everything from IDE vinum to SCSI XFS and beyond without ECC. We use
ZFS today because of its featureset. Data integrity checking through
checksumming is just one of those features, and it mitigates some issues
that other file systems have historically failed to address.
(Otherwise we should all be happy with existing journaling filesystems
on a soft or hard RAID.) ECC just adds another layer of mitigation
(and in a less implementation-specific way than, say, how ZFS may
'prefer' raw device access instead of whatever storage abstraction the
controller is presenting). Asserting that ECC is "required" has about
as much logic to it (and I would say less) as asserting that a 3ware
controller with raw JBOD passthrough is "required".
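
For anyone new to the thread, a toy sketch of that checksumming idea in
Python (an illustration only, not ZFS's actual I/O path or on-disk format):
keep a strong checksum next to each block, verify it on every read, and
fall back to a redundant copy when the data no longer matches.

import hashlib

def write_block(data: bytes) -> dict:
    """Store a block together with the SHA-256 of its contents."""
    return {"data": data, "checksum": hashlib.sha256(data).hexdigest()}

def read_block(copies: list) -> bytes:
    """Return the first redundant copy that still matches its checksum."""
    for copy in copies:
        if hashlib.sha256(copy["data"]).hexdigest() == copy["checksum"]:
            return copy["data"]
    raise IOError("all redundant copies failed verification")

good = write_block(b"important record")
rotten = dict(good, data=b"important recore")  # simulate on-disk bit rot
print(read_block([rotten, good]))              # detected, then healed from the good copy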

On Sat, Apr 12, 2014 at 7:46 AM, Bayard Bell wrote:

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-12 Thread Bayard Bell
Jason,

Although I moved from OS X to illumos as a primary platform precisely
because of ZFS (I ended up posting to the original list about the demise of
the project because I happened to be doing an install the week Apple pulled
the plug), I've spent enough time with OS X, including debugging storage
interop issues with NexentaStor in significant commercial deployments, that
it's risible to suggest I have zero knowledge of the platform and even more
risible to imply that the role of ECC in ZFS architecture is here somehow
fundamentally a matter of platform variation. I've pointed to a Solaris
engineer showing core dumps from non-ECC RAM and reporting data corruption
as a substantiated instance of ECC problems, and I've pointed to references
to how ECC serves as a point of reference from one of its co-creators. I've
explained that ECC in ZFS should be understood in terms of the scale it
allows and the challenges that creates for data integrity protection, and
I've tried to contrast the economics of ECC to what I take to be a less
compelling alternative sketched out by the Madison paper. While I've said
that ECC use is generally assumed in ZFS, I've allowed that doing so is a
question of an incremental cost against the value of your data and the
cost to replace it.

I don't understand why you've decided to invest so much in arguing that ECC
is so completely marginal a data integrity measure that you can't have a
reasonable discussion about what gets people to different conclusions and
feel the need to be overtly dismissive of the professionalism and expertise
of those who come to fundamentally different conclusions, but clearly
there's not going to be a dialogue on this. My only interest in posting at
this point is so that people on this list at least have a clear statement
of both ends of the argument and can judge for themselves.

Regards,
Bayard


On 12 April 2014 11:44, Jason Belec  wrote:


Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-12 Thread Jason Belec
Hhhhm, oh I get it, you have zero knowledge of the platform this list 
represents. No worries, appreciate your time clearing that up.


--
Jason Belec
Sent from my iPad

> On Apr 12, 2014, at 6:26 AM, Bayard Bell  wrote:

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-12 Thread Bayard Bell
Jason,

If you think I've said anything about the sky falling or referenced a wiki,
you're responding to something other than what I wrote. I see no need for
further reply.

Cheers,
Bayard


On 11 April 2014 22:36, Jason Belec  wrote:


Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-11 Thread Eric
Interesting point about different kinds of ECC memory. I wonder if the
difference is important enough to consider for a 20x3TB ZFS pool. For
safety's sake, I will likely look into getting ECC memory.


On Fri, Apr 11, 2014 at 5:36 PM, Jason Belec wrote:


Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-11 Thread Eric
Thanks! I will definitely take this out with my afternoon tea for a read C:


On Fri, Apr 11, 2014 at 5:09 PM, Bayard Bell wrote:

> If you want more of a smoking gun report on data corruption without ECC,
> try:
>
> https://blogs.oracle.com/vlad/entry/zfs_likes_to_have_ecc
>
> This view isn't isolated in terms of what people at Sun thought or what
> people at Oracle now think. Try googling for "zfs ecc site:
> blogs.oracle.com", and you'll find a recurring statement that ECC should
> be used even in home deployment, with maybe one odd exception.
>
> The Wikipedia article, correctly summarising the Google study, is plain in
> saying not that extremely high error rates are common but that error rates
> are highly variable in large-sample studies, with some systems seeing
> extremely high error rates. ECC gives a significant assurance based on an
> incremental cost, so what's your data worth? You're not guaranteed to be
> screwed by not using ECC (and the Google paper doesn't say this either),
> but you are assuming risks that ECC mitigates. Look at the above blog,
> however: even DIMMs that are high-quality but non-ECC can go wrong and
> result in nasty system corruption.
>
> What generally protects you in terms of pool integrity is metadata
> redundancy on top of integrity checks, but if you flip bits on metadata
> in-core before writing redundant copies, well, that's a risk to pool
> integrity.
>
> I also think it's mistaken to say this is distinctly a problem with ZFS.
> Any "next-generation" filesystem that provides protections against on-disk
> corruption via checksums ends up with a residual risk focus on making sure
> that in-core data integrity is robust. You could well have those problems
> on the pools you've deployed, and there are a lot of situations in which you'd
> never know and quite a lot (such as most of the bits in a photo or MP3)
> where you'd never notice low rates of bit-flipping. The fact that you
> haven't noticed doesn't equate to there being no problems in a strict
> sense, it's far more likely that you've been able to tolerate the flipping
> that's happened. The guy at Sun with the blog above got lucky: he was
> running high-quality non-ECC RAM, and it went pear-shaped, at least for
> metadata cancer, quite quickly, allowing him to recover by rolling back
> snapshots.
>
> Take a look out there, and you'll find people who are very confused about
> the risks and available mitigations. I found someone saying that there's no
> problem with more traditional RAID technologies because disks have CRCs. By
> comparison, you can find Bonwick, educated as a statistician, talking about
> SHA256 collisions by comparison to undetected ECC error rates and
> introducing ZFS data integrity safeguards by way of analogy to ECC. That's
> why the large-sample studies are interesting and useful: none of this
> technology makes data corruption impossible, it just goes to extreme length
> to marginalise the chances of those events by addressing known sources of
> errors and fundamental error scenarios--in-core is so core that if you
> tolerate error there, those errors will characterize systematic behaviour
> where you have better outcomes reasonably available (and that's
> **reasonably** available, I would suggest, in a way that the Madison
> paper's recommendation to make ZFS buffers magical isn't). CRC-32 does a
> great job detecting bad sectors and preventing them from being read back,
> but SHA256 in the right place in a system detects errors that a
> well-conceived vdev topology will generally make recoverable. That includes
> catching cases where an error isn't caught by CRC-32, which may be a rare
> result, but when you've got the kind of data densities that ZFS can allow,
> you're rolling the dice often enough that those results become interesting.
>
> ECC is one of the most basic steps to take, and if you look at the
> architectural literature, that's how it's treated. If you really want to be
> in on the joke, find the opensolaris zfs list thread from 2009 where
> someone asks about ECC, and someone else jumps in to remark on how
> VirtualBox can be poison for pool integrity for reasons rehearsed in my
> last post.
>
> Cheers,
> Bayard
>
> On 1 April 2014 12:04, Jason Belec  wrote:
>
>> ZFS is lots of parts, in most cases lots of cheap unreliable parts,
>> refurbished parts, yadda yadda, as posted on this thread and many, many
>> others, any issues are probably not ZFS but the parts of the whole. Yes, it
>> could be ZFS, after you confirm that all the parts are pristine, maybe.
>>
>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM
>> (not ECC) it is the home server for music, tv shows, movies, and some
>> interim backups. The mini has been modded for ESATA and has 6 drives
>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
>> running since ZFS was released from Apple builds. Lost 3 drives, eventually
>> traced to a new cable that cracked at the conn

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-11 Thread Jason Belec
Excellent. If you feel this is necessary, go for it. Those that have systems 
that don't have ECC should just run like the sky is falling, by your point of 
view. That said, I can guarantee none of the systems I have under my care have 
issues. How do I know? Well, the data is tested/compared at regular intervals. 
Maybe I'm the luckiest guy ever; where is that lottery ticket? Is ECC better? 
Possibly, probably in heavy-load environments, but no data has been provided 
to back this up, especially nothing in the context of what most users' needs 
are, at least here in the Mac space. Which ECC? Be specific. They are not all 
the same. Just like regular RAM is not all the same. Just like HDDs are not 
all the same. Fear mongering is wonderful and easy. Putting forth a solution 
guaranteed to be better is what's needed now. Did you actually reference a 
wiki? Seriously? A document anyone can edit to suit their view? I guess I come 
from a different era. 
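
For what it's worth, the kind of regular test/compare pass described above
can be as simple as diffing checksum manifests of two copies of the data.
A minimal Python sketch follows; the two volume paths are placeholder
assumptions rather than anyone's real mounts.

# Minimal sketch of a periodic "test/compare" pass over two copies of the
# same data; the volume paths are placeholder assumptions.
import hashlib
from pathlib import Path

def manifest(root: Path) -> dict:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

primary = manifest(Path("/Volumes/tank/media"))
backup = manifest(Path("/Volumes/backup/media"))

for name in sorted(primary.keys() & backup.keys()):
    if primary[name] != backup[name]:
        print("MISMATCH:", name)
for name in sorted(primary.keys() ^ backup.keys()):
    print("ONLY IN ONE COPY:", name)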

Jason
Sent from my iPhone 5S

> On Apr 11, 2014, at 5:09 PM, Bayard Bell  wrote:

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-11 Thread Daniel Becker
So to summarize that article, "using ECC memory is safer than not using ECC
memory." I don't think this was ever in doubt. Note that he does *not* talk
about anything like the hypothetical "a scrub will corrupt all your data"
scenario (nor is anything like that mentioned in his popular "ZFS: Read Me
1st" article); in fact, the only really ZFS-specific point that he raises
at all is the part about dirty data likely being in memory (= vulnerable to
bit flips) for longer than it would be in other file systems.
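
For context, that window is bounded by the transaction group sync interval. A
minimal way to see it on a FreeBSD box (the sysctl name assumes the OpenZFS
port of that era; other platforms expose the knob differently):

    # How long dirty data may sit in RAM before ZFS forces a txg to disk
    # (commonly around 5 seconds by default)
    sysctl vfs.zfs.txg.timeout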


On Fri, Apr 11, 2014 at 12:29 PM, Philip Robar wrote:

>
> From Andrew Galloway of Nexenta (Whom I'm pretty sure most would accept as
> the definition of a ZFS expert.*)
>
> ECC vs non-ECC RAM: The Great Debate:
>
> http://nex7.blogspot.com/2014/03/ecc-vs-non-ecc-ram-great-debate.html
>
>
> * "...I've been on literally 1000's of large ZFS deployments in the last
> 2+ years, often called in when they were broken, and much of what I say is
> backed up by quite a bit of experience. This article is also often used,
> cited, reviewed, and so on by many of my fellow ZFS support personnel, so
> it gets around and mistakes in it get back to me eventually. I can be wrong
> - but especially if you're new to ZFS, you're going to be better served not
> assuming I am. :)"
>
>
> Phil
>



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-11 Thread Bayard Bell
If you want more of a smoking gun report on data corruption without ECC,
try:

https://blogs.oracle.com/vlad/entry/zfs_likes_to_have_ecc

This view isn't isolated in terms of what people at Sun thought or what
people at Oracle now think. Try googling for "zfs ecc site:
blogs.oracle.com", and you'll find a recurring statement that ECC should be
used even in home deployment, with maybe one odd exception.

The Wikipedia article, correctly summarising the Google study, is plain in
saying not that extremely high error rates are common but that error rates
are highly variable in large-sample studies, with some systems seeing
extremely high error rates. ECC gives significant assurance for an
incremental cost, so what's your data worth? You're not guaranteed to be
screwed by not using ECC (and the Google paper doesn't say this either),
but you are assuming risks that ECC mitigates. Look at the above blog,
however: even DIMMs that are high-quality but non-ECC can go wrong and
result in nasty system corruption.
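
A quick, hedged way to check what a given box actually has (on a Linux or
FreeBSD host with dmidecode installed; root required, and not applicable to
most Macs):

    # Reports the "Error Correction Type" of each memory array,
    # e.g. "None", "Single-bit ECC" or "Multi-bit ECC"
    sudo dmidecode -t memory | grep -i "error correction"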

What generally protects you in terms of pool integrity is metadata
redundancy on top of integrity checks, but if you flip bits on metadata
in-core before writing redundant copies, well, that's a risk to pool
integrity.
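
For illustration (the pool and dataset names below are made up): the on-disk
redundancy can be turned up per dataset, but none of it helps if a buffer is
corrupted in RAM before the copies are ever written:

    # Keep two copies of every block in this dataset, on top of the
    # ditto-block copies ZFS already keeps for metadata.
    # This protects on-disk data only, not buffers corrupted in-core.
    zfs set copies=2 tank/important
    zfs get copies tank/important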

I also think it's mistaken to say this is distinctly a problem with ZFS.
Any "next-generation" filesystem that provides protection against on-disk
corruption via checksums ends up with its residual risk concentrated on
making sure that in-core data integrity is robust. You could well have those
problems on the pools you've deployed; there are a lot of situations in which
you'd never know, and quite a lot (such as most of the bits in a photo or
MP3) where you'd never notice low rates of bit-flipping. The fact that you
haven't noticed doesn't mean there have been no problems in a strict sense;
it's far more likely that you've been able to tolerate the flipping that's
happened. The guy at Sun with the blog above got lucky: he was running
high-quality non-ECC RAM, and it went pear-shaped, at least as far as
metadata cancer goes, quickly enough to let him recover by rolling back
snapshots.

Take a look out there, and you'll find people who are very confused about
the risks and available mitigations. I found someone saying that there's no
problem with more traditional RAID technologies because disks have CRCs. By
contrast, you can find Bonwick, educated as a statistician, comparing SHA256
collision odds to undetected ECC error rates and introducing ZFS data
integrity safeguards by way of analogy to ECC. That's why the large-sample
studies are interesting and useful: none of this technology makes data
corruption impossible; it just goes to extreme lengths to marginalise the
chances of those events by addressing known sources of error and fundamental
error scenarios. In-core state is so central that if you tolerate errors
there, those errors will come to characterise the system's behaviour in
situations where better outcomes are reasonably available (and that's
**reasonably** available, I would suggest, in a way that the Madison
paper's recommendation to make ZFS buffers magical isn't). CRC-32 does a
great job of detecting bad sectors and preventing them from being read back,
but SHA256 in the right place in a system detects errors that a
well-conceived vdev topology will generally make recoverable. That includes
catching cases where an error isn't caught by CRC-32, which may be rare, but
when you've got the kind of data densities that ZFS allows, you're rolling
the dice often enough that those results become interesting.
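
As a concrete sketch of that division of labour (pool and dataset names are
illustrative): the checksum algorithm is a per-dataset property, and a scrub
is what walks the pool verifying every block against it, repairing from
mirror or raidz redundancy where the vdev layout allows:

    # End-to-end SHA256 checksums on a dataset (fletcher4 is the default)
    zfs set checksum=sha256 tank/data
    # Verify every block in the pool and repair what redundancy permits
    zpool scrub tank
    zpool status -v tank   # per-device read/write/checksum error counts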

ECC is one of the most basic steps to take, and if you look at the
architectural literature, that's how it's treated. If you really want to be
in on the joke, find the opensolaris zfs list thread from 2009 where
someone asks about ECC, and someone else jumps in to remark on how
VirtualBox can be poison for pool integrity for reasons rehearsed in my
last post.

Cheers,
Bayard

On 1 April 2014 12:04, Jason Belec  wrote:

> ZFS is lots of parts, in most cases lots of cheap unreliable parts,
> refurbished parts, yadda yadda, as posted on this thread and many, many
> others, any issues are probably not ZFS but the parts of the whole. Yes, it
> could be ZFS, after you confirm that all the parts are pristine, maybe.
>
> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM
> (not ECC) it is the home server for music, tv shows, movies, and some
> interim backups. The mini has been modded for ESATA and has 6 drives
> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
> running since ZFS was released from Apple builds. Lost 3 drives, eventually
> traced to a new cable that cracked at the connector which when hot enough
> expanded lifting 2 pins free of their connector counter parts resulting in
> errors. Visually almost impossible to see. I replaced port multipliers,
> Esata cards, RAM, mini's, power supply, reinstalled OS, reinstalled ZFS,
> restored ZFS data fr

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-11 Thread Eric
It's the VDI that FreeBSD is running on, created by the VirtualBox wizard

$ gpart show -l
=>        34  16777149  ada0  GPT  (8.0G)
          34       128     1  (null)  (64K)
         162  15935360     2  (null)  (7.6G)
    15935522    839680     3  (null)  (410M)
    16775202      1981        - free -  (991K)
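
(To map those entries to swap, two standard FreeBSD commands; the device name
is the one shown above:)

    swapinfo -h         # which partition(s) currently back swap
    gpart show -p ada0  # the same table, labelled with partition names (ada0p1, ...)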



On Fri, Apr 11, 2014 at 4:02 PM, Chris Ridd  wrote:

>
> On 11 Apr 2014, at 20:42, Eric  wrote:
>
> > I don't have a proper dump, but I did get a kernel panic on my ZFS box.
> This is just informational. I'm not sure what caused it, but I'm guessing
> it's memory related.
>
> It looks like it is panicing after being unable to bring some page in from
> swap. Your swap looks like it is on ada0p2.
>
> Is ada0 one of your external drives or a vbox virtual disk? How is that
> disk partitioned and what is using the other partitions?
>
> Chris
>



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-11 Thread Chris Ridd

On 11 Apr 2014, at 20:42, Eric  wrote:

> I don't have a proper dump, but I did get a kernel panic on my ZFS box. This 
> is just informational. I'm not sure what caused it, but I'm guessing it's 
> memory related.

It looks like it is panicing after being unable to bring some page in from 
swap. Your swap looks like it is on ada0p2.

Is ada0 one of your external drives or a vbox virtual disk? How is that disk 
partitioned and what is using the other partitions?

Chris



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Eric
I have both my hands up, throwing anything and hoping for something to
stick to the wall =\


On Wed, Apr 2, 2014 at 8:37 PM, Daniel Becker  wrote:

> On Apr 2, 2014, at 3:08 PM, Matt Elliott 
> wrote:
>
> > Not true.  ZFS flushes also mark known states.  If the zfs stack issues
> a flush and the system returns, it uses that as a guarantee that that data
> is now on disk.
>
> However, that guarantee is only needed to ensure that on-disk data is
> consistent even if the contents of the cache is lost, e.g. due to sudden
> power loss. A disk cache never just loses dirty data in normal operation.
>
> > later writes will assume that the data was written and if the hard drive
> later changes the write order (which some disks will do for performance)
> things break.  You can have issues if any part of the disk chain lies about
> the completion of flush commands.
>
> What would break, in your opinion? Again, as long as you don’t somehow
> lose the contents of your cache, it really doesn’t matter at all what’s
> physically on the disk and what’s still in the cache.



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Daniel Becker
On Apr 2, 2014, at 3:08 PM, Matt Elliott  wrote:

> Not true.  ZFS flushes also mark known states.  If the zfs stack issues a 
> flush and the system returns, it uses that as a guarantee that that data is 
> now on disk.

However, that guarantee is only needed to ensure that on-disk data is 
consistent even if the contents of the cache is lost, e.g. due to sudden power 
loss. A disk cache never just loses dirty data in normal operation.

> later writes will assume that the data was written and if the hard drive 
> later changes the write order (which some disks will do for performance) 
> things break.  You can have issues if any part of the disk chain lies about 
> the completion of flush commands.

What would break, in your opinion? Again, as long as you don’t somehow lose the 
contents of your cache, it really doesn’t matter at all what’s physically on 
the disk and what’s still in the cache.



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Matt Elliott

On Apr 2, 2014, at 1:38 PM, Daniel Becker  wrote:

> The only time this should make a difference is when your host experiences an 
> unclean shutdown / reset / crash.
> 
> On Apr 2, 2014, at 8:49 AM, Eric  wrote:

Not true.  ZFS flushes also mark known states.  If the zfs stack issues a flush 
and the system returns, it uses that as a guarantee that that data is now on 
disk.  later writes will assume that the data was written and if the hard drive 
later changes the write order (which some disks will do for performance) things 
break.  You can have issues if any part of the disk chain lies about the 
completion of flush commands.



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Eric
eh, I suspected that


On Wed, Apr 2, 2014 at 2:38 PM, Daniel Becker  wrote:

> The only time this should make a difference is when your host experiences
> an unclean shutdown / reset / crash.
>
> On Apr 2, 2014, at 8:49 AM, Eric  wrote:
>
> I believe we are referring to the same things. I JUST read about cache
> flushing. ZFS does cache flushing and VirtualBox ignores cache flushes by
> default.
>
> Please, if you can, let me know the key settings you have used.
>
> From the documentation that I read, the command it said to issue is:
>
> VBoxManage setextradata "VM name"
>> "VBoxInternal/Devices/ahci/0/LUN#[x]/Config/IgnoreFlush" 0
>>
>
> Where [x] is the disk value
>
>
> On Wed, Apr 2, 2014 at 2:37 AM, Boyd Waters  wrote:
>
>> I was able to destroy ZFS pools by trying to access them from inside
>> VirtualBox. Until I read the detailed documentation, and set the disk
>> buffer options correctly. I will dig into my notes and post the key setting
>> to this thread when I find it.
>>
>> But I've used ZFS for many years without ECC RAM with no trouble. It
>> isn't the best way to go, but it isn't the lack of ECC that's killing a ZFS
>> pool. It's the hypervisor hardware emulation and buffering.
>>
>> Sent from my iPad
>>
>> On Apr 1, 2014, at 5:24 PM, Jason Belec 
>> wrote:
>>
>> I think Bayard has hit on some very interesting points, part of what I
>> was alluding to, but very well presented here.
>>
>> Jason
>> Sent from my iPhone 5S
>>
>> On Apr 1, 2014, at 7:14 PM, Bayard Bell 
>> wrote:
>>
>> Could you explain how you're using VirtualBox and why you'd use a type 2
>> hypervisor in this context?
>>
>> Here's a scenario where you really have to mind with hypervisors: ZFS
>> tells a virtualised controller that it needs to sync a buffer, and the
>> controller tells ZFS that all's well while perhaps requesting an async
>> flush. ZFS thinks it's done all the I/Os to roll a TXG to stable storage,
>> but in the mean time something else crashes and whoosh go your buffers.
>>
>> I'm not sure it's come across particularly well in this thread, but ZFS
>> doesn't and can't cope with hardware that's so unreliable that it tells
>> lies about basic things, like whether your writes have made it to stable
>> storage, or doesn't mind the shop, as is the case with non-ECC memory. It's
>> one thing when you have a device reading back something that doesn't match
>> the checksum, but it gets uglier when you've got a single I/O path and a
>> controller that seems to write the wrong bits in stride (I've seen this) or
>> when the problems are even closer to home (and again I emphasise RAM). You
>> may not have problems right away. You may have problems where you can't
>> tell the difference, like flipping bits in data buffers that have no other
>> integrity checks. But you can run into complex failure scenarios where ZFS
>> has to cash in on guarantees that were rather more approximate than what it
>> was told, and then it may not be a case of having some bits flipped in
>> photos or MP3s but no longer being able to import your pool or having
>> someone who knows how to operate zdb do some additional TXG rollback to get
>> your data back after losing some updates.
>>
>> I don't know if you're running ZFS in a VM or running VMs on top of ZFS,
>> but either way, you probably want to Google for "data loss" "VirtualBox"
>> and whatever device you're emulating and see whether there are known
>> issues. You can find issue reports out there on VirtualBox data loss, but
>> working through bug reports can be challenging.
>>
>> Cheers,
>> Bayard
>>
>> On 1 April 2014 16:34, Eric Jaw  wrote:
>>
>>>
>>>
>>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:

 ZFS is lots of parts, in most cases lots of cheap unreliable parts,
 refurbished parts, yadda yadda, as posted on this thread and many, many
 others, any issues are probably not ZFS but the parts of the whole. Yes, it
 could be ZFS, after you confirm that all the parts are pristine, maybe.

>>>
>>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm
>>> trying to figure out why VirtualBox is creating these issues. I'm pretty
>>> sure that's the root cause, but I don't know why yet. So I'm just
>>> speculating at this point. Of course, I want to get my ZFS up and running
>>> so I can move on to what I really need to do, so it's easy to jump on a
>>> conclusion about something that I haven't thought of in my position. Hope
>>> you can understand
>>>
>>>

 My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM
 (not ECC) it is the home server for music, tv shows, movies, and some
 interim backups. The mini has been modded for ESATA and has 6 drives
 connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
 running since ZFS was released from Apple builds. Lost 3 drives, eventually
 traced to a new cable that cracked at the connector which when hot enough
 expanded lifting 2

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Daniel Becker
The only time this should make a difference is when your host experiences an 
unclean shutdown / reset / crash.

> On Apr 2, 2014, at 8:49 AM, Eric  wrote:
> 
> I believe we are referring to the same things. I JUST read about cache 
> flushing. ZFS does cache flushing and VirtualBox ignores cache flushes by 
> default.
> 
> Please, if you can, let me know the key settings you have used.
> 
> From the documentation that I read, the command it said to issue is:
> 
>> VBoxManage setextradata "VM name" 
>> "VBoxInternal/Devices/ahci/0/LUN#[x]/Config/IgnoreFlush" 0
> 
> Where [x] is the disk value 
> 
> 
>> On Wed, Apr 2, 2014 at 2:37 AM, Boyd Waters  wrote:
>> I was able to destroy ZFS pools by trying to access them from inside 
>> VirtualBox. Until I read the detailed documentation, and set the disk buffer 
>> options correctly. I will dig into my notes and post the key setting to this 
>> thread when I find it.
>> 
>> But I've used ZFS for many years without ECC RAM with no trouble. It isn't 
>> the best way to go, but it isn't the lack of ECC that's killing a ZFS pool. 
>> It's the hypervisor hardware emulation and buffering.
>> 
>> Sent from my iPad
>> 
>>> On Apr 1, 2014, at 5:24 PM, Jason Belec  wrote:
>>> 
>>> I think Bayard has hit on some very interesting points, part of what I was 
>>> alluding to, but very well presented here. 
>>> 
>>> Jason
>>> Sent from my iPhone 5S
>>> 
 On Apr 1, 2014, at 7:14 PM, Bayard Bell  
 wrote:
 
 Could you explain how you're using VirtualBox and why you'd use a type 2 
 hypervisor in this context?
 
 Here's a scenario where you really have to mind with hypervisors: ZFS 
 tells a virtualised controller that it needs to sync a buffer, and the 
 controller tells ZFS that all's well while perhaps requesting an async 
 flush. ZFS thinks it's done all the I/Os to roll a TXG to stable storage, 
 but in the mean time something else crashes and whoosh go your buffers.
 
 I'm not sure it's come across particularly well in this thread, but ZFS 
 doesn't and can't cope with hardware that's so unreliable that it tells 
 lies about basic things, like whether your writes have made it to stable 
 storage, or doesn't mind the shop, as is the case with non-ECC memory. 
 It's one thing when you have a device reading back something that doesn't 
 match the checksum, but it gets uglier when you've got a single I/O path 
 and a controller that seems to write the wrong bits in stride (I've seen 
 this) or when the problems are even closer to home (and again I emphasise 
 RAM). You may not have problems right away. You may have problems where 
 you can't tell the difference, like flipping bits in data buffers that 
 have no other integrity checks. But you can run into complex failure 
 scenarios where ZFS has to cash in on guarantees that were rather more 
 approximate than what it was told, and then it may not be a case of having 
 some bits flipped in photos or MP3s but no longer being able to import 
 your pool or having someone who knows how to operate zdb do some 
 additional TXG rollback to get your data back after losing some updates.
 
 I don't know if you're running ZFS in a VM or running VMs on top of ZFS, 
 but either way, you probably want to Google for "data loss" "VirtualBox" 
 and whatever device you're emulating and see whether there are known 
 issues. You can find issue reports out there on VirtualBox data loss, but 
 working through bug reports can be challenging.
 
 Cheers,
 Bayard
 
> On 1 April 2014 16:34, Eric Jaw  wrote:
> 
> 
>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>> ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
>> refurbished parts, yadda yadda, as posted on this thread and many, many 
>> others, any issues are probably not ZFS but the parts of the whole. Yes, 
>> it could be ZFS, after you confirm that all the parts are pristine, 
>> maybe. 
> 
> 
> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm 
> trying to figure out why VirtualBox is creating these issues. I'm pretty 
> sure that's the root cause, but I don't know why yet. So I'm just 
> speculating at this point. Of course, I want to get my ZFS up and running 
> so I can move on to what I really need to do, so it's easy to jump on a 
> conclusion about something that I haven't thought of in my position. Hope 
> you can understand
>  
>> 
>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM 
>> (not ECC) it is the home server for music, tv shows, movies, and some 
>> interim backups. The mini has been modded for ESATA and has 6 drives 
>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been 
>> running since ZFS was released from Apple builds. Lost 3 driv

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Eric
I believe we are referring to the same things. I JUST read about cache
flushing. ZFS does cache flushing and VirtualBox ignores cache flushes by
default.

Please, if you can, let me know the key settings you have used.

From the documentation that I read, the command it said to issue is:

VBoxManage setextradata "VM name"
> "VBoxInternal/Devices/ahci/0/LUN#[x]/Config/IgnoreFlush" 0
>

Where [x] is the disk value


On Wed, Apr 2, 2014 at 2:37 AM, Boyd Waters  wrote:

> I was able to destroy ZFS pools by trying to access them from inside
> VirtualBox. Until I read the detailed documentation, and set the disk
> buffer options correctly. I will dig into my notes and post the key setting
> to this thread when I find it.
>
> But I've used ZFS for many years without ECC RAM with no trouble. It isn't
> the best way to go, but it isn't the lack of ECC that's killing a ZFS pool.
> It's the hypervisor hardware emulation and buffering.
>
> Sent from my iPad
>
> On Apr 1, 2014, at 5:24 PM, Jason Belec 
> wrote:
>
> I think Bayard has hit on some very interesting points, part of what I was
> alluding to, but very well presented here.
>
> Jason
> Sent from my iPhone 5S
>
> On Apr 1, 2014, at 7:14 PM, Bayard Bell 
> wrote:
>
> Could you explain how you're using VirtualBox and why you'd use a type 2
> hypervisor in this context?
>
> Here's a scenario where you really have to mind with hypervisors: ZFS
> tells a virtualised controller that it needs to sync a buffer, and the
> controller tells ZFS that all's well while perhaps requesting an async
> flush. ZFS thinks it's done all the I/Os to roll a TXG to stable storage,
> but in the mean time something else crashes and whoosh go your buffers.
>
> I'm not sure it's come across particularly well in this thread, but ZFS
> doesn't and can't cope with hardware that's so unreliable that it tells
> lies about basic things, like whether your writes have made it to stable
> storage, or doesn't mind the shop, as is the case with non-ECC memory. It's
> one thing when you have a device reading back something that doesn't match
> the checksum, but it gets uglier when you've got a single I/O path and a
> controller that seems to write the wrong bits in stride (I've seen this) or
> when the problems are even closer to home (and again I emphasise RAM). You
> may not have problems right away. You may have problems where you can't
> tell the difference, like flipping bits in data buffers that have no other
> integrity checks. But you can run into complex failure scenarios where ZFS
> has to cash in on guarantees that were rather more approximate than what it
> was told, and then it may not be a case of having some bits flipped in
> photos or MP3s but no longer being able to import your pool or having
> someone who knows how to operate zdb do some additional TXG rollback to get
> your data back after losing some updates.
>
> I don't know if you're running ZFS in a VM or running VMs on top of ZFS,
> but either way, you probably want to Google for "data loss" "VirtualBox"
> and whatever device you're emulating and see whether there are known
> issues. You can find issue reports out there on VirtualBox data loss, but
> working through bug reports can be challenging.
>
> Cheers,
> Bayard
>
> On 1 April 2014 16:34, Eric Jaw  wrote:
>
>>
>>
>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>>>
>>> ZFS is lots of parts, in most cases lots of cheap unreliable parts,
>>> refurbished parts, yadda yadda, as posted on this thread and many, many
>>> others, any issues are probably not ZFS but the parts of the whole. Yes, it
>>> could be ZFS, after you confirm that all the parts are pristine, maybe.
>>>
>>
>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm
>> trying to figure out why VirtualBox is creating these issues. I'm pretty
>> sure that's the root cause, but I don't know why yet. So I'm just
>> speculating at this point. Of course, I want to get my ZFS up and running
>> so I can move on to what I really need to do, so it's easy to jump on a
>> conclusion about something that I haven't thought of in my position. Hope
>> you can understand
>>
>>
>>>
>>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM
>>> (not ECC) it is the home server for music, tv shows, movies, and some
>>> interim backups. The mini has been modded for ESATA and has 6 drives
>>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
>>> running since ZFS was released from Apple builds. Lost 3 drives, eventually
>>> traced to a new cable that cracked at the connector which when hot enough
>>> expanded lifting 2 pins free of their connector counter parts resulting in
>>> errors. Visually almost impossible to see. I replaced port multipliers,
>>> Esata cards, RAM, mini's, power supply, reinstalled OS, reinstalled ZFS,
>>> restored ZFS data from backup, finally to find the bad connector end one
>>> because it was hot and felt 'funny'.
>>>
>>> Frustrating, ye

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Eric
All this talk about controller, sync, buffer, storage, cache got me
thinking.

I looked up how ZFS handles cache flushing, and how VirtualBox handles
cache flushing.

*According to
http://docs.oracle.com/cd/E26505_01/html/E37386/chapterzfs-6.html
*

ZFS issues *infrequent flushes *(every 5 second or so) after the uberblock
> updates. The flushing infrequency is fairly inconsequential so no tuning is
> warranted here. ZFS also issues a flush every time an application requests
> a synchronous write (O_DSYNC, fsync, NFS commit, and so on).
>

*According to http://www.virtualbox.org/manual/ch12.html
*

12.2.2. Responding to guest IDE/SATA flush requests
>
> If desired, the virtual disk images can be flushed when the guest issues
> the IDE FLUSH CACHE command. Normally *these requests are ignored *for
> improved performance. The parameters below are only accepted for disk
> drives. They must not be set for DVD drives.
>

I'm going to enable cache flushing and see how that affects results
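
For the record, that's the same IgnoreFlush key applied once per disk. A
minimal sketch, assuming the disks hang off the AHCI controller and the VM is
named "FreeBSD10" (the name and port numbers are illustrative):

    VBoxManage setextradata "FreeBSD10" "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0
    VBoxManage setextradata "FreeBSD10" "VBoxInternal/Devices/ahci/0/LUN#1/Config/IgnoreFlush" 0

(and likewise for LUN#2 through LUN#5), then confirm the keys with:

    VBoxManage getextradata "FreeBSD10" enumerate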




On Tue, Apr 1, 2014 at 7:14 PM, Bayard Bell wrote:

> Could you explain how you're using VirtualBox and why you'd use a type 2
> hypervisor in this context?
>
> Here's a scenario where you really have to mind with hypervisors: ZFS
> tells a virtualised controller that it needs to sync a buffer, and the
> controller tells ZFS that all's well while perhaps requesting an async
> flush. ZFS thinks it's done all the I/Os to roll a TXG to stable storage,
> but in the mean time something else crashes and whoosh go your buffers.
>
> I'm not sure it's come across particularly well in this thread, but ZFS
> doesn't and can't cope with hardware that's so unreliable that it tells
> lies about basic things, like whether your writes have made it to stable
> storage, or doesn't mind the shop, as is the case with non-ECC memory. It's
> one thing when you have a device reading back something that doesn't match
> the checksum, but it gets uglier when you've got a single I/O path and a
> controller that seems to write the wrong bits in stride (I've seen this) or
> when the problems are even closer to home (and again I emphasise RAM). You
> may not have problems right away. You may have problems where you can't
> tell the difference, like flipping bits in data buffers that have no other
> integrity checks. But you can run into complex failure scenarios where ZFS
> has to cash in on guarantees that were rather more approximate than what it
> was told, and then it may not be a case of having some bits flipped in
> photos or MP3s but no longer being able to import your pool or having
> someone who knows how to operate zdb do some additional TXG rollback to get
> your data back after losing some updates.
>
> I don't know if you're running ZFS in a VM or running VMs on top of ZFS,
> but either way, you probably want to Google for "data loss" "VirtualBox"
> and whatever device you're emulating and see whether there are known
> issues. You can find issue reports out there on VirtualBox data loss, but
> working through bug reports can be challenging.
>
> Cheers,
> Bayard
>
> On 1 April 2014 16:34, Eric Jaw  wrote:
>
>>
>>
>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>>>
>>> ZFS is lots of parts, in most cases lots of cheap unreliable parts,
>>> refurbished parts, yadda yadda, as posted on this thread and many, many
>>> others, any issues are probably not ZFS but the parts of the whole. Yes, it
>>> could be ZFS, after you confirm that all the parts are pristine, maybe.
>>>
>>
>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm
>> trying to figure out why VirtualBox is creating these issues. I'm pretty
>> sure that's the root cause, but I don't know why yet. So I'm just
>> speculating at this point. Of course, I want to get my ZFS up and running
>> so I can move on to what I really need to do, so it's easy to jump on a
>> conclusion about something that I haven't thought of in my position. Hope
>> you can understand
>>
>>
>>>
>>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM
>>> (not ECC) it is the home server for music, tv shows, movies, and some
>>> interim backups. The mini has been modded for ESATA and has 6 drives
>>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
>>> running since ZFS was released from Apple builds. Lost 3 drives, eventually
>>> traced to a new cable that cracked at the connector which when hot enough
>>> expanded lifting 2 pins free of their connector counter parts resulting in
>>> errors. Visually almost impossible to see. I replaced port multipliers,
>>> Esata cards, RAM, mini's, power supply, reinstalled OS, reinstalled ZFS,
>>> restored ZFS data from backup, finally to find the bad connector end one
>>> because it was hot and felt 'funny'.
>>>
>>> Frustrating, yes, educational also. The happy news is, all the data was
>>> fine

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Eric
Here's the topology of the Host and Guest system layout:

[SSD][SSD]
==> [RAID0]
> [Host]
==> [HHD0] --> \\.\PhysicalDrive0 --> raw vmdk --> PhysicalDrive0.vmdk
==> [HHD1] --> \\.\PhysicalDrive1 --> raw vmdk --> PhysicalDrive1.vmdk
==> [HHD2] --> \\.\PhysicalDrive2 --> raw vmdk --> PhysicalDrive2.vmdk
==> [HHD3] --> \\.\PhysicalDrive3 --> raw vmdk --> PhysicalDrive3.vmdk
==> [HHD4] --> \\.\PhysicalDrive4 --> raw vmdk --> PhysicalDrive4.vmdk
==> [HHD5] --> \\.\PhysicalDrive5 --> raw vmdk --> PhysicalDrive5.vmdk
> [Guest]
==> PhysicalDrive0.vmdk
==> PhysicalDrive1.vmdk
==> PhysicalDrive2.vmdk
==> PhysicalDrive3.vmdk
==> PhysicalDrive4.vmdk
==> PhysicalDrive5.vmdk

HHD0 and HHD1 are unmounted, NTFS partitioned drives, because they hold a
mirror copy of my data
There are two SSDs, SSD0 and SSD1 (not listed), that are created the same
way as the HDD, and mounted as zil and l2arc devices.
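
For anyone reproducing this layout: the raw vmdk wrappers above are the kind
produced by VirtualBox's createrawvmdk command, run from an elevated prompt
on the Windows host (the output path below is illustrative):

    VBoxManage internalcommands createrawvmdk -filename C:\VMs\PhysicalDrive2.vmdk -rawdisk \\.\PhysicalDrive2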


On Tue, Apr 1, 2014 at 5:50 PM, Jason Belec wrote:

> Going through this bit by bit, but some things that I take issue with but
> may be interpreting incorrectly.
>
> You created several vmdk's on C: drive (9), you're running Windows on this
> drive, as well as VirtualBox, which has an OS making use of the vmdk's, is this
> correct? If yes, we may have stumbled across your issue; that's a lot of I/O
> for the underlying drive, some of it fighting with the other contenders.
> You list 6 physical drives, reason they are not utilized? Perhaps just
> moving the vmdk's to another drive might at least help with the stress.
>
> As an example, I never host the VM on the OS drive, just like I never host
> ZFS on the OS drive (FreeBSD can of course, but I believe attention must be
> paid to setup) even if I have room for a partition (tried that in the past).
>
>
>
> --
> Jason Belec
> Sent from my iPad
>
> On Apr 1, 2014, at 4:25 PM, Eric  wrote:
>
> Attached is my vbox Guest settings, and added it to the forums post as
> well (https://forums.virtualbox.org/viewtopic.php?f=6&t=60975)
>
> The NAT issue is small. I switched my SSH server back to Bridge Mode and
> everything worked again. There was something about NAT mode where it was
> breaking the connection and wasn't letting SSH work normally.
>
>
>
>
> On Tue, Apr 1, 2014 at 4:13 PM, Jason Belec wrote:
>
>> I looked through your thread, but I almost always tell people - "STOP
>> using Windows unless it's in a VM". ;)
>>
>> Not enough info in your thread to actually help you with the VM. What are
>> the Guest settings? What drives are actually assigned to what, scripts are
>> only useful after you setup something functional.
>>
>> As for the NAT issue thread, I don't think it's an issue so much as a
>> misconception of how it works in relation to the parts in question,
>> specifically Windows, the VM and the Guest. I have never really had issues
>> like this, but I've never tried with the parts you're using in the sequence
>> described. As for why it might not work... The Guest settings info might be
>> relevant here as well.
>>
>>
>>
>> --
>> Jason Belec
>> Sent from my iPad
>>
>> On Apr 1, 2014, at 3:46 PM, Eric  wrote:
>>
>> haha train away!
>>
>> This is what I'm trying to do for my own needs. Issues or no issues, I
>> haven't seen it done before. So, I'm reaching out to anyone. Mac or not,
>> I'm just asking from one IT professional to another, is this possible, and
>> if not, why not? (that's just how I feel)
>>
>> I'm assuming the complications you mean are the ways FreeBSD behaves when
>> running specifically in VBox under Windows, because that's what I'm trying
>> to figure out.
>>
>> Details are in the forum post, but yes, it's a clean setup with a
>> dedicated vdi for the os. Networking shouldn't be related, but it's working
>> as well.
>>
>>
>> On Tue, Apr 1, 2014 at 3:17 PM, Jason Belec 
>> wrote:
>>
>>> OK. So you're running Windows, asking questions on the MacZFS list. That's
>>> going to cause problems right out of the gate. And you're asking about
>>> FreeBSD running under VirtualBox for issues with ZFS.
>>>
>>> I know it's not nice, but I'm laughing myself purple. This is going to
>>> make it into my training sessions.
>>>
>>> The only advice I can give you at this point is you have made a very
>>> complicated situation for yourself. Back up and start with Windows; ensure
>>> networking functions. Then a clean VM of FreeBSD; make sure networking is
>>> functioning however you want it to. Now set up ZFS, where you may have to
>>> pre-set/create devices just for the VM to utilize so that OS's are not
>>> fighting for the same drive(s)/space.
>>>
>>>
>>> Jason
>>> Sent from my iPhone 5S
>>>
>>> On Apr 1, 2014, at 12:03 PM, Eric  wrote:
>>>
>>> I have the details on the setup posted to virtualbox's forums, here:
>>> https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
>>>
>>> Essentially, I'm running ZFS on FreeBSD10 in VBox running in Windows 7.
>>> Rather than the other way around. I think I mentioned that e

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-02 Thread Eric
Yes, this is correct


On Tue, Apr 1, 2014 at 6:15 PM, Daniel Becker  wrote:

> He’s creating “raw” (= pass-through) disk images; i.e., the backing store
> is a physical disk, not the vmdk file itself.
>
>
> On Apr 1, 2014, at 2:50 PM, Jason Belec 
> wrote:
>
> Going through this bit by bit, but some things that I take issue with but
> may be interpreting incorrectly.
>
> You created several vmdk's on C: drive (9), your running Windows on this
> drive, as well as Virtualbox which has an OS making use of the vmdk's, this
> correct? If yes, we may have stumbled across your issue, thats a lot of i/o
> for the underlying drive, some of it fighting with the other contenders.
> You list 6 physical drives, reason they are not utilized? Perhaps just
> moving the vmdk's to another drive might at least help with the stress.
>
> As an example, I never host the VM on the OS drive, just like I never host
> ZFS on the OS drive FreeBSD can of course, but I believe attention must be
> paid to setup) even if I have room for a partition (tried that in the past).
>
>
> --
> Jason Belec
> Sent from my iPad
>
> On Apr 1, 2014, at 4:25 PM, Eric  wrote:
>
> Attached is my vbox Guest settings, and added it to the forums post as
> well (https://forums.virtualbox.org/viewtopic.php?f=6&t=60975)
>
> The NAT issue is small. I switched my SSH server back to Bridge Mode and
> everything worked again. There was something about NAT mode where it was
> breaking the connection and wasn't letting SSH work normally.
>
>
>
>
> On Tue, Apr 1, 2014 at 4:13 PM, Jason Belec wrote:
>
>> I looked through your thread, but I almost always tell people - "STOP
>> using Windows unless its in a VM". ;)
>>
>> Not enough info in your thread to actually help you with the VM. What are
>> the Guest settings? What drives are actually assigned to what, scripts are
>> only useful after you setup something functional.
>>
>> As for the NAT issue thread, I don't think its an issue so much a
>> misconception how it works in relation to the parts in question,
>> specifically Windows, the VM and the Guest. I have never really had issues
>> like this but I've never tried with parts your using in the sequence
>> described. As for why it might not work... The Guest settings info might be
>> relevant here as well.
>>
>>
>>
>> --
>> Jason Belec
>> Sent from my iPad
>>
>> On Apr 1, 2014, at 3:46 PM, Eric  wrote:
>>
>> haha train away!
>>
>> This is what I'm trying to do for my own needs. Issues or no issues, I
>> haven't seen it done before. So, I'm reaching out to anyone. Mac or not,
>> I'm just asking from one IT professional to another, is this possible, and
>> if not, why not? (that's just how I feel)
>>
>> I'm assuming the complications you mean are the ways FreeBSD behaves when
>> running specifically in VBox under Windows, because that's what I'm trying
>> to figure out.
>>
>> Details are in the forum post, but yes, it's a clean setup with a
>> dedicated vdi for the os. Networking shouldn't be related, but it's working
>> as well.
>>
>>
>> On Tue, Apr 1, 2014 at 3:17 PM, Jason Belec 
>> wrote:
>>
>>> OK. So your running Windows, asking questions on the MacZFS list. That's
>>> going to cause problems right out of the gate. And your asking about
>>> FreeBSD running under VirtualBox for issues with ZFS.
>>>
>>> I know it's not nice, bit I'm laughing myself purple. This is going to
>>> make it into my training sessions.
>>>
>>> The only advice I can give you at this point is you have made a very
>>> complicated situation for yourself. Back up and start with Windows, ensure
>>> networking us functions. Then a clean VM of FreeBSD make sure networking is
>>> functioning however you want it to. Now setup ZFS where you may have to
>>> pre-set/create devices just for the VM to utilize so that OS's are not
>>> fighting for the same drive(s)/space.
>>>
>>>
>>> Jason
>>> Sent from my iPhone 5S
>>>
>>> On Apr 1, 2014, at 12:03 PM, Eric  wrote:
>>>
>>> I have the details on the setup posted to virtualbox's forums, here:
>>> https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
>>>
>>> Essentially, I'm running ZFS on FreeBSD10 in VBox running in Windows 7.
>>> Rather than the other way around. I think I mentioned that earlier
>>>
>>>
>>> I just created a short post about the NAT Network issue, here:
>>> https://forums.virtualbox.org/viewtopic.php?f=6&t=60992
>>>
>>>
>>> On Tue, Apr 1, 2014 at 11:58 AM, Jason Belec >> > wrote:
>>>
 I run over 30 instances of Virtualbox with various OSs without issue
 all running on top of ZFS environments. Most of my clients have at least 3
 VMs running a variant of Windows on top of ZFS without any issues. Not sure
 what you mean with your NAT issue. Perhaps posting your setup info might be
 of more help.



 --
 Jason Belec
 Sent from my iPad

 On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:



 On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>
> ZFS is lots of p

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Boyd Waters
I was able to destroy ZFS pools by trying to access them from inside 
VirtualBox. Until I read the detailed documentation, and set the disk buffer 
options correctly. I will dig into my notes and post the key setting to this 
thread when I find it.

But I've used ZFS for many years without ECC RAM with no trouble. It isn't the 
best way to go, but it isn't the lack of ECC that's killing a ZFS pool. It's 
the hypervisor hardware emulation and buffering.

Sent from my iPad

> On Apr 1, 2014, at 5:24 PM, Jason Belec  wrote:
> 
> I think Bayard has hit on some very interesting points, part of what I was 
> alluding to, but very well presented here. 
> 
> Jason
> Sent from my iPhone 5S
> 
>> On Apr 1, 2014, at 7:14 PM, Bayard Bell  wrote:
>> 
>> Could you explain how you're using VirtualBox and why you'd use a type 2 
>> hypervisor in this context?
>> 
>> Here's a scenario where you really have to mind with hypervisors: ZFS tells 
>> a virtualised controller that it needs to sync a buffer, and the controller 
>> tells ZFS that all's well while perhaps requesting an async flush. ZFS 
>> thinks it's done all the I/Os to roll a TXG to stable storage, but in the 
>> mean time something else crashes and whoosh go your buffers.
>> 
>> I'm not sure it's come across particularly well in this thread, but ZFS 
>> doesn't and can't cope with hardware that's so unreliable that it tells lies 
>> about basic things, like whether your writes have made it to stable storage, 
>> or doesn't mind the shop, as is the case with non-ECC memory. It's one thing 
>> when you have a device reading back something that doesn't match the 
>> checksum, but it gets uglier when you've got a single I/O path and a 
>> controller that seems to write the wrong bits in stride (I've seen this) or 
>> when the problems are even closer to home (and again I emphasise RAM). You 
>> may not have problems right away. You may have problems where you can't tell 
>> the difference, like flipping bits in data buffers that have no other 
>> integrity checks. But you can run into complex failure scenarios where ZFS 
>> has to cash in on guarantees that were rather more approximate than what it 
>> was told, and then it may not be a case of having some bits flipped in 
>> photos or MP3s but no longer being able to import your pool or having 
>> someone who knows how to operate zdb do some additional TXG rollback to get 
>> your data back after losing some updates.
>> 
>> I don't know if you're running ZFS in a VM or running VMs on top of ZFS, but 
>> either way, you probably want to Google for "data loss" "VirtualBox" and 
>> whatever device you're emulating and see whether there are known issues. You 
>> can find issue reports out there on VirtualBox data loss, but working 
>> through bug reports can be challenging.
>> 
>> Cheers,
>> Bayard
>> 
>>> On 1 April 2014 16:34, Eric Jaw  wrote:
>>> 
>>> 
 On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
 ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
 refurbished parts, yadda yadda, as posted on this thread and many, many 
 others, any issues are probably not ZFS but the parts of the whole. Yes, 
 it could be ZFS, after you confirm that all the parts are pristine, maybe. 
>>> 
>>> 
>>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm 
>>> trying to figure out why VirtualBox is creating these issues. I'm pretty 
>>> sure that's the root cause, but I don't know why yet. So I'm just 
>>> speculating at this point. Of course, I want to get my ZFS up and running 
>>> so I can move on to what I really need to do, so it's easy to jump on a 
>>> conclusion about something that I haven't thought of in my position. Hope 
>>> you can understand
>>>  
 
 My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM 
 (not ECC) it is the home server for music, tv shows, movies, and some 
 interim backups. The mini has been modded for ESATA and has 6 drives 
 connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been 
 running since ZFS was released from Apple builds. Lost 3 drives, 
 eventually traced to a new cable that cracked at the connector which when 
 hot enough expanded lifting 2 pins free of their connector counter parts 
 resulting in errors. Visually almost impossible to see. I replaced port 
 multipliers, Esata cards, RAM, mini's, power supply, reinstalled OS, 
 reinstalled ZFS, restored ZFS data from backup, finally to find the bad 
 connector end one because it was hot and felt 'funny'. 
 
 Frustrating, yes, educational also. The happy news is, all the data was 
 fine, wife would have torn me to shreds if photos were missing, music was 
 corrupt, etc., etc.. And this was on the old out of date but stable ZFS 
 version we Mac users have been hugging onto for dear life. YMMV
 
 Never had RAM as the issue, here in the mad science lab across 10 rot

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Jason Belec
I think Bayard has hit on some very interesting points, part of what I was 
alluding to, but very well presented here. 

Jason
Sent from my iPhone 5S

> On Apr 1, 2014, at 7:14 PM, Bayard Bell  wrote:
> 
> Could you explain how you're using VirtualBox and why you'd use a type 2 
> hypervisor in this context?
> 
> Here's a scenario where you really have to mind with hypervisors: ZFS tells a 
> virtualised controller that it needs to sync a buffer, and the controller 
> tells ZFS that all's well while perhaps requesting an async flush. ZFS thinks 
> it's done all the I/Os to roll a TXG to stable storage, but in the mean time 
> something else crashes and whoosh go your buffers.
> 
> I'm not sure it's come across particularly well in this thread, but ZFS 
> doesn't and can't cope with hardware that's so unreliable that it tells lies 
> about basic things, like whether your writes have made it to stable storage, 
> or doesn't mind the shop, as is the case with non-ECC memory. It's one thing 
> when you have a device reading back something that doesn't match the 
> checksum, but it gets uglier when you've got a single I/O path and a 
> controller that seems to write the wrong bits in stride (I've seen this) or 
> when the problems are even closer to home (and again I emphasise RAM). You 
> may not have problems right away. You may have problems where you can't tell 
> the difference, like flipping bits in data buffers that have no other 
> integrity checks. But you can run into complex failure scenarios where ZFS 
> has to cash in on guarantees that were rather more approximate than what it 
> was told, and then it may not be a case of having some bits flipped in photos 
> or MP3s but no longer being able to import your pool or having someone who 
> knows how to operate zdb do some additional TXG rollback to get your data 
> back after losing some updates.
> 
> I don't know if you're running ZFS in a VM or running VMs on top of ZFS, but 
> either way, you probably want to Google for "data loss" "VirtualBox" and 
> whatever device you're emulating and see whether there are known issues. You 
> can find issue reports out there on VirtualBox data loss, but working through 
> bug reports can be challenging.
> 
> Cheers,
> Bayard
> 
>> On 1 April 2014 16:34, Eric Jaw  wrote:
>> 
>> 
>>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>>> ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
>>> refurbished parts, yadda yadda, as posted on this thread and many, many 
>>> others, any issues are probably not ZFS but the parts of the whole. Yes, it 
>>> could be ZFS, after you confirm that all the parts are pristine, maybe. 
>> 
>> 
>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm trying 
>> to figure out why VirtualBox is creating these issues. I'm pretty sure 
>> that's the root cause, but I don't know why yet. So I'm just speculating at 
>> this point. Of course, I want to get my ZFS up and running so I can move on 
>> to what I really need to do, so it's easy to jump on a conclusion about 
>> something that I haven't thought of in my position. Hope you can understand
>>  
>>> 
>>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM 
>>> (not ECC) it is the home server for music, tv shows, movies, and some 
>>> interim backups. The mini has been modded for ESATA and has 6 drives 
>>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been 
>>> running since ZFS was released from Apple builds. Lost 3 drives, eventually 
>>> traced to a new cable that cracked at the connector which when hot enough 
>>> expanded lifting 2 pins free of their connector counter parts resulting in 
>>> errors. Visually almost impossible to see. I replaced port multipliers, 
>>> Esata cards, RAM, mini's, power supply, reinstalled OS, reinstalled ZFS, 
>>> restored ZFS data from backup, finally to find the bad connector end one 
>>> because it was hot and felt 'funny'. 
>>> 
>>> Frustrating, yes, educational also. The happy news is, all the data was 
>>> fine, wife would have torn me to shreds if photos were missing, music was 
>>> corrupt, etc., etc.. And this was on the old out of date but stable ZFS 
>>> version we Mac users have been hugging onto for dear life. YMMV
>>> 
>>> Never had RAM as the issue, here in the mad science lab across 10 rotating 
>>> systems or in any client location - pick your decade. However I don't use 
>>> cheap RAM either, and I only have 2 Systems requiring ECC currently that 
>>> don't even connect to ZFS as they are both XServers with other lives.
>>> 
>>> 
>>> --
>>> Jason Belec
>>> Sent from my iPad
>>> 
 On Apr 1, 2014, at 12:13 AM, Daniel Becker  wrote:
 
>>> 
> On Mar 31, 2014, at 7:41 PM, Eric Jaw  wrote:
> 
> I started using ZFS about a few weeks ago, so a lot of it is still new to 
> me. I'm actually not completely certain about "proper procedure" for 
> repairing a pool. I'm not sure if

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Bayard Bell
Could you explain how you're using VirtualBox and why you'd use a type 2
hypervisor in this context?

Here's a scenario where you really have to mind with hypervisors: ZFS tells
a virtualised controller that it needs to sync a buffer, and the controller
tells ZFS that all's well while perhaps requesting an async flush. ZFS
thinks it's done all the I/Os to roll a TXG to stable storage, but in the
mean time something else crashes and whoosh go your buffers.

I'm not sure it's come across particularly well in this thread, but ZFS
doesn't and can't cope with hardware that's so unreliable that it tells
lies about basic things, like whether your writes have made it to stable
storage, or doesn't mind the shop, as is the case with non-ECC memory. It's
one thing when you have a device reading back something that doesn't match
the checksum, but it gets uglier when you've got a single I/O path and a
controller that seems to write the wrong bits in stride (I've seen this) or
when the problems are even closer to home (and again I emphasise RAM). You
may not have problems right away. You may have problems where you can't
tell the difference, like flipping bits in data buffers that have no other
integrity checks. But you can run into complex failure scenarios where ZFS
has to cash in on guarantees that were rather more approximate than what it
was told, and then it may not be a case of having some bits flipped in
photos or MP3s but no longer being able to import your pool or having
someone who knows how to operate zdb do some additional TXG rollback to get
your data back after losing some updates.

I don't know if you're running ZFS in a VM or running VMs on top of ZFS,
but either way, you probably want to Google for "data loss" "VirtualBox"
and whatever device you're emulating and see whether there are known
issues. You can find issue reports out there on VirtualBox data loss, but
working through bug reports can be challenging.

Cheers,
Bayard

On 1 April 2014 16:34, Eric Jaw  wrote:

>
>
> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>>
>> ZFS is lots of parts, in most cases lots of cheap unreliable parts,
>> refurbished parts, yadda yadda, as posted on this thread and many, many
>> others, any issues are probably not ZFS but the parts of the whole. Yes, it
>> could be ZFS, after you confirm that all the parts are pristine, maybe.
>>
>
> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm
> trying to figure out why VirtualBox is creating these issues. I'm pretty
> sure that's the root cause, but I don't know why yet. So I'm just
> speculating at this point. Of course, I want to get my ZFS up and running
> so I can move on to what I really need to do, so it's easy to jump on a
> conclusion about something that I haven't thought of in my position. Hope
> you can understand
>
>
>>
>> My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM
>> (not ECC) it is the home server for music, tv shows, movies, and some
>> interim backups. The mini has been modded for ESATA and has 6 drives
>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
>> running since ZFS was released from Apple builds. Lost 3 drives, eventually
>> traced to a new cable that cracked at the connector which when hot enough
>> expanded lifting 2 pins free of their connector counter parts resulting in
>> errors. Visually almost impossible to see. I replaced port multipliers,
>> Esata cards, RAM, mini's, power supply, reinstalled OS, reinstalled ZFS,
>> restored ZFS data from backup, finally to find the bad connector end one
>> because it was hot and felt 'funny'.
>>
>> Frustrating, yes, educational also. The happy news is, all the data was
>> fine, wife would have torn me to shreds if photos were missing, music was
>> corrupt, etc., etc.. And this was on the old out of date but stable ZFS
>> version we Mac users have been hugging onto for dear life. YMMV
>>
>> Never had RAM as the issue, here in the mad science lab across 10
>> rotating systems or in any client location - pick your decade. However I
>> don't use cheap RAM either, and I only have 2 Systems requiring ECC
>> currently that don't even connect to ZFS as they are both XServers with
>> other lives.
>>
>>
>> --
>> Jason Belec
>> Sent from my iPad
>>
>> On Apr 1, 2014, at 12:13 AM, Daniel Becker  wrote:
>>
>> On Mar 31, 2014, at 7:41 PM, Eric Jaw  wrote:
>>
>> I started using ZFS about a few weeks ago, so a lot of it is still new to
>> me. I'm actually not completely certain about "proper procedure" for
>> repairing a pool. I'm not sure if I'm supposed to clear the errors after
>> the scrub, before or after (little things). I'm not sure if it even
>> matters. When I restarted the VM, the checksum counts cleared on its own.
>>
>>
>> The counts are not maintained across reboots.
>>
>>
>> On the first scrub it repaired roughly 1.65MB. None on the second scub.
>> Even after the scrub there were still 43 data errors. I was expecting the

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Daniel Becker
He's creating "raw" (= pass-through) disk images; i.e., the backing store is a 
physical disk, not the vmdk file itself.
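
(For reference, a raw-disk vmdk on a Windows host is normally created with
something along these lines; the path and drive number are placeholders:

    VBoxManage internalcommands createrawvmdk -filename C:\VMs\zfsdisk1.vmdk -rawdisk \\.\PhysicalDrive1

The resulting .vmdk is only a small descriptor file; every read and write goes
straight to the underlying physical disk.)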


On Apr 1, 2014, at 2:50 PM, Jason Belec  wrote:

> Going through this bit by bit, but some things that I take issue with but may 
> be interpreting incorrectly.
> 
> You created several vmdk's on C: drive (9), your running Windows on this 
> drive, as well as Virtualbox which has an OS making use of the vmdk's, this 
> correct? If yes, we may have stumbled across your issue, thats a lot of i/o 
> for the underlying drive, some of it fighting with the other contenders. You 
> list 6 physical drives, reason they are not utilized? Perhaps just moving the 
> vmdk's to another drive might at least help with the stress.
> 
> As an example, I never host the VM on the OS drive, just like I never host 
> ZFS on the OS drive FreeBSD can of course, but I believe attention must be 
> paid to setup) even if I have room for a partition (tried that in the past).
> 
> 
> --
> Jason Belec
> Sent from my iPad
> 
> On Apr 1, 2014, at 4:25 PM, Eric  wrote:
> 
>> Attached is my vbox Guest settings, and added it to the forums post as well 
>> (https://forums.virtualbox.org/viewtopic.php?f=6&t=60975)
>> 
>> The NAT issue is small. I switched my SSH server back to Bridge Mode and 
>> everything worked again. There was something about NAT mode where it was 
>> breaking the connection and wasn't letting SSH work normally.
>> 
>> 
>> 
>> 
>> On Tue, Apr 1, 2014 at 4:13 PM, Jason Belec  
>> wrote:
>> I looked through your thread, but I almost always tell people - "STOP using 
>> Windows unless its in a VM". ;)
>> 
>> Not enough info in your thread to actually help you with the VM. What are 
>> the Guest settings? What drives are actually assigned to what, scripts are 
>> only useful after you setup something functional.
>> 
>> As for the NAT issue thread, I don't think its an issue so much a 
>> misconception how it works in relation to the parts in question, 
>> specifically Windows, the VM and the Guest. I have never really had issues 
>> like this but I've never tried with parts your using in the sequence 
>> described. As for why it might not work... The Guest settings info might be 
>> relevant here as well.
>> 
>> 
>> 
>> --
>> Jason Belec
>> Sent from my iPad
>> 
>> On Apr 1, 2014, at 3:46 PM, Eric  wrote:
>> 
>>> haha train away!
>>> 
>>> This is what I'm trying to do for my own needs. Issues or no issues, I 
>>> haven't seen it done before. So, I'm reaching out to anyone. Mac or not, 
>>> I'm just asking from one IT professional to another, is this possible, and 
>>> if not, why not? (that's just how I feel)
>>> 
>>> I'm assuming the complications you mean are the ways FreeBSD behaves when 
>>> running specifically in VBox under Windows, because that's what I'm trying 
>>> to figure out.
>>> 
>>> Details are in the forum post, but yes, it's a clean setup with a dedicated 
>>> vdi for the os. Networking shouldn't be related, but it's working as well.
>>> 
>>> 
>>> On Tue, Apr 1, 2014 at 3:17 PM, Jason Belec  
>>> wrote:
>>> OK. So your running Windows, asking questions on the MacZFS list. That's 
>>> going to cause problems right out of the gate. And your asking about 
>>> FreeBSD running under VirtualBox for issues with ZFS. 
>>> 
>>> I know it's not nice, bit I'm laughing myself purple. This is going to make 
>>> it into my training sessions. 
>>> 
>>> The only advice I can give you at this point is you have made a very 
>>> complicated situation for yourself. Back up and start with Windows, ensure 
>>> networking us functions. Then a clean VM of FreeBSD make sure networking is 
>>> functioning however you want it to. Now setup ZFS where you may have to 
>>> pre-set/create devices just for the VM to utilize so that OS's are not 
>>> fighting for the same drive(s)/space. 
>>> 
>>> 
>>> Jason
>>> Sent from my iPhone 5S
>>> 
>>> On Apr 1, 2014, at 12:03 PM, Eric  wrote:
>>> 
 I have the details on the setup posted to virtualbox's forums, here: 
 https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
 
 Essentially, I'm running ZFS on FreeBSD10 in VBox running in Windows 7. 
 Rather than the other way around. I think I mentioned that earlier
 
 
 I just created a short post about the NAT Network issue, here: 
 https://forums.virtualbox.org/viewtopic.php?f=6&t=60992
 
 
 On Tue, Apr 1, 2014 at 11:58 AM, Jason Belec  
 wrote:
 I run over 30 instances of Virtualbox with various OSs without issue all 
 running ontop of ZFS environments. Most of my clients have at least 3 VMs 
 running a variant of Windows ontop of ZFS without any issues. Not sure 
 what you mean with your NAT issue. Perhaps posting your setup info might 
 be of more help.
 
 
 
 --
 Jason Belec
 Sent from my iPad
 
 On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:
 
> 
> 
> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec 

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Jason Belec
Going through this bit by bit; there are some things here that I take issue
with, though I may be interpreting them incorrectly.

You created several vmdk's (9) on the C: drive, you're running Windows on that
drive, and VirtualBox, whose guest OS makes use of the vmdk's, runs there as
well; is that correct? If yes, we may have stumbled across your issue: that's a
lot of I/O for the underlying drive, some of it fighting with the other
contenders. You list 6 physical drives; is there a reason they are not
utilized? Perhaps just moving the vmdk's to another drive might at least help
with the stress.

As an example, I never host the VM on the OS drive, just as I never host ZFS
on the OS drive (FreeBSD can do it, of course, but I believe attention must be
paid to the setup), even if I have room for a partition (tried that in the
past).


--
Jason Belec
Sent from my iPad

> On Apr 1, 2014, at 4:25 PM, Eric  wrote:
> 
> Attached is my vbox Guest settings, and added it to the forums post as well 
> (https://forums.virtualbox.org/viewtopic.php?f=6&t=60975)
> 
> The NAT issue is small. I switched my SSH server back to Bridge Mode and 
> everything worked again. There was something about NAT mode where it was 
> breaking the connection and wasn't letting SSH work normally.
> 
> 
> 
> 
>> On Tue, Apr 1, 2014 at 4:13 PM, Jason Belec  
>> wrote:
>> I looked through your thread, but I almost always tell people - "STOP using 
>> Windows unless its in a VM". ;)
>> 
>> Not enough info in your thread to actually help you with the VM. What are 
>> the Guest settings? What drives are actually assigned to what, scripts are 
>> only useful after you setup something functional.
>> 
>> As for the NAT issue thread, I don't think its an issue so much a 
>> misconception how it works in relation to the parts in question, 
>> specifically Windows, the VM and the Guest. I have never really had issues 
>> like this but I've never tried with parts your using in the sequence 
>> described. As for why it might not work... The Guest settings info might be 
>> relevant here as well.
>> 
>> 
>> 
>> --
>> Jason Belec
>> Sent from my iPad
>> 
>>> On Apr 1, 2014, at 3:46 PM, Eric  wrote:
>>> 
>>> haha train away!
>>> 
>>> This is what I'm trying to do for my own needs. Issues or no issues, I 
>>> haven't seen it done before. So, I'm reaching out to anyone. Mac or not, 
>>> I'm just asking from one IT professional to another, is this possible, and 
>>> if not, why not? (that's just how I feel)
>>> 
>>> I'm assuming the complications you mean are the ways FreeBSD behaves when 
>>> running specifically in VBox under Windows, because that's what I'm trying 
>>> to figure out.
>>> 
>>> Details are in the forum post, but yes, it's a clean setup with a dedicated 
>>> vdi for the os. Networking shouldn't be related, but it's working as well.
>>> 
>>> 
 On Tue, Apr 1, 2014 at 3:17 PM, Jason Belec  
 wrote:
 OK. So your running Windows, asking questions on the MacZFS list. That's 
 going to cause problems right out of the gate. And your asking about 
 FreeBSD running under VirtualBox for issues with ZFS. 
 
 I know it's not nice, bit I'm laughing myself purple. This is going to 
 make it into my training sessions. 
 
 The only advice I can give you at this point is you have made a very 
 complicated situation for yourself. Back up and start with Windows, ensure 
 networking us functions. Then a clean VM of FreeBSD make sure networking 
 is functioning however you want it to. Now setup ZFS where you may have to 
 pre-set/create devices just for the VM to utilize so that OS's are not 
 fighting for the same drive(s)/space. 
 
 
 Jason
 Sent from my iPhone 5S
 
> On Apr 1, 2014, at 12:03 PM, Eric  wrote:
> 
> I have the details on the setup posted to virtualbox's forums, here: 
> https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
> 
> Essentially, I'm running ZFS on FreeBSD10 in VBox running in Windows 7. 
> Rather than the other way around. I think I mentioned that earlier
> 
> 
> I just created a short post about the NAT Network issue, here: 
> https://forums.virtualbox.org/viewtopic.php?f=6&t=60992
> 
> 
>> On Tue, Apr 1, 2014 at 11:58 AM, Jason Belec 
>>  wrote:
>> I run over 30 instances of Virtualbox with various OSs without issue all 
>> running ontop of ZFS environments. Most of my clients have at least 3 
>> VMs running a variant of Windows ontop of ZFS without any issues. Not 
>> sure what you mean with your NAT issue. Perhaps posting your setup info 
>> might be of more help.
>> 
>> 
>> 
>> --
>> Jason Belec
>> Sent from my iPad
>> 
>>> On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:
>>> 
>>> 
>>> 
 On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
 ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
 refurbished parts, yadda yadda, as posted on thi

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Eric
Attached are my VBox Guest settings; I've added them to the forum post as well
(https://forums.virtualbox.org/viewtopic.php?f=6&t=60975).

The NAT issue is a small one. I switched my SSH server back to bridged mode and
everything worked again. There was something about NAT mode that was breaking
the connection and not letting SSH work normally.
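
(For anyone trying to reproduce this, the switch back to bridged networking is
just the usual modifyvm call, run while the VM is powered off; the VM and host
adapter names below are made up:

    VBoxManage modifyvm "FreeBSD10" --nic1 bridged --bridgeadapter1 "Intel(R) Ethernet Connection"

The adapter name has to match whatever the Windows host calls its NIC.)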




On Tue, Apr 1, 2014 at 4:13 PM, Jason Belec wrote:

> I looked through your thread, but I almost always tell people - "STOP
> using Windows unless its in a VM". ;)
>
> Not enough info in your thread to actually help you with the VM. What are
> the Guest settings? What drives are actually assigned to what, scripts are
> only useful after you setup something functional.
>
> As for the NAT issue thread, I don't think its an issue so much a
> misconception how it works in relation to the parts in question,
> specifically Windows, the VM and the Guest. I have never really had issues
> like this but I've never tried with parts your using in the sequence
> described. As for why it might not work... The Guest settings info might be
> relevant here as well.
>
>
>
> --
> Jason Belec
> Sent from my iPad
>
> On Apr 1, 2014, at 3:46 PM, Eric  wrote:
>
> haha train away!
>
> This is what I'm trying to do for my own needs. Issues or no issues, I
> haven't seen it done before. So, I'm reaching out to anyone. Mac or not,
> I'm just asking from one IT professional to another, is this possible, and
> if not, why not? (that's just how I feel)
>
> I'm assuming the complications you mean are the ways FreeBSD behaves when
> running specifically in VBox under Windows, because that's what I'm trying
> to figure out.
>
> Details are in the forum post, but yes, it's a clean setup with a
> dedicated vdi for the os. Networking shouldn't be related, but it's working
> as well.
>
>
> On Tue, Apr 1, 2014 at 3:17 PM, Jason Belec wrote:
>
>> OK. So your running Windows, asking questions on the MacZFS list. That's
>> going to cause problems right out of the gate. And your asking about
>> FreeBSD running under VirtualBox for issues with ZFS.
>>
>> I know it's not nice, bit I'm laughing myself purple. This is going to
>> make it into my training sessions.
>>
>> The only advice I can give you at this point is you have made a very
>> complicated situation for yourself. Back up and start with Windows, ensure
>> networking us functions. Then a clean VM of FreeBSD make sure networking is
>> functioning however you want it to. Now setup ZFS where you may have to
>> pre-set/create devices just for the VM to utilize so that OS's are not
>> fighting for the same drive(s)/space.
>>
>>
>> Jason
>> Sent from my iPhone 5S
>>
>> On Apr 1, 2014, at 12:03 PM, Eric  wrote:
>>
>> I have the details on the setup posted to virtualbox's forums, here:
>> https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
>>
>> Essentially, I'm running ZFS on FreeBSD10 in VBox running in Windows 7.
>> Rather than the other way around. I think I mentioned that earlier
>>
>>
>> I just created a short post about the NAT Network issue, here:
>> https://forums.virtualbox.org/viewtopic.php?f=6&t=60992
>>
>>
>> On Tue, Apr 1, 2014 at 11:58 AM, Jason Belec 
>> wrote:
>>
>>> I run over 30 instances of Virtualbox with various OSs without issue all
>>> running ontop of ZFS environments. Most of my clients have at least 3 VMs
>>> running a variant of Windows ontop of ZFS without any issues. Not sure what
>>> you mean with your NAT issue. Perhaps posting your setup info might be of
>>> more help.
>>>
>>>
>>>
>>> --
>>> Jason Belec
>>> Sent from my iPad
>>>
>>> On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:
>>>
>>>
>>>
>>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:

 ZFS is lots of parts, in most cases lots of cheap unreliable parts,
 refurbished parts, yadda yadda, as posted on this thread and many, many
 others, any issues are probably not ZFS but the parts of the whole. Yes, it
 could be ZFS, after you confirm that all the parts ate pristine, maybe.

>>>
>>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm
>>> trying to figure out why VirtualBox is creating these issues. I'm pretty
>>> sure that's the root cause, but I don't know why yet. So I'm just
>>> speculating at this point. Of course, I want to get my ZFS up and running
>>> so I can move on to what I really need to do, so it's easy to jump on a
>>> conclusion about something that I haven't thought of in my position. Hope
>>> you can understand
>>>
>>>

 My oldest system running ZFS is an Mac Mini Intel Core Duo with 3GB RAM
 (not ECC) it is the home server for music, tv shows, movies, and some
 interim backups. The mini has been modded for ESATA and has 6 drives
 connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
 running since ZFS was released from Apple builds. Lost 3 drives, eventually
 traced to a new cable that cracked at the connector which when hot enough
 expanded lift

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Jason Belec
I looked through your thread, but I almost always tell people - "STOP using
Windows unless it's in a VM". ;)

Not enough info in your thread to actually help you with the VM. What are the
Guest settings? What drives are actually assigned to what? Scripts are only
useful after you set up something functional.

As for the NAT issue thread, I don't think it's an issue so much as a
misconception about how it works in relation to the parts in question,
specifically Windows, the VM and the Guest. I have never really had issues
like this, but I've never tried with the parts you're using in the sequence
described. As for why it might not work... the Guest settings info might be
relevant here as well.


--
Jason Belec
Sent from my iPad

> On Apr 1, 2014, at 3:46 PM, Eric  wrote:
> 
> haha train away!
> 
> This is what I'm trying to do for my own needs. Issues or no issues, I 
> haven't seen it done before. So, I'm reaching out to anyone. Mac or not, I'm 
> just asking from one IT professional to another, is this possible, and if 
> not, why not? (that's just how I feel)
> 
> I'm assuming the complications you mean are the ways FreeBSD behaves when 
> running specifically in VBox under Windows, because that's what I'm trying to 
> figure out.
> 
> Details are in the forum post, but yes, it's a clean setup with a dedicated 
> vdi for the os. Networking shouldn't be related, but it's working as well.
> 
> 
>> On Tue, Apr 1, 2014 at 3:17 PM, Jason Belec  
>> wrote:
>> OK. So your running Windows, asking questions on the MacZFS list. That's 
>> going to cause problems right out of the gate. And your asking about FreeBSD 
>> running under VirtualBox for issues with ZFS. 
>> 
>> I know it's not nice, bit I'm laughing myself purple. This is going to make 
>> it into my training sessions. 
>> 
>> The only advice I can give you at this point is you have made a very 
>> complicated situation for yourself. Back up and start with Windows, ensure 
>> networking us functions. Then a clean VM of FreeBSD make sure networking is 
>> functioning however you want it to. Now setup ZFS where you may have to 
>> pre-set/create devices just for the VM to utilize so that OS's are not 
>> fighting for the same drive(s)/space. 
>> 
>> 
>> Jason
>> Sent from my iPhone 5S
>> 
>>> On Apr 1, 2014, at 12:03 PM, Eric  wrote:
>>> 
>>> I have the details on the setup posted to virtualbox's forums, here: 
>>> https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
>>> 
>>> Essentially, I'm running ZFS on FreeBSD10 in VBox running in Windows 7. 
>>> Rather than the other way around. I think I mentioned that earlier
>>> 
>>> 
>>> I just created a short post about the NAT Network issue, here: 
>>> https://forums.virtualbox.org/viewtopic.php?f=6&t=60992
>>> 
>>> 
 On Tue, Apr 1, 2014 at 11:58 AM, Jason Belec  
 wrote:
 I run over 30 instances of Virtualbox with various OSs without issue all 
 running ontop of ZFS environments. Most of my clients have at least 3 VMs 
 running a variant of Windows ontop of ZFS without any issues. Not sure 
 what you mean with your NAT issue. Perhaps posting your setup info might 
 be of more help.
 
 
 
 --
 Jason Belec
 Sent from my iPad
 
> On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:
> 
> 
> 
>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>> ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
>> refurbished parts, yadda yadda, as posted on this thread and many, many 
>> others, any issues are probably not ZFS but the parts of the whole. Yes, 
>> it could be ZFS, after you confirm that all the parts ate pristine, 
>> maybe. 
> 
> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm 
> trying to figure out why VirtualBox is creating these issues. I'm pretty 
> sure that's the root cause, but I don't know why yet. So I'm just 
> speculating at this point. Of course, I want to get my ZFS up and running 
> so I can move on to what I really need to do, so it's easy to jump on a 
> conclusion about something that I haven't thought of in my position. Hope 
> you can understand
>  
>> 
>> My oldest system running ZFS is an Mac Mini Intel Core Duo with 3GB RAM 
>> (not ECC) it is the home server for music, tv shows, movies, and some 
>> interim backups. The mini has been modded for ESATA and has 6 drives 
>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been 
>> running since ZFS was released from Apple builds. Lost 3 drives, 
>> eventually traced to a new cable that cracked at the connector which 
>> when hot enough expanded lifting 2 pins free of their connector counter 
>> parts resulting in errors. Visually almost impossible to see. I replaced 
>> port multipliers, Esata cards, RAM, mini's, power supply, reinstalled 
>> OS, reinstalled ZFS, restored ZFS data from backup, finally to find the 

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Eric
haha train away!

This is what I'm trying to do for my own needs. Issues or no issues, I
haven't seen it done before, so I'm reaching out to anyone. Mac or not, I'm
just asking, from one IT professional to another: is this possible, and if
not, why not? (That's just how I feel.)

I'm assuming the complications you mean are the ways FreeBSD behaves when
running specifically in VBox under Windows, because that's what I'm trying
to figure out.

Details are in the forum post, but yes, it's a clean setup with a dedicated
VDI for the OS. Networking shouldn't be related, but it's working as well.


On Tue, Apr 1, 2014 at 3:17 PM, Jason Belec wrote:

> OK. So your running Windows, asking questions on the MacZFS list. That's
> going to cause problems right out of the gate. And your asking about
> FreeBSD running under VirtualBox for issues with ZFS.
>
> I know it's not nice, bit I'm laughing myself purple. This is going to
> make it into my training sessions.
>
> The only advice I can give you at this point is you have made a very
> complicated situation for yourself. Back up and start with Windows, ensure
> networking us functions. Then a clean VM of FreeBSD make sure networking is
> functioning however you want it to. Now setup ZFS where you may have to
> pre-set/create devices just for the VM to utilize so that OS's are not
> fighting for the same drive(s)/space.
>
>
> Jason
> Sent from my iPhone 5S
>
> On Apr 1, 2014, at 12:03 PM, Eric  wrote:
>
> I have the details on the setup posted to virtualbox's forums, here:
> https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
>
> Essentially, I'm running ZFS on FreeBSD10 in VBox running in Windows 7.
> Rather than the other way around. I think I mentioned that earlier
>
>
> I just created a short post about the NAT Network issue, here:
> https://forums.virtualbox.org/viewtopic.php?f=6&t=60992
>
>
> On Tue, Apr 1, 2014 at 11:58 AM, Jason Belec 
> wrote:
>
>> I run over 30 instances of Virtualbox with various OSs without issue all
>> running ontop of ZFS environments. Most of my clients have at least 3 VMs
>> running a variant of Windows ontop of ZFS without any issues. Not sure what
>> you mean with your NAT issue. Perhaps posting your setup info might be of
>> more help.
>>
>>
>>
>> --
>> Jason Belec
>> Sent from my iPad
>>
>> On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:
>>
>>
>>
>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>>>
>>> ZFS is lots of parts, in most cases lots of cheap unreliable parts,
>>> refurbished parts, yadda yadda, as posted on this thread and many, many
>>> others, any issues are probably not ZFS but the parts of the whole. Yes, it
>>> could be ZFS, after you confirm that all the parts ate pristine, maybe.
>>>
>>
>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm
>> trying to figure out why VirtualBox is creating these issues. I'm pretty
>> sure that's the root cause, but I don't know why yet. So I'm just
>> speculating at this point. Of course, I want to get my ZFS up and running
>> so I can move on to what I really need to do, so it's easy to jump on a
>> conclusion about something that I haven't thought of in my position. Hope
>> you can understand
>>
>>
>>>
>>> My oldest system running ZFS is an Mac Mini Intel Core Duo with 3GB RAM
>>> (not ECC) it is the home server for music, tv shows, movies, and some
>>> interim backups. The mini has been modded for ESATA and has 6 drives
>>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
>>> running since ZFS was released from Apple builds. Lost 3 drives, eventually
>>> traced to a new cable that cracked at the connector which when hot enough
>>> expanded lifting 2 pins free of their connector counter parts resulting in
>>> errors. Visually almost impossible to see. I replaced port multipliers,
>>> Esata cards, RAM, mini's, power supply, reinstalled OS, reinstalled ZFS,
>>> restored ZFS data from backup, finally to find the bad connector end one
>>> because it was hot and felt 'funny'.
>>>
>>> Frustrating, yes, educational also. The happy news is, all the data was
>>> fine, wife would have torn me to shreds if photos were missing, music was
>>> corrupt, etc., etc.. And this was on the old out of date but stable ZFS
>>> version we Mac users have been hugging onto for dear life. YMMV
>>>
>>> Never had RAM as the issue, here in the mad science lab across 10
>>> rotating systems or in any client location - pick your decade. However I
>>> don't use cheap RAM either, and I only have 2 Systems requiring ECC
>>> currently that don't even connect to ZFS as they are both XServers with
>>> other lives.
>>>
>>>
>>> --
>>> Jason Belec
>>> Sent from my iPad
>>>
>>> On Apr 1, 2014, at 12:13 AM, Daniel Becker  wrote:
>>>
>>> On Mar 31, 2014, at 7:41 PM, Eric Jaw  wrote:
>>>
>>> I started using ZFS about a few weeks ago, so a lot of it is still new
>>> to me. I'm actually not completely certain about "proper procedure" for
>>> repairing a pool. I'm not s

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Jason Belec
OK. So you're running Windows, asking questions on the MacZFS list. That's
going to cause problems right out of the gate. And you're asking about FreeBSD
running under VirtualBox for issues with ZFS.

I know it's not nice, but I'm laughing myself purple. This is going to make it
into my training sessions.

The only advice I can give you at this point is that you have made a very
complicated situation for yourself. Back up and start with Windows; ensure
networking functions. Then build a clean VM of FreeBSD and make sure networking
is functioning however you want it to. Now set up ZFS, where you may have to
pre-create devices just for the VM to utilize, so that the OSes are not
fighting for the same drive(s)/space.


Jason
Sent from my iPhone 5S

> On Apr 1, 2014, at 12:03 PM, Eric  wrote:
> 
> I have the details on the setup posted to virtualbox's forums, here: 
> https://forums.virtualbox.org/viewtopic.php?f=6&t=60975
> 
> Essentially, I'm running ZFS on FreeBSD10 in VBox running in Windows 7. 
> Rather than the other way around. I think I mentioned that earlier
> 
> 
> I just created a short post about the NAT Network issue, here: 
> https://forums.virtualbox.org/viewtopic.php?f=6&t=60992
> 
> 
>> On Tue, Apr 1, 2014 at 11:58 AM, Jason Belec  
>> wrote:
>> I run over 30 instances of Virtualbox with various OSs without issue all 
>> running ontop of ZFS environments. Most of my clients have at least 3 VMs 
>> running a variant of Windows ontop of ZFS without any issues. Not sure what 
>> you mean with your NAT issue. Perhaps posting your setup info might be of 
>> more help.
>> 
>> 
>> 
>> --
>> Jason Belec
>> Sent from my iPad
>> 
>>> On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:
>>> 
>>> 
>>> 
 On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
 ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
 refurbished parts, yadda yadda, as posted on this thread and many, many 
 others, any issues are probably not ZFS but the parts of the whole. Yes, 
 it could be ZFS, after you confirm that all the parts ate pristine, maybe. 
>>> 
>>> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm 
>>> trying to figure out why VirtualBox is creating these issues. I'm pretty 
>>> sure that's the root cause, but I don't know why yet. So I'm just 
>>> speculating at this point. Of course, I want to get my ZFS up and running 
>>> so I can move on to what I really need to do, so it's easy to jump on a 
>>> conclusion about something that I haven't thought of in my position. Hope 
>>> you can understand
>>>  
 
 My oldest system running ZFS is an Mac Mini Intel Core Duo with 3GB RAM 
 (not ECC) it is the home server for music, tv shows, movies, and some 
 interim backups. The mini has been modded for ESATA and has 6 drives 
 connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been 
 running since ZFS was released from Apple builds. Lost 3 drives, 
 eventually traced to a new cable that cracked at the connector which when 
 hot enough expanded lifting 2 pins free of their connector counter parts 
 resulting in errors. Visually almost impossible to see. I replaced port 
 multipliers, Esata cards, RAM, mini's, power supply, reinstalled OS, 
 reinstalled ZFS, restored ZFS data from backup, finally to find the bad 
 connector end one because it was hot and felt 'funny'. 
 
 Frustrating, yes, educational also. The happy news is, all the data was 
 fine, wife would have torn me to shreds if photos were missing, music was 
 corrupt, etc., etc.. And this was on the old out of date but stable ZFS 
 version we Mac users have been hugging onto for dear life. YMMV
 
 Never had RAM as the issue, here in the mad science lab across 10 rotating 
 systems or in any client location - pick your decade. However I don't use 
 cheap RAM either, and I only have 2 Systems requiring ECC currently that 
 don't even connect to ZFS as they are both XServers with other lives.
 
 
 --
 Jason Belec
 Sent from my iPad
 
> On Apr 1, 2014, at 12:13 AM, Daniel Becker  wrote:
> 
>> On Mar 31, 2014, at 7:41 PM, Eric Jaw  wrote:
>> 
>> I started using ZFS about a few weeks ago, so a lot of it is still new 
>> to me. I'm actually not completely certain about "proper procedure" for 
>> repairing a pool. I'm not sure if I'm supposed to clear the errors after 
>> the scrub, before or after (little things). I'm not sure if it even 
>> matters. When I restarted the VM, the checksum counts cleared on its own.
> 
> The counts are not maintained across reboots.
> 
> 
>> On the first scrub it repaired roughly 1.65MB. None on the second scub. 
>> Even after the scrub there were still 43 data errors. I was expecting 
>> they were going to go away.
>> 
>>> errors: 43 data errors, use '-v' for a list
> 
> What this means

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Eric
I have the details on the setup posted to virtualbox's forums, here:
https://forums.virtualbox.org/viewtopic.php?f=6&t=60975

Essentially, I'm running ZFS on FreeBSD 10 in VBox running on Windows 7,
rather than the other way around. I think I mentioned that earlier.


I just created a short post about the NAT Network issue, here:
https://forums.virtualbox.org/viewtopic.php?f=6&t=60992


On Tue, Apr 1, 2014 at 11:58 AM, Jason Belec wrote:

> I run over 30 instances of Virtualbox with various OSs without issue all
> running ontop of ZFS environments. Most of my clients have at least 3 VMs
> running a variant of Windows ontop of ZFS without any issues. Not sure what
> you mean with your NAT issue. Perhaps posting your setup info might be of
> more help.
>
>
>
> --
> Jason Belec
> Sent from my iPad
>
> On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:
>
>
>
> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>>
>> ZFS is lots of parts, in most cases lots of cheap unreliable parts,
>> refurbished parts, yadda yadda, as posted on this thread and many, many
>> others, any issues are probably not ZFS but the parts of the whole. Yes, it
>> could be ZFS, after you confirm that all the parts ate pristine, maybe.
>>
>
> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm
> trying to figure out why VirtualBox is creating these issues. I'm pretty
> sure that's the root cause, but I don't know why yet. So I'm just
> speculating at this point. Of course, I want to get my ZFS up and running
> so I can move on to what I really need to do, so it's easy to jump on a
> conclusion about something that I haven't thought of in my position. Hope
> you can understand
>
>
>>
>> My oldest system running ZFS is an Mac Mini Intel Core Duo with 3GB RAM
>> (not ECC) it is the home server for music, tv shows, movies, and some
>> interim backups. The mini has been modded for ESATA and has 6 drives
>> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been
>> running since ZFS was released from Apple builds. Lost 3 drives, eventually
>> traced to a new cable that cracked at the connector which when hot enough
>> expanded lifting 2 pins free of their connector counter parts resulting in
>> errors. Visually almost impossible to see. I replaced port multipliers,
>> Esata cards, RAM, mini's, power supply, reinstalled OS, reinstalled ZFS,
>> restored ZFS data from backup, finally to find the bad connector end one
>> because it was hot and felt 'funny'.
>>
>> Frustrating, yes, educational also. The happy news is, all the data was
>> fine, wife would have torn me to shreds if photos were missing, music was
>> corrupt, etc., etc.. And this was on the old out of date but stable ZFS
>> version we Mac users have been hugging onto for dear life. YMMV
>>
>> Never had RAM as the issue, here in the mad science lab across 10
>> rotating systems or in any client location - pick your decade. However I
>> don't use cheap RAM either, and I only have 2 Systems requiring ECC
>> currently that don't even connect to ZFS as they are both XServers with
>> other lives.
>>
>>
>> --
>> Jason Belec
>> Sent from my iPad
>>
>> On Apr 1, 2014, at 12:13 AM, Daniel Becker  wrote:
>>
>> On Mar 31, 2014, at 7:41 PM, Eric Jaw  wrote:
>>
>> I started using ZFS about a few weeks ago, so a lot of it is still new to
>> me. I'm actually not completely certain about "proper procedure" for
>> repairing a pool. I'm not sure if I'm supposed to clear the errors after
>> the scrub, before or after (little things). I'm not sure if it even
>> matters. When I restarted the VM, the checksum counts cleared on its own.
>>
>>
>> The counts are not maintained across reboots.
>>
>>
>> On the first scrub it repaired roughly 1.65MB. None on the second scub.
>> Even after the scrub there were still 43 data errors. I was expecting they
>> were going to go away.
>>
>>
>> errors: 43 data errors, use '-v' for a list
>>
>>
>> What this means is that in these 43 cases, the system was not able to
>> correct the error (i.e., both drives in a mirror returned bad data).
>>
>>
>> This is an excellent question. They're in 'Normal' mode. I remember
>> looking in to this before and decided normal mode should be fine. I might
>> be wrong. So thanks for bringing this up. I'll have to check it out again.
>>
>>
>> The reason I was asking is that these symptoms would also be consistent
>> with something outside the VM writing to the disks behind the VM’s back;
>> that’s unlikely to happen accidentally with disk images, but raw disks are
>> visible to the host OS as such, so it may be as simple as Windows deciding
>> that it should initialize the “unformatted” (really, formatted with an
>> unknown filesystem) devices. Or it could be a raid controller that stores
>> its array metadata in the last sector of the array’s disks.
>>
>>
>> memtest86 and memtest86+ for 18 hours came out okay. I'm on my third
>> scrub and the number or errors has remained at 43. Checksum errors continue
>>

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Jason Belec
I run over 30 instances of VirtualBox with various OSes without issue, all
running on top of ZFS environments. Most of my clients have at least 3 VMs
running a variant of Windows on top of ZFS without any issues. I'm not sure
what you mean by your NAT issue. Perhaps posting your setup info might be of
more help.


--
Jason Belec
Sent from my iPad

> On Apr 1, 2014, at 11:34 AM, Eric Jaw  wrote:
> 
> 
> 
>> On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>> ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
>> refurbished parts, yadda yadda, as posted on this thread and many, many 
>> others, any issues are probably not ZFS but the parts of the whole. Yes, it 
>> could be ZFS, after you confirm that all the parts ate pristine, maybe. 
> 
> I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm trying 
> to figure out why VirtualBox is creating these issues. I'm pretty sure that's 
> the root cause, but I don't know why yet. So I'm just speculating at this 
> point. Of course, I want to get my ZFS up and running so I can move on to 
> what I really need to do, so it's easy to jump on a conclusion about 
> something that I haven't thought of in my position. Hope you can understand
>  
>> 
>> My oldest system running ZFS is an Mac Mini Intel Core Duo with 3GB RAM (not 
>> ECC) it is the home server for music, tv shows, movies, and some interim 
>> backups. The mini has been modded for ESATA and has 6 drives connected. The 
>> pool is 2 RaidZ of 3 mirrored with copies set at 2. Been running since ZFS 
>> was released from Apple builds. Lost 3 drives, eventually traced to a new 
>> cable that cracked at the connector which when hot enough expanded lifting 2 
>> pins free of their connector counter parts resulting in errors. Visually 
>> almost impossible to see. I replaced port multipliers, Esata cards, RAM, 
>> mini's, power supply, reinstalled OS, reinstalled ZFS, restored ZFS data 
>> from backup, finally to find the bad connector end one because it was hot 
>> and felt 'funny'. 
>> 
>> Frustrating, yes, educational also. The happy news is, all the data was 
>> fine, wife would have torn me to shreds if photos were missing, music was 
>> corrupt, etc., etc.. And this was on the old out of date but stable ZFS 
>> version we Mac users have been hugging onto for dear life. YMMV
>> 
>> Never had RAM as the issue, here in the mad science lab across 10 rotating 
>> systems or in any client location - pick your decade. However I don't use 
>> cheap RAM either, and I only have 2 Systems requiring ECC currently that 
>> don't even connect to ZFS as they are both XServers with other lives.
>> 
>> 
>> --
>> Jason Belec
>> Sent from my iPad
>> 
>>> On Apr 1, 2014, at 12:13 AM, Daniel Becker  wrote:
>>> 
 On Mar 31, 2014, at 7:41 PM, Eric Jaw  wrote:
 
 I started using ZFS about a few weeks ago, so a lot of it is still new to 
 me. I'm actually not completely certain about "proper procedure" for 
 repairing a pool. I'm not sure if I'm supposed to clear the errors after 
 the scrub, before or after (little things). I'm not sure if it even 
 matters. When I restarted the VM, the checksum counts cleared on its own.
>>> 
>>> The counts are not maintained across reboots.
>>> 
>>> 
 On the first scrub it repaired roughly 1.65MB. None on the second scub. 
 Even after the scrub there were still 43 data errors. I was expecting they 
 were going to go away.
 
> errors: 43 data errors, use '-v' for a list
>>> 
>>> What this means is that in these 43 cases, the system was not able to 
>>> correct the error (i.e., both drives in a mirror returned bad data).
>>> 
>>> 
 This is an excellent question. They're in 'Normal' mode. I remember 
 looking in to this before and decided normal mode should be fine. I might 
 be wrong. So thanks for bringing this up. I'll have to check it out again.
>>> 
>>> The reason I was asking is that these symptoms would also be consistent 
>>> with something outside the VM writing to the disks behind the VM’s back; 
>>> that’s unlikely to happen accidentally with disk images, but raw disks are 
>>> visible to the host OS as such, so it may be as simple as Windows deciding 
>>> that it should initialize the “unformatted” (really, formatted with an 
>>> unknown filesystem) devices. Or it could be a raid controller that stores 
>>> its array metadata in the last sector of the array’s disks.
>>> 
>>> 
 memtest86 and memtest86+ for 18 hours came out okay. I'm on my third scrub 
 and the number or errors has remained at 43. Checksum errors continue to 
 pile up as the pool is getting scrubbed.
 
 I'm just as flustered about this. Thanks again for the input.
>>> 
>>> Given that you’re seeing a fairly large number of errors in your scrubs, 
>>> the fact that memtest86 doesn’t find anything at all very strongly suggests 
>>> that this is not actually a memory issue.
> 

Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Eric Jaw


On Tuesday, April 1, 2014 7:04:39 AM UTC-4, jasonbelec wrote:
>
> ZFS is lots of parts, in most cases lots of cheap unreliable parts, 
> refurbished parts, yadda yadda, as posted on this thread and many, many 
> others, any issues are probably not ZFS but the parts of the whole. Yes, it 
> could be ZFS, after you confirm that all the parts ate pristine, maybe. 
>

I don't think it's ZFS. ZFS is pretty solid. In my specific case, I'm
trying to figure out why VirtualBox is creating these issues. I'm pretty
sure that's the root cause, but I don't know why yet. So I'm just
speculating at this point. Of course, I want to get my ZFS up and running
so I can move on to what I really need to do, so it's easy to jump to a
conclusion about something I haven't thought of in my position. Hope you
can understand.
 

>
> My oldest system running ZFS is an Mac Mini Intel Core Duo with 3GB RAM 
> (not ECC) it is the home server for music, tv shows, movies, and some 
> interim backups. The mini has been modded for ESATA and has 6 drives 
> connected. The pool is 2 RaidZ of 3 mirrored with copies set at 2. Been 
> running since ZFS was released from Apple builds. Lost 3 drives, eventually 
> traced to a new cable that cracked at the connector which when hot enough 
> expanded lifting 2 pins free of their connector counter parts resulting in 
> errors. Visually almost impossible to see. I replaced port multipliers, 
> Esata cards, RAM, mini's, power supply, reinstalled OS, reinstalled ZFS, 
> restored ZFS data from backup, finally to find the bad connector end one 
> because it was hot and felt 'funny'. 
>
> Frustrating, yes, educational also. The happy news is, all the data was 
> fine, wife would have torn me to shreds if photos were missing, music was 
> corrupt, etc., etc.. And this was on the old out of date but stable ZFS 
> version we Mac users have been hugging onto for dear life. YMMV
>
> Never had RAM as the issue, here in the mad science lab across 10 rotating 
> systems or in any client location - pick your decade. However I don't use 
> cheap RAM either, and I only have 2 Systems requiring ECC currently that 
> don't even connect to ZFS as they are both XServers with other lives.
>
>
> --
> Jason Belec
> Sent from my iPad
>
> On Apr 1, 2014, at 12:13 AM, Daniel Becker > 
> wrote:
>
> On Mar 31, 2014, at 7:41 PM, Eric Jaw > 
> wrote:
>
> I started using ZFS about a few weeks ago, so a lot of it is still new to 
> me. I'm actually not completely certain about "proper procedure" for 
> repairing a pool. I'm not sure if I'm supposed to clear the errors after 
> the scrub, before or after (little things). I'm not sure if it even 
> matters. When I restarted the VM, the checksum counts cleared on its own.
>
>
> The counts are not maintained across reboots.
>
>
> On the first scrub it repaired roughly 1.65MB. None on the second scub. 
> Even after the scrub there were still 43 data errors. I was expecting they 
> were going to go away.
>
>
> errors: 43 data errors, use '-v' for a list
>
>
> What this means is that in these 43 cases, the system was not able to 
> correct the error (i.e., both drives in a mirror returned bad data).
>
>
> This is an excellent question. They're in 'Normal' mode. I remember 
> looking in to this before and decided normal mode should be fine. I might 
> be wrong. So thanks for bringing this up. I'll have to check it out again.
>
>
> The reason I was asking is that these symptoms would also be consistent 
> with something outside the VM writing to the disks behind the VM’s back; 
> that’s unlikely to happen accidentally with disk images, but raw disks are 
> visible to the host OS as such, so it may be as simple as Windows deciding 
> that it should initialize the “unformatted” (really, formatted with an 
> unknown filesystem) devices. Or it could be a raid controller that stores 
> its array metadata in the last sector of the array’s disks.
>
>
> memtest86 and memtest86+ for 18 hours came out okay. I'm on my third scrub 
> and the number or errors has remained at 43. Checksum errors continue to 
> pile up as the pool is getting scrubbed.
>
> I'm just as flustered about this. Thanks again for the input.
>
>
> Given that you’re seeing a fairly large number of errors in your scrubs, 
> the fact that memtest86 doesn’t find anything at all very strongly suggests 
> that this is not actually a memory issue.
>
>



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Eric Jaw


On Tuesday, April 1, 2014 12:13:30 AM UTC-4, Daniel Becker wrote:
>
> On Mar 31, 2014, at 7:41 PM, Eric Jaw > 
> wrote:
>
> I started using ZFS about a few weeks ago, so a lot of it is still new to 
> me. I'm actually not completely certain about "proper procedure" for 
> repairing a pool. I'm not sure if I'm supposed to clear the errors after 
> the scrub, before or after (little things). I'm not sure if it even 
> matters. When I restarted the VM, the checksum counts cleared on its own.
>
>
> The counts are not maintained across reboots.
>
>
> On the first scrub it repaired roughly 1.65MB. None on the second scub. 
> Even after the scrub there were still 43 data errors. I was expecting they 
> were going to go away.
>
>
> errors: 43 data errors, use '-v' for a list
>
>
> What this means is that in these 43 cases, the system was not able to 
> correct the error (i.e., both drives in a mirror returned bad data).
>
>
> This is an excellent question. They're in 'Normal' mode. I remember 
> looking in to this before and decided normal mode should be fine. I might 
> be wrong. So thanks for bringing this up. I'll have to check it out again.
>
>
> The reason I was asking is that these symptoms would also be consistent 
> with something outside the VM writing to the disks behind the VM’s back; 
> that’s unlikely to happen accidentally with disk images, but raw disks are 
> visible to the host OS as such, so it may be as simple as Windows deciding 
> that it should initialize the “unformatted” (really, formatted with an 
> unknown filesystem) devices. Or it could be a raid controller that stores 
> its array metadata in the last sector of the array’s disks.
>
I read about this being a possible issue, so I created a partition on each of
the drives so Windows sees them as drives with a partition rather than as
unformatted. There's no RAID controller in this setup.

>
> memtest86 and memtest86+ for 18 hours came out okay. I'm on my third scrub 
> and the number or errors has remained at 43. Checksum errors continue to 
> pile up as the pool is getting scrubbed.
>
> I'm just as flustered about this. Thanks again for the input.
>
>
> Given that you’re seeing a fairly large number of errors in your scrubs, 
> the fact that memtest86 doesn’t find anything at all very strongly suggests 
> that this is not actually a memory issue.
>


It very well may not be a memory issue. The tricky part of this setup is that
it's running through a VM with what should be direct access to the raw drives.
It could be a driver, perhaps, that doesn't want to play nice.

I've discovered that with a NAT network, port forwarding does not work
properly, so I'm not discarding possible issues with VirtualBox.
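
(In case it helps anyone else: with the plain NAT adapter, inbound SSH only
works after adding an explicit port-forwarding rule, something like the
following, with a made-up VM name:

    VBoxManage modifyvm "FreeBSD10" --natpf1 "ssh,tcp,,2222,,22"
    ssh -p 2222 user@127.0.0.1    # then connect from the Windows host

The newer "NAT Network" mode handles forwarding differently, which may be part
of what I'm tripping over.)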



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-04-01 Thread Jason Belec
ZFS is lots of parts, in most cases lots of cheap unreliable parts, refurbished
parts, yadda yadda; as posted on this thread and many, many others, any issues
are probably not ZFS but the parts of the whole. Yes, it could be ZFS, but only
maybe, and only after you confirm that all the parts are pristine.

My oldest system running ZFS is a Mac Mini Intel Core Duo with 3GB RAM (not
ECC); it is the home server for music, TV shows, movies, and some interim
backups. The mini has been modded for eSATA and has 6 drives connected. The
pool is 2 RaidZ of 3 mirrored, with copies set at 2. It has been running since
ZFS was released from the Apple builds. I lost 3 drives, eventually traced to a
new cable that had cracked at the connector and, when hot enough, expanded,
lifting 2 pins free of their connector counterparts and causing errors. It was
visually almost impossible to see. I replaced port multipliers, eSATA cards,
RAM, minis, the power supply, reinstalled the OS, reinstalled ZFS, restored the
ZFS data from backup, and finally found the bad connector end only because it
was hot and felt 'funny'.
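
(If it helps anyone picture it, one plausible reading of that layout, two
three-disk raidz vdevs with copies=2 set on top, would be built roughly like
this; the disk names are placeholders:

    zpool create tank raidz disk0 disk1 disk2 raidz disk3 disk4 disk5
    zfs set copies=2 tank

Note that copies=2 only applies to data written after the property is set.)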

Frustrating, yes, but educational also. The happy news is that all the data
was fine; my wife would have torn me to shreds if photos were missing, music
was corrupt, etc. And this was on the old, out-of-date but stable ZFS version
we Mac users have been hugging onto for dear life. YMMV.

I have never had RAM as the issue, here in the mad science lab across 10
rotating systems or in any client location - pick your decade. However, I don't
use cheap RAM either, and the only 2 systems I have that require ECC don't even
connect to ZFS, as they are both Xserves with other lives.


--
Jason Belec
Sent from my iPad

> On Apr 1, 2014, at 12:13 AM, Daniel Becker  wrote:
> 
>> On Mar 31, 2014, at 7:41 PM, Eric Jaw  wrote:
>> 
>> I started using ZFS about a few weeks ago, so a lot of it is still new to 
>> me. I'm actually not completely certain about "proper procedure" for 
>> repairing a pool. I'm not sure if I'm supposed to clear the errors after the 
>> scrub, before or after (little things). I'm not sure if it even matters. 
>> When I restarted the VM, the checksum counts cleared on its own.
> 
> The counts are not maintained across reboots.
> 
> 
>> On the first scrub it repaired roughly 1.65MB. None on the second scub. Even 
>> after the scrub there were still 43 data errors. I was expecting they were 
>> going to go away.
>> 
>>> errors: 43 data errors, use '-v' for a list
> 
> What this means is that in these 43 cases, the system was not able to correct 
> the error (i.e., both drives in a mirror returned bad data).
> 
> 
>> This is an excellent question. They're in 'Normal' mode. I remember looking 
>> in to this before and decided normal mode should be fine. I might be wrong. 
>> So thanks for bringing this up. I'll have to check it out again.
> 
> The reason I was asking is that these symptoms would also be consistent with 
> something outside the VM writing to the disks behind the VM’s back; that’s 
> unlikely to happen accidentally with disk images, but raw disks are visible 
> to the host OS as such, so it may be as simple as Windows deciding that it 
> should initialize the “unformatted” (really, formatted with an unknown 
> filesystem) devices. Or it could be a raid controller that stores its array 
> metadata in the last sector of the array’s disks.
> 
> 
>> memtest86 and memtest86+ for 18 hours came out okay. I'm on my third scrub 
>> and the number or errors has remained at 43. Checksum errors continue to 
>> pile up as the pool is getting scrubbed.
>> 
>> I'm just as flustered about this. Thanks again for the input.
> 
> Given that you’re seeing a fairly large number of errors in your scrubs, the 
> fact that memtest86 doesn’t find anything at all very strongly suggests that 
> this is not actually a memory issue.
> 



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-31 Thread Daniel Becker
On Mar 31, 2014, at 7:41 PM, Eric Jaw  wrote:

> I started using ZFS about a few weeks ago, so a lot of it is still new to me. 
> I'm actually not completely certain about "proper procedure" for repairing a 
> pool. I'm not sure if I'm supposed to clear the errors after the scrub, 
> before or after (little things). I'm not sure if it even matters. When I 
> restarted the VM, the checksum counts cleared on its own.

The counts are not maintained across reboots.
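
(For the record, the usual routine is roughly scrub, review, then clear once
you're satisfied; a sketch, assuming a pool named "tank":

    zpool scrub tank
    zpool status -v tank   # watch progress, checksum counters, and any listed files
    zpool clear tank       # reset the per-device error counters afterwards

Clearing only resets the counters; it doesn't repair anything by itself.)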


> On the first scrub it repaired roughly 1.65MB. None on the second scub. Even 
> after the scrub there were still 43 data errors. I was expecting they were 
> going to go away.
> 
> errors: 43 data errors, use '-v' for a list

What this means is that in these 43 cases, the system was not able to correct 
the error (i.e., both drives in a mirror returned bad data).


> This is an excellent question. They're in 'Normal' mode. I remember looking 
> in to this before and decided normal mode should be fine. I might be wrong. 
> So thanks for bringing this up. I'll have to check it out again.

The reason I was asking is that these symptoms would also be consistent with 
something outside the VM writing to the disks behind the VM’s back; that’s 
unlikely to happen accidentally with disk images, but raw disks are visible to 
the host OS as such, so it may be as simple as Windows deciding that it should 
initialize the “unformatted” (really, formatted with an unknown filesystem) 
devices. Or it could be a raid controller that stores its array metadata in the 
last sector of the array’s disks.
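
(One quick way to check for that last possibility: ZFS keeps four copies of its
vdev label, two at the front and two at the very end of each disk, so anything
stomping on the tail of the device tends to show up when you dump the labels.
A sketch, assuming the guest sees the disk as /dev/ada1:

    zdb -l /dev/ada1    # prints labels 0-3; a clobbered label is reported as unreadable

If labels 2 and 3 look damaged while 0 and 1 are fine, something has been
writing to the end of that device.)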


> memtest86 and memtest86+ for 18 hours came out okay. I'm on my third scrub 
> and the number or errors has remained at 43. Checksum errors continue to pile 
> up as the pool is getting scrubbed.
> 
> I'm just as flustered about this. Thanks again for the input.

Given that you’re seeing a fairly large number of errors in your scrubs, the 
fact that memtest86 doesn’t find anything at all very strongly suggests that 
this is not actually a memory issue.





Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-02 Thread Chris Ridd

On 2 Mar 2014, at 09:16, Philip Robar  wrote:

> On Sat, Mar 1, 2014 at 5:07 PM, Jason Belec  
> wrote:
>> RAM/ECC RAM is like consumer drives vs pro drives in your system, recent 
>> long term studies have shown you don't get much more for the extra money.
>> 
> Do you have references to these studies? This directly conflicts with what 
> I've seen posted, with references, in other forums on the frequency of soft 
> memory errors, particularly on systems that run 24x7, and how ECC memory is 
> able to correct these random errors.

I don't have any reference to Jason's claims about ECC, but recently Backblaze 
published some stats on their experiences with a variety of drives. Jason might 
have been thinking about these:

http://blog.backblaze.com/2013/12/04/enterprise-drive-reliability/

They have lots more related articles on their blog that are well worth a read.

Chris



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-02 Thread Daniel Becker
On Mar 2, 2014, at 2:33 AM, Bjoern Kahl  wrote:

> On the other side you say (only) 8% of all DIMMs are affected per
> *year*.  I *guess* (and might be wrong) that the majority of installed
> DIMMs nowadays are 2 GB DIMMs, so you need four of them to build
> 8 GB.  Assuming equal distribution of bit errors, this means on
> average *every* DIMM will experience 1 bit error per hour.  That
> doesn't fit.

The disconnect is in the fact that they are not uniformly distributed at all; 
see my other email. Some (bad) DIMMs produce tons of errors, while the vast 
majority produce none at all. Quoting the averages is really kind of misleading.



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-02 Thread Bjoern Kahl

Am 02.03.14 09:45, schrieb Philip Robar:
> On Sun, Mar 2, 2014 at 12:46 AM, Jean-Yves Avenard
> wrote:
> 

[ cut a lot not relevant to my comment ]

>> Bad RAM however has nothing to do with the occasional bit flip
>> that would be prevented using ECC RAM. The probability of a bit
>> flip is low, very low.
>> 
> 
> You and Jason have both claimed this. This is at odds with papers
> and studies I've seen mentioned elsewhere. Here's what a little
> searching found:
> 
> Soft Error: https://en.wikipedia.org/wiki/Soft_error Which says
> that there are numerous sources of soft errors in memory and other
> circuits other than cosmic rays.
> 
> ECC Memory: https://en.wikipedia.org/wiki/ECC_memory States that
> design has dealt with the problem of increased circuit density. It
> then mentions the research IBM did years ago and Google's 2009
> report which says:
> 
> The actual error rate found was several orders of magnitude higher
> than previous small-scale or laboratory studies, with 25,000 to
> 70,000 errors per billion device hours per mega*bit* (about 2.5–7 ×
> 10^-11 error/bit·h) (i.e. about 5 single bit errors in 8 Gigabytes of
> RAM per hour using the top-end error rate), and more than 8% of
> DIMM memory modules affected by errors per year.

 Have you some *reliable* source for your claim in the above paragraph?

 You say that an average 8 GB memory subsystem should experience 5 bit
 errors per *hour* of operation.

 On the other side you say (only) 8% of all DIMMs are affected per
 *year*.  I *guess* (and might be wrong) that the majority of installed
 DIMMs nowadays are 2 GB DIMMs, so you need four of them to build
 8 GB.  Assuming equal distribution of bit errors, this means on
 average *every* DIMM will experience 1 bit error per hour.  That
 doesn't fit.
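
 A quick back-of-the-envelope check of the two figures (a hypothetical
 Python sketch, using only the numbers quoted above):

# Back-of-the-envelope check of the quoted figures; illustrative only.
bits_per_gb = 8 * 2**30            # bits in one gigabyte of RAM
rate = 7e-11                       # top end of the quoted 2.5-7 x 10^-11 errors/(bit*h)

errors_per_hour_8gb = 8 * bits_per_gb * rate
print(errors_per_hour_8gb)         # ~4.8 -> "about 5 single bit errors per hour"

# If the errors were spread uniformly, one 2 GB DIMM would accumulate:
errors_per_dimm_year = 2 * bits_per_gb * rate * 24 * 365
print(errors_per_dimm_year)        # ~10,500 per year -- i.e. *every* DIMM affected,
                                   # not just 8% of them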


 Today's all-purpose PCs regularly ship with 8 GB of RAM, and modern,
 widely used operating systems, no matter which vendor, all make
 aggressive use of every bit of memory they can get.  None of these
 have any software means to protect RAM content, including FS caches,
 against bit rot.

 With 5 bit errors per hour these machines should be pretty unusable,
 corrupting documents all day and probably crashing applications and
 sometimes the OS repeatedly within a business day.  Yet I am not aware
 of any reports that daily office computing ceased to be reliably
 usable over the last decade.

 So something doesn't fit here.  Where is (my?) mistake in reasoning?

 Of course, this does not say anything about ZFS' vulnerability to RAM
 errors compared to other system parts.  I'll come to that point in a
 different mail, but it will take a bit more time to write it up
 without spreading more uncertainty than already produced in this
 thread.


 Best regards

Björn

-- 
| Bjoern Kahl   +++   Siegburg   +++Germany |
| "googlelogin@-my-domain-"   +++   www.bjoern-kahl.de  |
| Languages: German, English, Ancient Latin (a bit :-)) |



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-02 Thread Daniel Becker
On Mar 2, 2014, at 12:45 AM, Philip Robar  wrote:

> But if you insist: from "Oracle Solaris 11.1 Administration: ZFS File 
> Systems", "Consider using ECC memory to protect against memory corruption. 
> Silent memory corruption can potentially damage your data." [1]

That is in no way specific to ZFS, though; silent memory corruption can cause 
corruption in any number of ways for basically any filesystem. If you value 
your data, you'll want to use ECC, regardless of whether you use ZFS or not.


> The actual error rate found was several orders of magnitude higher than 
> previous small-scale or laboratory studies, with 25,000 to 70,000 errors per 
> billion device hours per megabit (about 2.5–7 × 10^-11 error/bit·h) (i.e. about 
> 5 single bit errors in 8 Gigabytes of RAM per hour using the top-end error 
> rate), and more than 8% of DIMM memory modules affected by errors per year.
> 
> So, since you've agreed that ZFS is more vulnerable than other file systems 
> to memory errors, and Google says that these errors are a lot more frequent 
> than most people think they are, then the question becomes: just how much more 
> vulnerable is ZFS, and is the extent of the corruption likely to be wider or 
> more catastrophic than on other file systems?

It's somewhat misleading to just look at the averages in this case, though, as 
the paper specifically points out that the errors are in fact highly clustered, 
not evenly distributed across devices and/or time. I.e., there are some DIMMs 
that produce a very large number of errors, but the vast majority of DIMMs (92% 
as per the paragraph you quoted above) actually produce no (detectable) bit 
errors at all per year.


> It seems to me that if using ZFS without ECC memory puts someone's data at an 
> increased risk over other file systems, then they ought to be told that so that 
> they can make an informed decision. Am I really being unreasonable about this?


You keep claiming this, but I still haven't seen any conclusive evidence that 
lack of ECC poses a higher overall risk for your data when using ZFS than with 
other file systems. Note that even if you could find a scenario where ZFS will 
do worse than others (and I maintain that the specific scenario Cyberjock 
describes is not actually plausible), there are other scenarios where ZFS will 
actually catch memory corruption but other file systems will not (e.g., bit 
flip occurs after checksum has been computed but before data is written to 
disk, or bit flip occurs after data has been read from disk but before checksum 
is compared, or bit flip causes stray write of bogus data to disk); without 
knowing the likelihood of each of these scenarios and their respective damage 
potential, it is impossible to say which side is more at risk.
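
To make one of those scenarios concrete (a hypothetical sketch, not ZFS code): if a 
bit flips in a buffer after its checksum has been computed but before the buffer 
reaches disk, a checksumming filesystem flags the block on the next read instead of 
silently returning the damaged data:

# Hypothetical illustration of "bit flip after checksum, before write".
# Not ZFS code; it only shows why the later read gets flagged.
import hashlib

def cksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

buf = bytearray(b"application data about to be written")
stored_checksum = cksum(bytes(buf))        # checksum computed over good data

buf[3] ^= 0x01                             # memory error flips a bit afterwards
on_disk = bytes(buf)                       # the corrupted buffer is what hits disk

# On a later read, the mismatch is detected rather than silently returned.
assert cksum(on_disk) != stored_checksum   # -> reported checksum error / repair attempt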



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-02 Thread Philip Robar
On Sat, Mar 1, 2014 at 5:07 PM, Jason Belec wrote:

> Technically, what you qualify below is a truism under any hardware. ZFS is
> neither more nor less susceptible to RAM failure as it has nothing to do
> with ZFS. Anything that gets written to the pool technically is sound. You
> have chosen a single possible point of failure, what of firmware, drive
> cache, motherboard, power surges, motion, etc.?
>

I'm sorry, but I'm not following your logic here. Are you saying that ZFS
doesn't use RAM so it can't be affected by it? ZFS likes lots of memory and
uses it aggressively. So my understanding is that large amounts of data are
more likely to be in memory with ZFS than with other file systems. If
Google's research is to believed then random memory errors are a lot more
frequent than you think that they are. As I understand it, ZFS does not
checksum data while it's in memory. (While there is a debug flag to turn this
on, I'm betting that the performance hit is pretty big.) So how do RAM
failure or random bit flips have nothing to do with ZFS?


>
> RAM/ECC RAM is like consumer drives vs pro drives in your system, recent
> long term studies have shown you don't get much more for the extra money.
>

Do you have references to these studies? This directly conflicts with what
I've seen posted, with references, in other forums on the frequency of soft
memory errors, particularly on systems that run 24x7, and how ECC memory is
able to correct these random errors.



> I have been running ZFS in production using the past and current versions
> for OS X on over 60 systems (12 are servers) since Apple kicked ZFS loose.
> No systems (3 run ECC) have had data corruption or data loss.
>

That you know of.


> Some pools have disappeared on the older ZFS but were easily recovered on
> modern (current development) and past OpenSolaris, FreeBSD, etc., as I keep
> clones of 'corrupted' pools for such tests. Almost always, these were the
> result of connector/cable failure. In that span of time no RAM has failed
> 'utterly' and all data and tests have shown quality storage. In that time
> 11 drives have failed and easily been replaced, 4 of those were OS drives,
> data stored under ZFS and a regular clone of the OS also stored under ZFS
> just in case. All pools are backed-up/replicated off site. Probably a lot
> more than most are doing for data integrity.
>
> No this data I'm providing is not a guarantee. It's just data from someone
> who has grown to trust ZFS in the real world for clients that cannot lose
> data for the most part due to legal regulations. I trust RAM manufacturers
> and drive manufacturers equally, I just verify for peace of mind with ZFS.
>

 I have an opinion of people who run servers with legal or critical
business data on them without ECC memory, but I'll keep it to myself.

Phil



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-02 Thread Philip Robar
On Sun, Mar 2, 2014 at 12:46 AM, Jean-Yves Avenard wrote:

> On 28 February 2014 20:32, Philip Robar  wrote:
>
> cyberjock is the biggest troll ever; not even the people actually
> involved with FreeNAS (iX Systems) know what to do with him. He does
> spend an awful amount of time on the FreeNAS forums helping others and
> they tolerate him on that basis.
>
> Otherwise, he's just someone doing nothing, with a lot of time on his
> hands, spewing the same stuff over and over simply because he has
> heard about it.
>

Well, that's at odds with his claims of how much time and effort he has put
into learning about ZFS and is basically an ad hominem attack, but since
Daniel Becker has already cast a fair amount of doubt on both the scenario
and logic behind cyberjock's ECC vs non-ECC posts and his understanding of
the architecture of ZFS, I'll move on.



> Back to the ECC topic; one core issue with ZFS is that it will
> specifically write to the pool even when all you are doing is read, in
> an attempt to correct any data found to have an incorrect checksum.
> So say you have corrupted memory: you read from the disk, ZFS believes
> the data is faulty (after all, the checksum will be incorrect due to
> faulty RAM) and starts to rewrite the data. That is one scenario where
> ZFS will corrupt an otherwise healthy pool until it's too late and all
> your data is gone.
> As such, ZFS is indeed more sensitive to bad RAM than other filesystems.


So, you're agreeing with cyberjock's conclusion, just not the path he took
to get there.



> Having said that; find me *ONE* official source other than the FreeNAS
> forum stating that ECC is a minimal requirement (and no, a wiki
> written by cyberjock doesn't count). Solaris never said so, FreeBSD
> didn't either, nor Sun.
>

So if a problem isn't documented, it's not a problem?

Most Sun/Solaris documentation isn't going to mention the need for ECC
memory because all Sun systems shipped with ECC memory.
FreeBSD/PC-BSD/FreeNAS/NAS4Free/Linux in turn derive from worlds where ECC
memory is effectively nonexistent so their lack of documentation may stem
from a combination of the ZFS folks just assuming that you have it and the
distro people not realizing that you need it. FreeNAS's guide does state
pretty strongly that you should use ECC memory. But if you insist: from
"Oracle Solaris 11.1 Administration: ZFS File Systems", "Consider using ECC
memory to protect against memory corruption. Silent memory corruption can
potentially damage your data." [1]

It seems to me that if using ZFS without ECC memory puts someone's data at
an increased risk over other file systems, then they ought to be told that so
that they can make an informed decision. Am I really being unreasonable
about this?



> Bad RAM however has nothing to do with the occasional bit flip that
> would be prevented using ECC RAM. The probability of a bit flip is
> low, very low.
>

You and Jason have both claimed this. This is at odds with papers and
studies I've seen mentioned elsewhere. Here's what a little searching found:

Soft Error: https://en.wikipedia.org/wiki/Soft_error
Which says that there are numerous sources of soft errors in memory and
other circuits other than cosmic rays.

ECC Memory: https://en.wikipedia.org/wiki/ECC_memory
States that design has dealt with the problem of increased circuit density.
It then mentions the research IBM did years ago and Google's 2009 report
which says:

The actual error rate found was several orders of magnitude higher than
previous small-scale or laboratory studies, with 25,000 to 70,000 errors
per billion device hours per mega*bit* (about 2.5–7 × 10^-11 error/bit·h)
(i.e. about 5 single bit errors in 8 Gigabytes of RAM per hour using the
top-end error rate), and more than 8% of DIMM memory modules affected by
errors per year.


So, since you've agreed that ZFS is more vulnerable than other file systems
to memory errors, and Google says that these errors are a lot more frequent
than most people think they are, then the question becomes: just how much
more vulnerable is ZFS, and is the extent of the corruption likely to be
wider or more catastrophic than on other file systems?


 Phil

[1] Oracle Solaris 11.1 Administration: ZFS File Systems:
http://docs.oracle.com/cd/E26502_01/html/E29007/zfspools-4.html



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-01 Thread Philip Robar
On Sun, Mar 2, 2014 at 12:46 AM, Jean-Yves Avenard wrote:

>
> Back to the OP, I'm not sure why he felt he had to mention being
> part of SunOS. ZFS was never part of SunOS.
>

I didn't say I was part of SunOS (later renamed to Solaris 1). SunOS was
dead and buried years before I joined the network side of OS/Net. "OS" in
this case just means operating system, it's not a reference to the "OS" in
SunOS.

By mentioning that I worked in the part of Sun that invented ZFS and saying
that I am a fan of it I was just trying to be clear that I was not
attacking ZFS by questioning some aspect of it. Clearly, at least in some
minds I failed at that.

Phil



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-01 Thread Jean-Yves Avenard
On 28 February 2014 20:32, Philip Robar  wrote:

> This is a ZFS issue because ZFS is advertised as being the most resilient
> file system currently available; however, a community leader in the FreeNAS
> forums (though, as pointed out by Daniel Becker, one without knowledge of
> ZFS internals) has argued repeatedly and strongly, and in detail that this
> robustness is severely compromised by using ZFS without ECC memory. Further

cyberjock is the biggest troll ever; not even the people actually
involved with FreeNAS (iX Systems) know what to do with him. He does
spend an awful amount of time on the FreeNAS forums helping others and
they tolerate him on that basis.

Otherwise, he's just someone doing nothing, with a lot of time on his
hands, spewing the same stuff over and over simply because he has
heard about it.

Back to the ECC topic; one core issue with ZFS is that it will
specifically write to the pool even when all you are doing is read, in
an attempt to correct any data found to have an incorrect checksum.
So say you have corrupted memory: you read from the disk, ZFS believes
the data is faulty (after all, the checksum will be incorrect due to
faulty RAM) and starts to rewrite the data. That is one scenario where
ZFS will corrupt an otherwise healthy pool until it's too late and all
your data is gone.
As such, ZFS is indeed more sensitive to bad RAM than other filesystems.
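
Roughly, the read path being described looks like this (a much-simplified Python
sketch, purely illustrative; the names and structure are hypothetical, not
OpenZFS code):

# Purely illustrative sketch of a self-healing mirror read.
# Not real OpenZFS code; names and structure are hypothetical.
import zlib

def checksum(data: bytes) -> int:
    return zlib.crc32(data)              # stand-in for ZFS's fletcher4/sha256

def read_with_self_heal(copies, expected_checksum):
    """copies: objects exposing read()/write() for each mirror copy of one block."""
    good, suspect = None, []
    for copy in copies:
        data = copy.read()               # the data now sits in (possibly faulty) RAM
        if checksum(data) == expected_checksum:
            good = data
        else:
            suspect.append(copy)         # flagged as corrupt -- rightly or wrongly
    if good is None:
        raise IOError("unrecoverable error: no copy passed verification")
    for copy in suspect:
        copy.write(good)                 # "self-heal": rewrite the mismatching copy
    return good

The contentious step is the in-memory verification: with faulty RAM, a copy that
is fine on disk can fail the check, or the buffer used for the rewrite can itself
be corrupted before it is written back.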

Having said that; find me *ONE* official source other than the FreeNAS
forum stating that ECC is a minimal requirement (and no, a wiki
written by cyberjock doesn't count). Solaris never said so, FreeBSD
didn't either, nor Sun.

Bad RAM however has nothing to do with the occasional bit flip that
would be prevented using ECC RAM. The probability of a bit flip is
low, very low.


Back to the OP, I'm not sure why he felt he had to mention being
part of SunOS. ZFS was never part of SunOS.

JY



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-01 Thread Jason Belec
Technically, what you qualify below is a truism under any hardware. ZFS is 
neither more nor less susceptible to RAM failure as it has nothing to do with 
ZFS. Anything that gets written to the pool technically is sound. You have 
chosen a single possible point of failure, what of firmware, drive cache, 
motherboard, power surges, motion, etc.?

RAM/ECC RAM is like consumer drives vs pro drives in your system, recent long 
term studies have shown you don't get much more for the extra money.

I have been running ZFS in production using the past and current versions for 
OS X on over 60 systems (12 are servers) since Apple kicked ZFS loose. No 
systems (3 run ECC) have had data corruption or data loss. Some pools have 
disappeared on the older ZFS but were easily recovered on modern (current 
development) and past OpenSolaris, FreeBSD, etc., as I keep clones of 
'corrupted' pools for such tests. Almost always, these were the result of 
connector/cable failure. In that span of time no RAM has failed 'utterly' and 
all data and tests have shown quality storage. In that time 11 drives have 
failed and easily been replaced, 4 of those were OS drives, data stored under 
ZFS and a regular clone of the OS also stored under ZFS just in case. All pools 
are backed-up/replicated off site. Probably a lot more than most are doing for 
data integrity.

No this data I'm providing is not a guarantee. It's just data from someone who 
has grown to trust ZFS in the real world for clients that cannot lose data for 
the most part due to legal regulations. I trust RAM manufacturers and drive 
manufacturers equally, I just verify for peace of mind with ZFS. 

--
Jason Belec
Sent from my iPad

> On Mar 1, 2014, at 5:39 PM, Philip Robar  wrote:
> 
>> On Fri, Feb 28, 2014 at 2:36 PM, Richard Elling  
>> wrote:
>> 
>> We might buy this argument if, in fact, no other program had the same
>> vulnerabilities. But *all* of them do -- including OSX. So it is disingenuous
>> to claim this as a ZFS deficiency.
> 
> No it's disingenuous of you to ignore the fact that I carefully qualified 
> what I said. To repeat, it's claimed with a detailed example and reasoned 
> argument that ZFS is MORE vulnerable to corruption due to memory errors when 
> using non-ECC memory and that that corruption is MORE likely to be extensive 
> or catastrophic than with other file systems.
> 
> As I said, Jason's and Daniel Becker's responses are reassuring, but I'd 
> really like a definitive answer to this so I've reached out to one of the 
> lead Open ZFS developers. Hopefully, I'll hear back from him.
> 
> Phil
> 
> 



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-03-01 Thread Philip Robar
On Fri, Feb 28, 2014 at 2:36 PM, Richard Elling wrote:

>
> We might buy this argument if, in fact, no other program had the same
> vulnerabilities. But *all* of them do -- including OSX. So it is
> disingenuous
> to claim this as a ZFS deficiency.
>

No it's disingenuous of you to ignore the fact that I carefully qualified
what I said. To repeat, it's claimed with a detailed example and reasoned
argument that ZFS is *MORE* vulnerable to corruption due to memory errors
when using non-ECC memory and that that corruption is *MORE* likely to be
extensive or catastrophic than with other file systems.

As I said, Jason's and Daniel Becker's responses are reassuring, but I'd
really like a definitive answer to this so I've reached out to one of the
lead Open ZFS developers. Hopefully, I'll hear back from him.

Phil



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-28 Thread Richard Elling

On Feb 28, 2014, at 1:32 AM, Philip Robar  wrote:

> On Thu, Feb 27, 2014 at 11:42 AM, Bill Winnett  wrote:
> 
> Why is this a zfs issue.
> 
> This is a ZFS issue because ZFS is advertised as being the most resilient 
> file system currently available; however, a community leader in the FreeNAS 
> forums (though, as pointed out by Daniel Becker, one without knowledge of ZFS 
> internals) has argued repeatedly and strongly, and in detail that this 
> robustness is severely compromised by using ZFS without ECC memory. Further 
> he argues that ZFS without ECC memory is more vulnerable than other file 
> systems to data corruption and that this corruption is likely to silently 
> cause complete and unrecoverable pool failure. This in turn, if true, is an 
> issue because ZFS is increasingly being used on systems that either are not 
> using or cannot use ECC memory.

We might buy this argument if, in fact, no other program had the same 
vulnerabilities. But *all* of them do -- including OSX. So it is disingenuous
to claim this as a ZFS deficiency.
 -- richard



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-28 Thread Philip Robar
On Thu, Feb 27, 2014 at 11:42 AM, Bill Winnett wrote:

>
> Why is this a zfs issue.


This is a ZFS issue because ZFS is advertised as being the most resilient
file system currently available; however, a community leader in the FreeNAS
forums (though, as pointed out by Daniel Becker, one without knowledge of
ZFS internals) has argued repeatedly and strongly, and in detail that this
robustness is severely compromised by using ZFS without ECC memory. Further
he argues that ZFS without ECC memory is more vulnerable than other file
systems to data corruption and that this corruption is likely to silently
cause complete and unrecoverable pool failure. This in turn, if true, is an
issue because ZFS is increasingly being used on systems that either are not
using or cannot use ECC memory.

Jason Belec has asserted that his testing shows that there is not an
increased chance of partial or complete loss of data when not using ECC
memory.

Daniel Becker has cogently argued that the scenario and logic in the
warning are incorrect and that it has not been shown that ZFS is in fact
more vulnerable than other file systems when ECC memory is not used.

While I am reassured by their responses, I would still like an
authoritative and preferably definitive answer as to whether or not ZFS is
in fact any more or less vulnerable than other file systems when ECC memory
is not used. So I'm going to ask my question on the Open ZFS developer's
list. (Since that's the only list they have.)

Phil



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-27 Thread Daniel Jozsef
Wouldn't ZFS actually make things better as opposed to worse?

Say I have a Macbook with failing memory, and there's a magnetic storm. If 
I was using HFS+, with each write I'd be seeding the drive with bit errors, 
without ever noticing until the system crashes. If the bit error happens 
infrequently, the data corruption would likely be propagated to any backup 
I maintain.

With ZFS, the bit error would likely result in me being alerted of a 
corruption, and even if error correction "fixed" the data on the drive, 
this would result in an inconsistent state, and soon ZFS would take the 
drive offline due to fault threshold, and the system would crash. Then I 
would know that the data is damaged, and I could restore from backup after 
replacing the memory.

(And of course, ZFS protects me from the much more common hard drive bit 
errors.)

I don't see how this isn't awesome.



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-27 Thread Bill Winnett




Why is this a ZFS issue?  If you have bad RAM, the OS is already
compromised.


Where is the blame supposed to be?  ZFS is not an OS, and it has to
trust the APIs it calls to actually perform work.




Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-27 Thread Richard Elling

On Feb 26, 2014, at 10:51 PM, Daniel Becker  wrote:

> Incidentally, that paper came up in a ZFS-related thread on Ars Technica just 
> the other day (as did the link to the FreeNAS forum post). Let me just quote 
> what I said there:
> 
>> The conclusion of the paper is that ZFS does not protect against in-memory 
>> corruption, and thus can't provide end-to-end integrity in the presence of 
>> memory errors. I am not arguing against that at all; obviously you'll want 
>> ECC on your ZFS-based server if you value data integrity -- just as you 
>> would if you were using any other file system. That doesn't really have 
>> anything to do with the claim that ZFS specifically makes lack of ECC more 
>> likely to cause total data loss, though.
> 
> The sections you quote below basically say that while ZFS offers good 
> protection against on-disk corruption, it does *not* effectively protect you 
> against memory errors. Or, put another way, the authors are basically finding 
> that despite all the FS-level checksumming, ZFS does not render ECC memory 
> unnecessary (as one might perhaps naively expect). No claim is being made 
> that memory errors affect ZFS more than other filesystems.

Yes. Just like anything else, end-to-end data integrity is needed. So until 
people write apps that self-check everything, there is a possibility that
something you trust [1] can fail. As it happens, only the PC market demands
no ECC. TANSTAAFL.

[1] http://en.wikipedia.org/wiki/Pentium_FDIV_bug
 -- richard



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-26 Thread Daniel Becker
Incidentally, that paper came up in a ZFS-related thread on Ars Technica just 
the other day (as did the link to the FreeNAS forum post). Let me just quote 
what I said there:

> The conclusion of the paper is that ZFS does not protect against in-memory 
> corruption, and thus can't provide end-to-end integrity in the presence of 
> memory errors. I am not arguing against that at all; obviously you'll want 
> ECC on your ZFS-based server if you value data integrity -- just as you would 
> if you were using any other file system. That doesn't really have anything to 
> do with the claim that ZFS specifically makes lack of ECC more likely to 
> cause total data loss, though.

The sections you quote below basically say that while ZFS offers good 
protection against on-disk corruption, it does *not* effectively protect you 
against memory errors. Or, put another way, the authors are basically finding 
that despite all the FS-level checksumming, ZFS does not render ECC memory 
unnecessary (as one might perhaps naively expect). No claim is being made that 
memory errors affect ZFS more than other filesystems.


On Feb 26, 2014, at 10:24 PM, Philip Robar  wrote:

> Thank you for your reasoned and detailed response and subsequent followup. 
> This was exactly what I was hoping for.
> 
> I'm curious, have you read, End-to-end Data Integrity for File Systems: A ZFS 
> Case Study by Zhang, et al? 
> 
> Abstract: We present a study of the effects of disk and memory corruption on 
> file system data integrity. Our analysis focuses on Sun’s ZFS, a modern 
> commercial offering with numerous reliability mechanisms. Through careful and 
> thorough fault injection, we show that ZFS is robust to a wide range of disk 
> faults. We further demonstrate that ZFS is less resilient to memory 
> corruption, which can lead to corrupt data being returned to applications or 
> system crashes. Our analysis reveals the importance of considering both 
> memory and disk in the construction of truly robust file and storage systems.
> 
> ...memory corruptions still remain a serious problem to data integrity. Our 
> results for memory corruptions indicate cases where bad data is returned to 
> the user, operations silently fail, and the whole system crashes. Our 
> probability analysis shows that one single bit flip has small but 
> non-negligible chances to cause failures such as reading/writing corrupt data 
> and system crashing.
> 
> Phil
> 
> 





Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-26 Thread Philip Robar
Thank you for your reasoned and detailed response and subsequent followup.
This was exactly what I was hoping for.

I'm curious, have you read *End-to-end Data Integrity for File Systems: A
ZFS Case Study* by Zhang, et al.?

Abstract: We present a study of the effects of disk and memory corruption on
> file system data integrity. Our analysis focuses on Sun's ZFS, a modern
> commercial offering with numerous reliability mechanisms. Through careful
> and thorough fault injection, we show that ZFS is robust to a wide range of
> disk faults. We further demonstrate that ZFS is less resilient to memory
> corruption, which can lead to corrupt data being returned to applications
> or system crashes. Our analysis reveals the importance of considering both
> memory and disk in the construction of truly robust file and storage
> systems.


...memory corruptions still remain a serious problem to data integrity. Our
> results for memory corruptions indicate cases where bad data is returned to
> the user, operations silently fail, and the whole system crashes. Our
> probability analysis shows that one single bit flip has small but
> non-negligible chances to cause failures such as reading/writing corrupt
> data and system crashing.


Phil



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-26 Thread Philip Robar
On Wed, Feb 26, 2014 at 9:04 PM, Jason Belec wrote:

> Well that's one point of view and choice.
>
> I'm sure those you refer to are far more knowledgeable than any other
> individuals.
>

It's unclear to me how what I said (or possibly how I said it) caused you
to reply in such a snarky way. I thought that my question was posed in a
reasonable tone and based on a reasoned argument by a specific ZFS related
community leader.

Neither I nor, certainly, cyberjock ever claimed that he is "far more
knowledgeable than any other individuals." If I recall correctly, cyberjock
has specifically said that he has not looked at ZFS's source code. Rather,
what he has said is that his recommendations are based on "a month of 12
hour days reading forums, experimenting with a VM, and then later a test
platform." (And, presumably, his couple of years since then actively using
ZFS.)


> I can only speak for myself. I have intentionally attempted to destroy
> data for years under ZFS, amazingly enough all data is always recoverable.
> I have intentionally stayed away from protected RAM to ensure data for
> clients is safe.
>

Great! Have you made your procedures public and repeatable so that others
can replicate and verify them or use them for future testing?


> So back to trolling. Let's be honest, if you were not trolling you would
> have started a new thread for people to discuss your views.
>

I did start a new thread. Why are you lying about me not having done so?

Phil



Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-26 Thread Daniel Becker
Actually, thinking about this some more, the real reason that this hypothetical 
horror scenario cannot actually happen in real life is that the checksum would 
never get recomputed from the improperly “corrected” data to begin with: The 
checksum for a given block is stored in its *parent* block (which itself has a 
checksum that is stored in its parent, and so on and so forth, all the way up 
to the uberblock), not in the block itself. Therefore, if a checksum failure is 
detected for a block, only the block itself will be corrected (and possibly 
corrupted as a result of a memory error), not its checksum (which is protected 
by the parent block’s checksum).
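
As a toy illustration of that layout (hypothetical Python; not the real
block-pointer format), each parent holds the checksum of the child it points to,
so repairing a child block never involves recomputing a checksum stored inside
that child:

# Toy model of ZFS-style parent-held checksums; illustrative only.
import hashlib

def cksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

leaf_a = {"data": b"file contents A", "children": []}
leaf_b = {"data": b"file contents B", "children": []}
root = {                                  # the parent stores each child's checksum
    "data": b"directory metadata",
    "children": [
        {"ptr": leaf_a, "cksum": cksum(leaf_a["data"])},
        {"ptr": leaf_b, "cksum": cksum(leaf_b["data"])},
    ],
}

def verify(block):
    """Check every child against the checksum held by its parent, recursively."""
    for bp in block["children"]:
        assert cksum(bp["ptr"]["data"]) == bp["cksum"], "checksum mismatch"
        verify(bp["ptr"])

verify(root)   # repairing leaf_a rewrites only leaf_a; its checksum stays in root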

See e.g. the following website for more explanation of how things are organized 
internally: 


On Feb 26, 2014, at 7:09 PM, Daniel Becker  wrote:

> A few things to think about when reading that forum post:
> 
> - The scenario described in that post is based on the assumption that all 
> blocks read from disk somehow get funneled through a single memory location, 
> which also happens to have a permanent fault.
> - In addition, it assumes that after a checksum failure, the corrected data 
> either gets stored in the exact same memory location again, or in another 
> memory location that also has a permanent fault.
> - It also completely ignores the fact that ZFS has an internal error 
> threshold and will automatically offline a device once the number of 
> read/checksum errors seen on it exceeds that threshold, preventing further 
> corruption. ZFS will *not* go and happily mess up your entire pool.
> - This would *not* be silent; ZFS would report a large number of checksum 
> errors on all your devices.
> - Blocks corrupted in that particular way would *not* actually spread to 
> incremental backups or via rsync, as the corrupted blocks would not be seen 
> as modified.
> - There is no indication that the reported cases of data loss that he points 
> to are actually due to the particular failure mechanism described in the 
> post; there are *lots* of other ways in which memory corruption can lead to a 
> file system becoming unmountable, checksums or not.
> - Last but not least, note that "Cyberjock" is a community moderator, not 
> somebody who’s actually in any way involved in the development of ZFS (or 
> even FreeNAS; see the preface of his FreeNAS guide for some info on his 
> background). If this were really as big of a risk as he thinks it is, you’d 
> think somebody who is actually familiar with the internals of ZFS would have 
> raised this concern before.
> 
> 
> 
> On Feb 26, 2014, at 5:56 PM, Philip Robar  wrote:
> 
>> Please note, I'm not trolling with this message. I worked in Sun's OS/Net 
>> group and am a huge fan of ZFS.
>> 
>> The leading members of the FreeNAS community make it clear [1] (with a 
>> detailed explanation and links to reports of data loss) that if you use ZFS 
>> without ECC RAM that there is a very good chance that you will eventually 
>> experience a total loss of your data without any hope of recovery. [2] 
>> (Unless you have literally thousands of dollars to spend on that recovery. 
>> And even then there's no guarantee of said recovery.) The features of ZFS, 
>> checksumming and scrubbing, work together to silently spread the damage done 
>> by cosmic rays and/or bad memory throughout a file system and this 
>> corruption then spreads to your backups.
>> 
>> Given this, aren't the various ZFS communities--particularly those that are 
>> small machine oriented [3]--other than FreeNAS (and even they don't say it 
>> as strongly enough in their docs), doing users a great disservice by 
>> implicitly encouraging them to use ZFS w/o ECC RAM or on machines that can't 
>> use ECC RAM?
>> 
>> As an indication of how persuaded I've been for the need of ECC RAM, I've 
>> shut down my personal server and am not going to access that data until I've 
>> built a new machine with ECC RAM.
>> 
>> Phil
>> 
>> [1] ECC vs non-ECC RAM and ZFS: 
>> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
>> 
>> [2] cyberjock: "So when you read about how using ZFS is an "all or none" I'm 
>> not just making this up. I'm really serious as it really does work that way. 
>> ZFS either works great or doesn't work at all. That really truthfully [is] 
>> how it works."
>> 
>> [3] ZFS-macos, NAS4Free, PC-BSD, ZFS on Linux
>> 
>> 
> 





Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-26 Thread Daniel Becker
A few things to think about when reading that forum post:

- The scenario described in that post is based on the assumption that all 
blocks read from disk somehow get funneled through a single memory location, 
which also happens to have a permanent fault.
- In addition, it assumes that after a checksum failure, the corrected data 
either gets stored in the exact same memory location again, or in another 
memory location that also has a permanent fault.
- It also completely ignores the fact that ZFS has an internal error threshold 
and will automatically offline a device once the number of read/checksum errors 
seen on it exceeds that threshold, preventing further corruption (see the rough 
sketch after this list). ZFS will *not* go and happily mess up your entire pool.
- This would *not* be silent; ZFS would report a large number of checksum 
errors on all your devices.
- Blocks corrupted in that particular way would *not* actually spread to 
incremental backups or via rsync, as the corrupted blocks would not be seen as 
modified.
- There is no indication that the reported cases of data loss that he points to 
are actually due to the particular failure mechanism described in the post; 
there are *lots* of other ways in which memory corruption can lead to a file 
system becoming unmountable, checksums or not.
- Last but not least, note that "Cyberjock" is a community moderator, not 
somebody who’s actually in any way involved in the development of ZFS (or even 
FreeNAS; see the preface of his FreeNAS guide for some info on his background). 
If this were really as big of a risk as he thinks it is, you’d think somebody 
who is actually familiar with the internals of ZFS would have raised this 
concern before.
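
A rough sketch of the error-threshold bookkeeping referred to above (hypothetical
Python; the threshold value, state names, and behavior are made-up stand-ins, the
real accounting lives in ZFS and its fault-management layer):

# Hypothetical sketch of per-device error accounting with an offline threshold.
# Illustrative only; the threshold and state names are made up.
ERROR_THRESHOLD = 10

class Vdev:
    def __init__(self, name):
        self.name = name
        self.checksum_errors = 0
        self.state = "ONLINE"

    def record_checksum_error(self):
        self.checksum_errors += 1
        if self.checksum_errors > ERROR_THRESHOLD and self.state == "ONLINE":
            self.state = "DEGRADED"      # stop "repairing" through a flaky device
            print(f"{self.name}: error threshold exceeded, taking device out of service")

disk = Vdev("disk0")
for _ in range(12):                      # a burst of (possibly spurious) checksum errors
    disk.record_checksum_error()
print(disk.state)                        # -> DEGRADED, rather than endless rewrites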



On Feb 26, 2014, at 5:56 PM, Philip Robar  wrote:

> Please note, I'm not trolling with this message. I worked in Sun's OS/Net 
> group and am a huge fan of ZFS.
> 
> The leading members of the FreeNAS community make it clear [1] (with a 
> detailed explanation and links to reports of data loss) that if you use ZFS 
> without ECC RAM that there is a very good chance that you will eventually 
> experience a total loss of your data without any hope of recovery. [2] 
> (Unless you have literally thousands of dollars to spend on that recovery. 
> And even then there's no guarantee of said recovery.) The features of ZFS, 
> checksumming and scrubbing, work together to silently spread the damage done 
> by cosmic rays and/or bad memory throughout a file system and this corruption 
> then spreads to your backups.
> 
> Given this, aren't the various ZFS communities--particularly those that are 
> small machine oriented [3]--other than FreeNAS (and even they don't say it 
> strongly enough in their docs), doing users a great disservice by implicitly 
> encouraging them to use ZFS w/o ECC RAM or on machines that can't use ECC RAM?
> 
> As an indication of how persuaded I've been for the need of ECC RAM, I've 
> shut down my personal server and am not going to access that data until I've 
> built a new machine with ECC RAM.
> 
> Phil
> 
> [1] ECC vs non-ECC RAM and ZFS: 
> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
> 
> [2] cyberjock: "So when you read about how using ZFS is an "all or none" I'm 
> not just making this up. I'm really serious as it really does work that way. 
> ZFS either works great or doesn't work at all. That really truthfully [is] 
> how it works."
> 
> [3] ZFS-macos, NAS4Free, PC-BSD, ZFS on Linux
> 
> 





Re: [zfs-macos] ZFS w/o ECC RAM -> Total loss of data

2014-02-26 Thread Jason Belec
Well that's one point of view and choice. 

I'm sure those you refer to are far more knowledgeable than any other 
individuals. 

I can only speak for myself. I have intentionally attempted to destroy data for 
years under ZFS, amazingly enough all data is always recoverable. I have 
intentionally stayed away from protected RAM to ensure data for clients is 
safe. 

So back to trolling. Let's be honest, if you were not trolling you would have 
started a new thread for people to discuss your views. 

Jason
Sent from my iPhone 5S

> On Feb 26, 2014, at 8:56 PM, Philip Robar  wrote:
> 
> Please note, I'm not trolling with this message. I worked in Sun's OS/Net 
> group and am a huge fan of ZFS.
> 
> The leading members of the FreeNAS community make it clear [1] (with a 
> detailed explanation and links to reports of data loss) that if you use ZFS 
> without ECC RAM that there is a very good chance that you will eventually 
> experience a total loss of your data without any hope of recovery. [2] 
> (Unless you have literally thousands of dollars to spend on that recovery. 
> And even then there's no guarantee of said recovery.) The features of ZFS, 
> checksumming and scrubbing, work together to silently spread the damage done 
> by cosmic rays and/or bad memory throughout a file system and this corruption 
> then spreads to your backups.
> 
> Given this, aren't the various ZFS communities--particularly those that are 
> small machine oriented [3]--other than FreeNAS (and even they don't say it 
> strongly enough in their docs), doing users a great disservice by implicitly 
> encouraging them to use ZFS w/o ECC RAM or on machines that can't use ECC RAM?
> 
> As an indication of how persuaded I've been for the need of ECC RAM, I've 
> shut down my personal server and am not going to access that data until I've 
> built a new machine with ECC RAM.
> 
> Phil
> 
> [1] ECC vs non-ECC RAM and ZFS: 
> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
> 
> [2] cyberjock: "So when you read about how using ZFS is an "all or none" I'm 
> not just making this up. I'm really serious as it really does work that way. 
> ZFS either works great or doesn't work at all. That really truthfully [is] 
> how it works."
> 
> [3] ZFS-macos, NAS4Free, PC-BSD, ZFS on Linux
> 
