Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
The whole raid does not fail -- we are talking about corruption here. If you lose some inodes, your whole partition is not gone. My ZFS pool could not be salvaged -- poof, the whole thing was gone (granted, it was a test pool and not a raidz or mirror yet). But still, for what happened, I cannot believe that 20G of data got messed up because a 1GB cache was not correctly flushed. Chad, I think what you're saying is for a zpool to allow you to salvage whatever remaining data passes its checksums. -- Regards, Jeremy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
While other file systems, when they become corrupt, allow you to salvage data :-) They allow you to salvage what you *think* is your data. But in reality, you have no clue what the disks are giving you. Casper
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 12:06 AM, Ian Collins wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 10:17 PM, Ian Collins wrote: Chad Leigh -- Shire.Net LLC wrote: There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). So you trust your important data to a single drive? I doubt it. But I bet you do trust your data to a hardware RAID array. Yes, but not because I expect a single drive to be more error prone (versus total failure). Total drive failure on a single disk loses all your data. But we are not talking total failure, we are talking errors that corrupt data. I buy individual drives with the expectation that they are designed to be error free and are error free for the most part and I do not expect a RAID array to be more robust in this regard (after all, the RAID is made up of a bunch of single drives). But people expect RAID to protect them from the corruption caused by a partial failure, say a bad block, which is a common failure mode. They do? I must admit no experience with the big standalone raid array storage units, just (expensive) HW raid cards, but I have never expected an array to protect me against data corruption. Bad blocks can be detected and remapped, and maybe the array can recalculate the block from parity etc, but that is a known disk error, and not the subtle kinds of errors created by the RAID array that are being claimed here. The worst system failure I experienced was caused by one half of a mirror experiencing bad blocks and the corrupt data being nicely mirrored on the other drive. ZFS would have saved this system from failure. None of my comments are meant to denigrate ZFS. I am implementing it myself. 
Some people on this list think that RAID arrays are more likely to corrupt your data than a JBOD (both with ZFS on top, for example, a ZFS mirror of 2 RAID arrays versus a JBOD mirror or raidz). There is no proof of this, or even a reasonable hypothetical explanation for it, that I have seen presented. I don't think that's the issue here; it's more one of perceived data integrity. People who have been happily using a single RAID 5 are now finding that the array has been silently corrupting their data. They are? They are being told that the problems they are having are due to that, but there is no proof. It could be a bad driver, for example. People expect errors from single drives, They do? The tech specs show very low failure rates for single drives in terms of bit errors. so they put them in a RAID knowing the firmware will protect them from drive errors. The RAID firmware will not protect them from bit errors on block reads unless the disk detects that the whole block is bad. I admit not knowing how much the disk itself can detect bit errors with CRC or similar sorts of things. This is incorrect. Let's take a simple example of a H/W RAID5 with 4 disk drives. If disk 1 returns a bad block when a stripe of data is read (and does not indicate an error condition), the RAID firmware will calculate the parity/CRC for the entire stripe (as it *always* does), see that there is an error present, and transparently correct the error before returning the corrected data upstream to the application (server). It can't correct every possible error - there will be limits depending on which CRC algorithms are implemented and the extent of the faulty data. But, in general, those algorithms, if correctly chosen and implemented, will correct most errors, most of the time.
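The 4-disk RAID5 scenario above can be sketched in a few lines of Python. This is a toy illustration of XOR stripe parity, not any vendor's firmware; the block contents are invented for the example. One caveat the thread glosses over: bare XOR parity tells you that *something* in the stripe is inconsistent, but rebuilding a block requires knowing which disk is at fault (e.g. because it reported a read error).

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 style parity)."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# A toy 4-disk RAID-5 stripe: 3 data blocks plus 1 parity block.
data = [b'AAAA', b'BBBB', b'CCCC']
parity = xor_blocks(data)

# On read, the firmware can re-XOR the whole stripe; any non-zero byte
# means the stripe is internally inconsistent.
corrupted = [data[0], b'BXBB', data[2]]        # disk 1 silently flips bits
assert any(xor_blocks(corrupted + [parity]))   # mismatch is detected

# If the failing disk is *identified*, its block is rebuilt from the
# surviving data blocks plus parity:
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Real arrays layer per-sector CRCs and rotated parity on top of this idea, which is where the detection-versus-correction limits discussed below come from.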
The main reason why not *all* possible errors can be corrected is that there are compromises to be made in:
- the number of bits of CRC that will be calculated and stored
- the CPU and memory resources required to perform the CRC calculations
- limitations in the architecture of the RAID h/w, for example, how much bandwidth is available between the CPU, memory, and disk I/O controllers and what level of bus contention can be tolerated
- whether the RAID vendor wishes to make any money (hardware costs must be minimized)
- whether the RAID vendor wishes to win benchmarking comparisons with their competition
- how smart the firmware developers are and how much pressure is put on them to get the product to market
- blah, blah, blah
They often fail to recognise that the RAID firmware may not be perfect. ZFS, JBOD disk controllers, drivers for said disk controllers, etc. may not be perfect either. ZFS looks to be the perfect tool for mirroring hardware RAID arrays, with the advantage over other schemes of knowing which side of the mirror has an error. Thus ZFS can be used as a tool to complement, rather than replace, hardware RAID. I agree. That is what I am doing :-)
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 6:01 AM, [EMAIL PROTECTED] wrote: While other file systems, when they become corrupt, allow you to salvage data :-) They allow you to salvage what you *think* is your data. But in reality, you have no clue what the disks are giving you. I stand by what I said. If you have a massive disk failure, yes. You are right. When you have subtle corruption, some of the data and meta data is bad but not all. In that case you can recover (and verify the data if you have the means to do so) the parts that did not get corrupted. My ZFS experience so far is that it basically said the whole 20GB pool was dead and I seriously doubt all 20GB was corrupted. That was because you built a pool with no redundancy. In the case where ZFS does not have a redundant config from which to try to reconstruct the data (today) it simply says: sorry charlie - your pool is corrupt. Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On 2-Dec-06, at 12:56 PM, Al Hopper wrote: On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 6:01 AM, [EMAIL PROTECTED] wrote: While other file systems, when they become corrupt, allow you to salvage data :-) They allow you to salvage what you *think* is your data. But in reality, you have no clue what the disks are giving you. I stand by what I said. If you have a massive disk failure, yes. You are right. When you have subtle corruption, some of the data and meta data is bad but not all. In that case you can recover (and verify the data if you have the means to do so) the parts that did not get corrupted. My ZFS experience so far is that it basically said the whole 20GB pool was dead and I seriously doubt all 20GB was corrupted. That was because you built a pool with no redundancy. In the case where ZFS does not have a redundant config from which to try to reconstruct the data (today) it simply says: sorry charlie - your pool is corrupt. Is that the whole story though? Even without redundancy, isn't there a lot of resilience against corruption (redundant metadata, etc)? --Toby
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 2, 2006, at 10:56 AM, Al Hopper wrote: On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 6:01 AM, [EMAIL PROTECTED] wrote: While other file systems, when they become corrupt, allow you to salvage data :-) They allow you to salvage what you *think* is your data. But in reality, you have no clue what the disks are giving you. I stand by what I said. If you have a massive disk failure, yes. You are right. When you have subtle corruption, some of the data and meta data is bad but not all. In that case you can recover (and verify the data if you have the means to do so) the parts that did not get corrupted. My ZFS experience so far is that it basically said the whole 20GB pool was dead and I seriously doubt all 20GB was corrupted. That was because you built a pool with no redundancy. In the case where ZFS does not have a redundant config from which to try to reconstruct the data (today) it simply says: sorry charlie - your pool is corrupt. Where a RAID system would still be salvageable. Chad --- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 10:56 AM, Al Hopper wrote: On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 6:01 AM, [EMAIL PROTECTED] wrote: While other file systems, when they become corrupt, allow you to salvage data :-) They allow you to salvage what you *think* is your data. But in reality, you have no clue what the disks are giving you. I stand by what I said. If you have a massive disk failure, yes. You are right. When you have subtle corruption, some of the data and meta data is bad but not all. In that case you can recover (and verify the data if you have the means to do so) the parts that did not get corrupted. My ZFS experience so far is that it basically said the whole 20GB pool was dead and I seriously doubt all 20GB was corrupted. That was because you built a pool with no redundancy. In the case where ZFS does not have a redundant config from which to try to reconstruct the data (today) it simply says: sorry charlie - your pool is corrupt. Where a RAID system would still be salvageable. That is a comparison of apples to oranges. The RAID system has Redundancy. If the ZFS pool had been configured with redundancy, it would have fared at least as well as the RAID system. Without redundancy, neither of them can magically reconstruct data. The RAID system would simply be an AID system. -- Jeff VICTOR Sun Microsystems jeff.victor @ sun.com OS Ambassador Sr. Technical Specialist Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq --
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 2, 2006, at 12:29 PM, Jeff Victor wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 10:56 AM, Al Hopper wrote: On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 6:01 AM, [EMAIL PROTECTED] wrote: While other file systems, when they become corrupt, allow you to salvage data :-) They allow you to salvage what you *think* is your data. But in reality, you have no clue what the disks are giving you. I stand by what I said. If you have a massive disk failure, yes. You are right. When you have subtle corruption, some of the data and meta data is bad but not all. In that case you can recover (and verify the data if you have the means to do so) the parts that did not get corrupted. My ZFS experience so far is that it basically said the whole 20GB pool was dead and I seriously doubt all 20GB was corrupted. That was because you built a pool with no redundancy. In the case where ZFS does not have a redundant config from which to try to reconstruct the data (today) it simply says: sorry charlie - your pool is corrupt. Where a RAID system would still be salvageable. That is a comparison of apples to oranges. The RAID system has Redundancy. If the ZFS pool had been configured with redundancy, it would have fared at least as well as the RAID system. Without redundancy, neither of them can magically reconstruct data. The RAID system would simply be an AID system. That is not the question. Assuming the error came OUT of the RAID system (which it did in this case, as there was a bug in the driver and the cache did not get flushed in a certain shutdown situation), another FS would have been salvageable, as the whole 20GB of the pool was not corrupt. Chad
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On 02/12/06, Chad Leigh -- Shire.Net LLC [EMAIL PROTECTED] wrote: On Dec 2, 2006, at 10:56 AM, Al Hopper wrote: On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 6:01 AM, [EMAIL PROTECTED] wrote: When you have subtle corruption, some of the data and meta data is bad but not all. In that case you can recover (and verify the data if you have the means to do so) the parts that did not get corrupted. My ZFS experience so far is that it basically said the whole 20GB pool was dead and I seriously doubt all 20GB was corrupted. That was because you built a pool with no redundancy. In the case where ZFS does not have a redundant config from which to try to reconstruct the data (today) it simply says: sorry charlie - your pool is corrupt. Where a RAID system would still be salvageable. RAID level what? How is anything salvageable if you lose your only copy? ZFS does store multiple copies of metadata in a single vdev, so I assume we're talking about data here. -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Sat, 2 Dec 2006, Al Hopper wrote: Some people on this list think that the RAID arrays are more likely to corrupt your data than JBOD (both with ZFS on top, for example, a ZFS mirror of 2 raid arrays or a JBOD mirror or raidz). There is no Can you present a cut/paste where that assertion was made? I don't want to put words in Chad's mouth, but I think he might be misunderstanding representations that people make here about ZFS vs HW RAID. I don't think that people have asserted that RAID arrays are more likely to corrupt data than a JBOD; what I think people ARE asserting is that corruption is more likely to go undetected in a HW RAID than in a JBOD with ZFS. (A subtle, but important, difference.) The reason for this is understandable: if you write some data to a HW RAID device, you assume that unless otherwise notified, your data is safe. The HW RAID, by its very nature, is a black box that we assume is OK. With ZFS+JBOD, ZFS' built-in end-to-end error checking will catch any silent errors created in the JBOD, when they happen, and can correct them (or at least notify you) right away. -- Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member President, Rite Online Inc. Voice: +1 (250) 979-1638 URL: http://www.rite-group.com/rich
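The end-to-end checking Rich describes can be sketched in miniature (an illustrative model only: real ZFS keeps a fletcher or SHA-256 checksum in the parent block pointer rather than next to the data as done here, which is what makes its checking truly end-to-end):

```python
import hashlib

def write_block(data):
    """Store a block together with a checksum computed at write time."""
    return {"data": bytearray(data), "cksum": hashlib.sha256(data).digest()}

def read_block(block):
    """Re-verify the checksum on every read, so silent corruption surfaces
    at read time instead of being passed upstream as good data."""
    if hashlib.sha256(bytes(block["data"])).digest() != block["cksum"]:
        raise IOError("checksum mismatch: silent corruption detected")
    return bytes(block["data"])

blk = write_block(b"important data")
assert read_block(blk) == b"important data"  # a clean read verifies

blk["data"][0] ^= 0xFF                       # a bit flip anywhere below the filesystem
try:
    read_block(blk)
except IOError:
    pass                                     # corruption is caught, not silently returned
```

With a redundant config, the filesystem can go one step further than raising an error: re-read the block from the other side of the mirror and repair the bad copy.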
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Tue, 28 Nov 2006, Elizabeth Schwartz wrote: Well, I fixed the HW but I had one bad file, and the problem was that ZFS Hi Elizabeth, Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? was saying delete the pool and restore from tape when, it turns out, the answer is just find the file with the bad inode, delete it, clear the device and scrub. Maybe more of a documentation problem, but it sure is disconcerting to have a file system threatening to give up the game over one bad file (and the real irony: it was a file in someone's TRASH!) Anyway I'm back in business without a restore (and with a rebuilt RAID) but yeesh, it sure took a lot of escalating to get to the point where someone knew to tell me to do a find -inum. Al Hopper
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Chad
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. Dana
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). Chad
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). So, what do you think reliable RAID configurations are for? Dana
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). So you trust your important data to a single drive? I doubt it. But I bet you do trust your data to a hardware RAID array. Ian
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On 1-Dec-06, at 6:29 PM, Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) I think Al probably means, running production systems on [any trusted storage system where errors can remain undetected] (contrasted with ZFS). --Toby
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On 1-Dec-06, at 6:36 PM, Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). Yes, but in either case, ZFS will tell you. Other filesystems in general cannot. --Toby
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 1, 2006, at 10:17 PM, Ian Collins wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). So you trust your important data to a single drive? I doubt it. But I bet you do trust your data to a hardware RAID array. Yes, but not because I expect a single drive to be more error prone (versus total failure). Total drive failure on a single disk loses all your data. But we are not talking total failure, we are talking errors that corrupt data. I buy individual drives with the expectation that they are designed to be error free and are error free for the most part, and I do not expect a RAID array to be more robust in this regard (after all, the RAID is made up of a bunch of single drives). Some people on this list think that RAID arrays are more likely to corrupt your data than JBOD (both with ZFS on top, for example, a ZFS mirror of 2 raid arrays or a JBOD mirror or raidz). There is no proof of this, or even a reasonable hypothetical explanation for it, that I have seen presented. Chad
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On Dec 1, 2006, at 10:42 PM, Toby Thain wrote: On 1-Dec-06, at 6:36 PM, Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 4:34 PM, Dana H. Myers wrote: Chad Leigh -- Shire.Net LLC wrote: On Dec 1, 2006, at 9:50 AM, Al Hopper wrote: Followup: When you say you fixed the HW, I'm curious as to what you found and if this experience with ZFS convinced you that your trusted RAID H/W did, in fact, have issues? Do you think that it's likely that there are others running production systems on RAID systems that they trust, but don't realize may have bugs (causing data corruption) that have yet to be discovered? And this is different from any other storage system, how? (ie, JBOD controllers and disks can also have subtle bugs that corrupt data) Of course, but there isn't the expectation of data reliability with a JBOD that there is with some RAID configurations. There is not? People buy disk drives and expect them to corrupt their data? I expect the drives I buy to work fine (knowing that there could be bugs etc in them, the same as with my RAID systems). Yes, but in either case, ZFS will tell you. And then kill your whole pool :-) Other filesystems in general cannot. While other file systems, when they become corrupt, allow you to salvage data :-) Chad
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 12:06 AM, Ian Collins wrote: But people expect RAID to protect them from the corruption caused by a partial failure, say a bad block, which is a common failure mode. They do? I must admit no experience with the big standalone raid array storage units, just (expensive) HW raid cards, but I have never expected an array to protect me against data corruption. Bad blocks can be detected and remapped, and maybe the array can recalculate the block from parity etc, but that is a known disk error, and not the subtle kinds of errors created by the RAID array that are being claimed here. I must admit that 'they' in my experience have been windows admins! I don't think that's the issue here; it's more one of perceived data integrity. People who have been happily using a single RAID 5 are now finding that the array has been silently corrupting their data. They are? They are being told that the problems they are having are due to that, but there is no proof. It could be a bad driver, for example. Either way, they are still finding errors they didn't know existed. ZFS looks to be the perfect tool for mirroring hardware RAID arrays, with the advantage over other schemes of knowing which side of the mirror has an error. Thus ZFS can be used as a tool to complement, rather than replace, hardware RAID. I agree. That is what I am doing :-) I'll be interested to see how you get on. Cheers, Ian
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
Chad Leigh -- Shire.Net LLC wrote: On Dec 2, 2006, at 12:06 AM, Ian Collins wrote: [...] I don't think that's the issue here; it's more one of perceived data integrity. People who have been happily using a single RAID 5 are now finding that the array has been silently corrupting their data. They are? They are being told that the problems they are having are due to that, but there is no proof. It could be a bad driver, for example. Or a bad cable, or a bad controller IC, or bad cache RAM. Or something. The point is, the entire path from the disk to the main system memory is the error domain. ZFS sits at the top of this domain and thus can detect and correct errors that something lower in the domain cannot. People expect errors from single drives, They do? The tech specs show very low failure rates for single drives in terms of bit errors. Very low. Almost never, perhaps, but not never. Bit errors happen. When they do, data is corrupted. Hence, single drives corrupt data - just not very often and not repeatably at will. So those soft errors are easy to ignore or dismiss as something else. so they put them in a RAID knowing the firmware will protect them from drive errors. The RAID firmware will not protect them from bit errors on block reads unless the disk detects that the whole block is bad. I admit not knowing how much the disk itself can detect bit errors with CRC or similar sorts of things. Actually, some RAID configurations should be able to detect errors as they calculate and check the parity block. They often fail to recognise that the RAID firmware may not be perfect. ZFS, JBOD disk controllers, drivers for said disk controllers, etc. may not be perfect either. Sure. Nothing's perfect. What's your point? ZFS sits on top of the pile of imperfection, and is thus able to make the entire error domain no worse than ZFS itself, where it is likely much worse to begin with.
Dana ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
Hi Betsy, Yes, part of this is a documentation problem. I recently documented the find -inum scenario in the community version of the admin guide. Please see page 156 (well, for next time) here: http://opensolaris.org/os/community/zfs/docs/ We're working on the larger issue as well. Cindy Elizabeth Schwartz wrote: Well, I fixed the HW but I had one bad file, and the problem was that ZFS was saying delete the pool and restore from tape when, it turns out, the answer is just find the file with the bad inode, delete it, clear the device and scrub. Maybe more of a documentation problem, but it sure is disconcerting to have a file system threatening to give up the game over one bad file (and the real irony: it was a file in someone's TRASH!) Anyway I'm back in business without a restore (and with a rebuilt RAID) but yeesh, it sure took a lot of escalating to get to the point where someone knew to tell me to do a find -inum. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
Well, I fixed the HW but I had one bad file, and the problem was that ZFS was saying delete the pool and restore from tape when, it turns out, the answer is just find the file with the bad inode, delete it, clear the device and scrub. Maybe more of a documentation problem, but it sure is disconcerting to have a file system threatening to give up the game over one bad file (and the real irony: it was a file in someone's TRASH!) Anyway I'm back in business without a restore (and with a rebuilt RAID) but yeesh, it sure took a lot of escalating to get to the point where someone knew to tell me to do a find -inum. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
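For anyone hitting the same situation, the recovery Elizabeth describes boils down to a short command sequence. A sketch, assuming a pool named tank mounted at /tank; the pool name, path, and inode number below are placeholders, not values from this thread:

```shell
# 1. Identify the damaged file. zpool status -v lists files with persistent
#    errors; in some cases it reports only an object (inode) number.
zpool status -v tank

# 2. If only an inode number is shown, locate the file by inode.
find /tank -inum 12345    # 12345 stands in for the number from step 1

# 3. Delete the bad file, clear the pool's error counters, and re-verify
#    the whole pool to confirm nothing else is damaged.
rm /tank/path/to/bad/file
zpool clear tank
zpool scrub tank
```

With a redundant pool (mirror or raidz) the scrub in step 3 would have repaired the block instead, which is why the follow-up below asks about redundancy.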
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On 28-Nov-06, at 10:01 PM, Elizabeth Schwartz wrote: Well, I fixed the HW but I had one bad file, and the problem was that ZFS was saying delete the pool and restore from tape when, it turns out, the answer is just find the file with the bad inode, delete it, clear the device and scrub. Maybe more of a documentation problem, but it sure is disconcerting to have a file system threatening to give up the game over one bad file (and the real irony: it was a file in someone's TRASH!) Anyway I'm back in business without a restore (and with a rebuilt RAID) but yeesh, it sure took a lot of escalating to get to the point where someone knew to tell me to do a find -inum. Do you now have a redundant ZFS configuration, to prevent future data loss/inconvenience? --T ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)
On 11/28/06, Elizabeth Schwartz [EMAIL PROTECTED] wrote: Well, I fixed the HW but I had one bad file, and the problem was that ZFS was saying delete the pool and restore from tape when, it turns out, the answer is just find the file with the bad inode, delete it, clear the device and scrub. Maybe more of a documentation problem, but it sure is disconcerting to have a file system threatening to give up the game over one bad file (and the real irony: it was a file in someone's TRASH!) The ZFS documentation was assuming you wanted to recover the data, not abandon it. Which, realistically, isn't always what people want; when you know a small number of files are trashed, it's often easier to delete those files and either just go on, or restore only those files, compared to a full restore. So yeah, perhaps the documentation could be more helpful in that situation. Anyway I'm back in business without a restore (and with a rebuilt RAID) but yeesh, it sure took a lot of escalating to get to the point where someone knew to tell me to do a find -inum. Ah, if people here had realized that's what you needed to know, many of us could have told you, I'm sure. I, at least, hadn't realized that was one of the problem points. (Probably too focused on the ZFS content to think about the general issues enough!) Very glad you're back in service, anyway! -- David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/ RKBA: http://www.dd-b.net/carry/ Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/ Dragaera/Steven Brust: http://dragaera.info/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss