Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
On 9/15/06, can you guess? [EMAIL PROTECTED] wrote:

Implementing it at the directory and file levels would be even more flexible: redundancy strategy would no longer be tightly tied to path location, but directories and files could themselves still inherit defaults from the filesystem and pool when appropriate (but could be individually handled when desirable).

Ideally so. FS (or dataset) level is sufficiently fine-grained for my use. If I take the trouble to specify copies for a directory, I really do not mind the trouble of creating a new dataset for it at the same time. File level, however, is really pushing it. You might end up with an administrative nightmare deciphering which files have how many copies. I just do not see it being useful to my environment.

It would be interesting to know whether that would still be your experience in environments that regularly scrub active data as ZFS does (assuming that said experience was accumulated in environments that don't). The theory behind scrubbing is that all data areas will be hit often enough that they won't have time to deteriorate (gradually) to the point where they can't be read at all, and early deterioration encountered during the scrub pass (or other access), in which they have only begun to become difficult to read, will result in immediate revectoring (by the disk or, if not, by the file system) to healthier locations.

Scrubbing exercises the disk area to prevent bit-rot. I do not think ZFS's scrubbing changes the failure mode of the raw devices. OTOH, I really have no such experience to speak of *fingers crossed*. I failed to locate the code where the relocation of files happens, but I assume that copies would make this process more reliable.
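The scrub pass described above can be modeled very roughly: read every block, verify its stored checksum, and flag any copy that has started to rot for rewriting while a good copy still exists. This is purely an illustrative toy (using CRC32 as a stand-in checksum), not the actual ZFS scrub code:

```python
# Toy model of a scrub pass: read each block, recompute its checksum,
# and count the blocks whose stored checksum no longer matches (these
# are the ones a real scrub would repair from a good ditto/mirror copy
# and revector to a healthy location).  Illustrative only, not ZFS code.
import zlib

def checksum(data):
    return zlib.crc32(data)

def scrub(blocks):
    """blocks: list of (data, stored_checksum) pairs.
    Returns the number of blocks found corrupted (to be repaired)."""
    repaired = 0
    for data, stored in blocks:
        if checksum(data) != stored:
            repaired += 1   # real ZFS would rewrite from a good copy here
    return repaired

good = b"some block payload"
blocks = [(good, checksum(good)),            # intact block
          (b"some block paylOad", checksum(good))]  # one flipped byte
print(scrub(blocks))   # one corrupted block detected for repair
```

The point of the model is only that periodic verification catches deterioration while it is still repairable; it says nothing about whether scrubbing changes the failure mode of the raw device, which is the open question above.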
Since ZFS-style scrubbing detects even otherwise-undetectable 'silent corruption' missed by the disk's own ECC mechanisms, that lower-probability event is also covered (though my impression is that the probability of even a single such sector may be significantly lower than that of whole-disk failure, especially in laptop environments).

I do not have any data to support or dismiss that. Matt was right that the probability of failure modes is a huge can of worms that can drag on forever.

-- Just me, Wire ...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
On 9/13/06, Matthew Ahrens [EMAIL PROTECTED] wrote:

Sure, if you want *everything* in your pool to be mirrored, there is no real need for this feature (you could argue that setting up the pool would be easier if you didn't have to slice up the disk, though).

Not necessarily. Implementing this on the FS level will still allow the administrator to turn on copies for the entire pool, since the pool is technically also a FS and the property is inherited by child FS's. Of course, this will also allow the admin to turn off copies for the FS containing junk.

Implementing it at the directory and file levels would be even more flexible: redundancy strategy would no longer be tightly tied to path location, but directories and files could themselves still inherit defaults from the filesystem and pool when appropriate (but could be individually handled when desirable). I've never understood why redundancy was a pool characteristic in ZFS - and the addition of 'ditto blocks' and now this new proposal (both of which introduce completely new forms of redundancy to compensate for the fact that pool-level redundancy doesn't satisfy some needs) just makes me more skeptical about it. (Not that I intend in any way to minimize the effort it might take to change that decision now.)

It could be recommended in some situations. If you want to protect against disk firmware errors, bit flips, or part of the disk getting scrogged, then mirroring on a single disk (whether via a mirror vdev or copies=2) solves your problem. Admittedly, these problems are probably less common than whole-disk failure, which mirroring on a single disk does not address.

I beg to differ from experience: the above errors are more common than whole-disk failures. It's just that we do not notice the disks are developing problems but panic when they finally fail completely.
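The inheritance behavior being described -- set a property on the pool, have child filesystems pick it up unless they override it locally -- can be sketched in a few lines. This is a hypothetical illustration of the resolution rule, with made-up names, not the actual ZFS property code:

```python
# Hypothetical sketch of ZFS-style property inheritance: a dataset
# resolves "copies" by walking up toward the pool root until it finds
# an explicitly set value.  Names and defaults are illustrative only.

class Dataset:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}              # properties set explicitly here

    def get(self, prop, default=1):
        ds = self
        while ds is not None:
            if prop in ds.local:
                return ds.local[prop]
            ds = ds.parent           # inherit from parent FS / pool
        return default

pool = Dataset("tank")
pool.local["copies"] = 2             # admin turns on copies pool-wide
junk = Dataset("tank/junk", parent=pool)
junk.local["copies"] = 1             # ...but turns it off for junk
docs = Dataset("tank/docs", parent=pool)

print(docs.get("copies"))   # inherits 2 from the pool
print(junk.get("copies"))   # locally overridden to 1
```

This is exactly why FS-level granularity covers the pool-wide case for free: the pool is just the root of the inheritance tree.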
It would be interesting to know whether that would still be your experience in environments that regularly scrub active data as ZFS does (assuming that said experience was accumulated in environments that don't). The theory behind scrubbing is that all data areas will be hit often enough that they won't have time to deteriorate (gradually) to the point where they can't be read at all, and early deterioration encountered during the scrub pass (or other access), in which they have only begun to become difficult to read, will result in immediate revectoring (by the disk or, if not, by the file system) to healthier locations.

Since ZFS-style scrubbing detects even otherwise-undetectable 'silent corruption' missed by the disk's own ECC mechanisms, that lower-probability event is also covered (though my impression is that the probability of even a single such sector may be significantly lower than that of whole-disk failure, especially in laptop environments).

All that being said, keeping multiple copies on a single disk of most metadata (the loss of which could lead to widespread data loss) definitely makes sense (especially given its typically negligible size), and it probably makes sense for some files as well.

- bill

This message posted from opensolaris.org
Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
On Fri, Sep 15, 2006 at 01:23:31AM -0700, can you guess? wrote:

Implementing it at the directory and file levels would be even more flexible: redundancy strategy would no longer be tightly tied to path location, but directories and files could themselves still inherit defaults from the filesystem and pool when appropriate (but could be individually handled when desirable).

The problem boils down to not having a way to express your intent that works over NFS (where you're basically limited by POSIX) and that you can use from any platform (esp. ones where ZFS isn't installed). If you have some ideas, this is something we'd love to hear about.

I've never understood why redundancy was a pool characteristic in ZFS - and the addition of 'ditto blocks' and now this new proposal (both of which introduce completely new forms of redundancy to compensate for the fact that pool-level redundancy doesn't satisfy some needs) just makes me more skeptical about it.

We have thought long and hard about this problem and even know how to implement it (the name we've been using is Metaslab Grids, which isn't terribly descriptive, or, as Matt put it, a "bag o' disks"). There are two main problems with it, though.

The first is failures. The problem is that you want the set of disks implementing redundancy (mirror, RAID-Z, etc.) to be spread across fault domains (controller, cable, fans, power supplies, geographic sites) as much as possible. There is no generic mechanism to obtain this information and act upon it. We could ask the administrator to supply it somehow, but such a description takes effort, is not easy, and is prone to error. That's why we have the model right now where the administrator specifies how they want the disks spread out across fault groups (vdevs).

The second problem comes back to accounting. If you can specify, on a per-file or per-directory basis, what kind of replication you want, how do you answer the statvfs() question?
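The statvfs() accounting problem can be made concrete with a toy calculation: if different files can carry different replication settings, the number of file bytes that still fit in the pool -- which is what "free space" is supposed to mean -- depends on a per-file property the filesystem cannot know when it answers the call. All numbers below are made up for illustration:

```python
# Toy illustration of the statvfs() accounting problem.  With per-file
# replication, "free space" is ambiguous: the same raw free bytes hold
# different amounts of file data depending on each future file's
# replication factor.  Sizes are invented for the example.

POOL_RAW_FREE = 90 * 2**30        # 90 GiB of raw free space

def writable_bytes(raw_free, replication):
    # File data that still fits if every new byte is stored
    # `replication` times (copies=N, or an N-way mirror of the block).
    return raw_free // replication

print(writable_bytes(POOL_RAW_FREE, 1))   # answer if no replication assumed
print(writable_bytes(POOL_RAW_FREE, 3))   # only a third fits at 3 copies
```

Whichever single number statvfs() reports is wrong for files created with a different setting, which is the crux of the objection.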
I think the recent discussions on this list illustrate the complexity and passion on both sides of the argument.

(Not that I intend in any way to minimize the effort it might take to change that decision now.)

The effort is not actually that great. All the hard problems we needed to solve in order to implement this were basically solved when we did the RAID-Z code. As a matter of fact, you can see it in the on-disk specification as well: in the DVA, you'll notice an 8-bit field labeled GRID. These are the bits that would describe, on a per-block basis, what kind of redundancy we used.

--Bill
[zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
On 12/09/06, Celso [EMAIL PROTECTED] wrote:

I think it has already been said that in many people's experience, when a disk fails, it completely fails. Especially on laptops. Of course ditto blocks wouldn't help you in this situation either!

Exactly.

I still think that silent data corruption is a valid concern, one that ditto blocks would solve. Also, I am not thrilled about losing that much space for duplication of unnecessary data (caused by partitioning a disk in two).

Well, you'd only be duplicating the data on the mirror. If you don't want to mirror the base OS, no one's saying you have to.

Yikes! That sounds like even more partitioning! For the sake of argument, let's assume:

1. disk is expensive
2. someone is keeping valuable files on a non-redundant zpool
3. they can't scrape together enough vdevs to make a redundant zpool (remembering you can build vdevs out of *flat files*)

Even then, to my mind: to the user, the *file* (screenplay, movie of a child's birth, civ3 saved game, etc.) is the logical entity to have a 'duplication level' attached to it, and the only person who can score that is the author of the file. This proposal says the filesystem creator/admin scores the filesystem. Your argument against unnecessary data duplication applies to all 'non-special' files in the 'special' filesystem. They're wasting space too.

If the user wants to make sure the file is 'safer' than others, he can just make multiple copies. Either to a USB disk/flash drive, CD-RW, DVD, FTP server, whatever. The redundancy you're talking about is what you'd get from 'cp /foo/bar.jpg /foo/bar.jpg.ok', except it's hidden from the user and causing headaches for anyone trying to comprehend, port or extend the codebase in the future.

The proposed solution differs in one important aspect: it automatically detects data corruption.

I also echo Darren's comments on zfs performing better when it has the whole disk.

Me too, but a lot of laptop users dual-boot, which makes it a moot point.
Hopefully we can agree that you lose nothing by adding this feature, even if you personally don't see a need for it.

Sorry, I don't think we're going to agree on this one :)

No worries, that's cool. All the best.

Dick.

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
On Sep 12, 2006, at 4:39 PM, Celso wrote:

On 12/09/06, Celso [EMAIL PROTECTED] wrote: I think it has already been said that in many people's experience, when a disk fails, it completely fails. Especially on laptops. Of course ditto blocks wouldn't help you in this situation either!

Exactly.

I still think that silent data corruption is a valid concern, one that ditto blocks would solve. Also, I am not thrilled about losing that much space for duplication of unnecessary data (caused by partitioning a disk in two).

Well, you'd only be duplicating the data on the mirror. If you don't want to mirror the base OS, no one's saying you have to.

Yikes! That sounds like even more partitioning!

The redundancy you're talking about is what you'd get from 'cp /foo/bar.jpg /foo/bar.jpg.ok', except it's hidden from the user and causing headaches for anyone trying to comprehend, port or extend the codebase in the future.

The proposed solution differs in one important aspect: it automatically detects data corruption.

Detecting data corruption is a function of the ZFS checksumming feature. The proposed solution has _nothing_ to do with detecting corruption. The difference is in what happens when/if such bad data is detected. Without a duplicate copy, via some RAID level or the proposed ditto block copies, the file is corrupted.
Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
Chad Lewis wrote:

On Sep 12, 2006, at 4:39 PM, Celso wrote: the proposed solution differs in one important aspect: it automatically detects data corruption.

Detecting data corruption is a function of the ZFS checksumming feature. The proposed solution has _nothing_ to do with detecting corruption. The difference is in what happens when/if such bad data is detected. Without a duplicate copy, via some RAID level or the proposed ditto block copies, the file is corrupted.

With a mirrored ZFS pool, what are the odds of losing all copies of the [meta]data, for N disks (where N = 1, 2, etc.)? I thought we understood this pretty well, and that the answer was extremely small.

--
Jeff VICTOR
Sun Microsystems | jeff.victor@sun.com
OS Ambassador | Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
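A back-of-envelope version of that question: if each stored copy of a block is lost with some probability p, and the copies fail independently (which real disks only approximate -- correlated failures make things worse), then all N copies are lost with probability p^N. The per-copy probability below is an assumed, illustrative figure, not a measured one:

```python
# Back-of-envelope estimate for losing every copy of a block.
# p_single is an ASSUMED per-copy loss probability (illustrative only);
# independence of failures is also an assumption -- correlated failures
# (same disk, same controller) make the real number worse.

def p_lose_all(p_single, ncopies):
    return p_single ** ncopies

p = 1e-4                      # assumed chance any one copy is unreadable
print(p_lose_all(p, 1))       # one copy: no protection
print(p_lose_all(p, 2))       # two copies (mirror / ditto): p squared
```

The squaring is why "extremely small" is the right intuition for mirrored or ditto'd data, and also why copies on a *single* disk help less than the formula suggests: a whole-disk failure takes out all copies at once, violating independence.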
Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
On 9/12/06, Celso [EMAIL PROTECTED] wrote:

Whether it's hard to understand is debatable, but this feature integrates very smoothly with the existing infrastructure and wouldn't cause any trouble when extending or porting ZFS.

OK, given this statement...

Just for the record, these changes are pretty trivial to implement; less than 50 lines of code changed.

...and this statement, I can't see any reasons not to include it. If the changes are easy to do, don't require any more of the zfs team's valuable time, and don't hinder other things, I would plead with you to include them, as I think they are genuinely valuable and would make zfs not only the best enterprise-level filesystem, but also the best filesystem for laptops/home computers.

While I'm not a big fan of this feature, if the work is that well understood and that small, I have no objection to it. (Boy, that sounds snotty; apologies, not what I intend here. Those of you reading this know how much you care about my opinion; that's up to you.)

I do pity the people who count on the ZFS redundancy to protect their presentation on an important sales trip -- and then have their laptop stolen. But those people might well be the same ones who would have *no* redundancy otherwise. And nothing about this feature prevents the paranoids like me from still making our backup CD and carrying it separately. I'm not prepared to go so far as to argue that it's bad to make them feel safer :-). At least, to make them feel safer *by making them actually safer*.

--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/
Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data
David Dyer-Bennet wrote:

While I'm not a big fan of this feature, if the work is that well understood and that small, I have no objection to it. (Boy, that sounds snotty; apologies, not what I intend here. Those of you reading this know how much you care about my opinion; that's up to you.)

One could make the argument that the feature could cause enough confusion not to warrant its inclusion. If I'm a typical user and I write a file to a filesystem where the admin set three copies but didn't tell me, it might throw me into a tizzy trying to figure out why my quota usage is 3X what I expect it to be.
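The quota surprise being described comes down to simple arithmetic: if space accounting charges each logical byte once per stored copy, a file costs N times its apparent size against the user's quota. A minimal sketch of that accounting rule, with simplified numbers (real ZFS accounting also involves metadata, compression, and recordsize effects not modeled here):

```python
# Sketch of the quota surprise: if every logical byte is stored
# `copies` times and each stored copy is charged against the quota,
# a file costs copies * size.  Simplified model -- real ZFS accounting
# also charges metadata and interacts with compression.

GIB = 2**30

def quota_charge(file_bytes, copies):
    return file_bytes * copies

# A user writes a 1 GiB file on a filesystem where the admin silently
# set copies=3: the quota hit is 3 GiB, not the 1 GiB the user expects.
print(quota_charge(1 * GIB, 3) // GIB)
```

From the user's point of view the only visible clue is the tripled usage, which is exactly the confusion argument: the cause lives in a property the user may never have seen.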