Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data

2006-09-16 Thread Wee Yeh Tan

On 9/15/06, can you guess? [EMAIL PROTECTED] wrote:

Implementing it at the directory and file levels would be even more flexible:  
redundancy strategy would no longer be tightly tied to path location, but 
directories and files could themselves still inherit defaults from the 
filesystem and pool when appropriate (but could be individually handled when 
desirable).


Ideally so.  FS (or dataset) level is sufficiently fine-grained for my
use.  If I take the trouble to specify copies for a directory, I
really do not mind the trouble of creating a new dataset for it at the
same time.  File-level, however, is really pushing it.  You might end
up with an administrative nightmare deciphering which files have how
many copies.  I just do not see it being useful to my environment.


It would be interesting to know whether that would still be your experience in 
environments that regularly scrub active data as ZFS does (assuming that said 
experience was accumulated in environments that don't).  The theory behind 
scrubbing is that every data area will be hit often enough that it won't have 
time to deteriorate (gradually) to the point where it can't be read at all: 
early deterioration encountered during the scrub pass (or other access), while 
the data has only begun to become difficult to read, results in immediate 
revectoring (by the disk or, failing that, by the file system) to a healthier 
location.


Scrubbing exercises the disk area to prevent bit-rot.  I do not think
ZFS's scrubbing changes the failure mode of the raw devices.  OTOH, I
really have no such experience to speak of *fingers crossed*.  I
failed to locate the code where the relocation happens, but I assume
that copies would make this process more reliable.


Since ZFS-style scrubbing detects even otherwise-undetectable 'silent 
corruption' missed by the disk's own ECC mechanisms, that lower-probability 
event is also covered (though my impression is that the probability of even a 
single such sector may be significantly lower than that of whole-disk failure, 
especially in laptop environments).


I do not have any data to support or dismiss that.  Matt was right that
the probability of failure modes is a huge can of worms that can drag
on forever.


--
Just me,
Wire ...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data

2006-09-15 Thread can you guess?
 On 9/13/06, Matthew Ahrens [EMAIL PROTECTED] wrote:
  Sure, if you want *everything* in your pool to be mirrored, there is
  no real need for this feature (you could argue that setting up the
  pool would be easier if you didn't have to slice up the disk though).
 
 Not necessarily.  Implementing this on the FS level will still allow
 the administrator to turn on copies for the entire pool, since the
 pool is technically also a FS and the property is inherited by child
 FS's.  Of course, this will also allow the admin to turn off copies
 for the FS containing junk.

Implementing it at the directory and file levels would be even more flexible:  
redundancy strategy would no longer be tightly tied to path location, but 
directories and files could themselves still inherit defaults from the 
filesystem and pool when appropriate (but could be individually handled when 
desirable).
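
To make the file/directory inheritance idea concrete, here is a minimal
sketch of the lookup such a scheme implies (hypothetical structures and
names, not anything from the ZFS source): resolve a per-file 'copies'
setting by walking up from the file toward the pool until some level has
set it explicitly.

#include <stdio.h>

#define COPIES_UNSET (-1)

struct node {
    const char  *name;
    int          copies;   /* COPIES_UNSET means "inherit from parent" */
    struct node *parent;   /* NULL at the pool level */
};

/* Walk toward the pool root until some level set the property. */
static int effective_copies(const struct node *n)
{
    for (; n != NULL; n = n->parent)
        if (n->copies != COPIES_UNSET)
            return n->copies;
    return 1;              /* pool-wide default */
}

int main(void)
{
    struct node pool = { "pool",       1,            NULL  };
    struct node fs   = { "pool/fs",    COPIES_UNSET, &pool };
    struct node dir  = { "important",  3,            &fs   };
    struct node file = { "thesis.tex", COPIES_UNSET, &dir  };

    /* The file inherits copies=3 from its directory, not the pool. */
    printf("%s -> %d copies\n", file.name, effective_copies(&file));
    printf("%s -> %d copies\n", fs.name,   effective_copies(&fs));
    return 0;
}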

I've never understood why redundancy was a pool characteristic in ZFS - and the 
addition of 'ditto blocks' and now this new proposal (both of which introduce 
completely new forms of redundancy to compensate for the fact that pool-level 
redundancy doesn't satisfy some needs) just makes me more skeptical about it.

(Not that I intend in any way to minimize the effort it might take to change 
that decision now.)

 
  It could be recommended in some situations.  If you want to protect
  against disk firmware errors, bit flips, part of the disk getting
  scrogged, then mirroring on a single disk (whether via a mirror vdev
  or copies=2) solves your problem.  Admittedly, these problems are
  probably less common than whole-disk failure, which mirroring on a
  single disk does not address.
 
 I beg to differ from experience that the above errors are more common
 than whole disk failures.  It's just that we do not notice the disks
 are developing problems but panic when they finally fail completely.

It would be interesting to know whether that would still be your experience in 
environments that regularly scrub active data as ZFS does (assuming that said 
experience was accumulated in environments that don't).  The theory behind 
scrubbing is that every data area will be hit often enough that it won't have 
time to deteriorate (gradually) to the point where it can't be read at all: 
early deterioration encountered during the scrub pass (or other access), while 
the data has only begun to become difficult to read, results in immediate 
revectoring (by the disk or, failing that, by the file system) to a healthier 
location.
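
As a toy illustration of that theory, here is a sketch of a scrub pass
over an array of checksummed blocks, each with one redundant copy (all
structures hypothetical; this is not the actual ZFS scrub code).  A
block whose stored checksum no longer matches gets rewritten from its
good copy, and the rewrite is what revectors the data to a healthier
location.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSIZE 16

struct block {
    uint8_t  data[BLKSIZE];
    uint32_t checksum;              /* stored at write time */
};

/* FNV-1a hash standing in for the real block checksum. */
static uint32_t cksum(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    while (n--) { h ^= *p++; h *= 16777619u; }
    return h;
}

/* Scrub pass: verify every block; on mismatch, repair from the
 * redundant copy and rewrite (revectoring happens on the write,
 * either in the disk's spare sectors or by the file system). */
static void scrub(struct block *blks, const struct block *copies, int n)
{
    for (int i = 0; i < n; i++) {
        if (cksum(blks[i].data, BLKSIZE) == blks[i].checksum)
            continue;               /* still healthy, nothing to do */
        printf("block %d: checksum mismatch, rewriting from copy\n", i);
        blks[i] = copies[i];
    }
}

int main(void)
{
    struct block blks[NBLOCKS], copies[NBLOCKS];

    for (int i = 0; i < NBLOCKS; i++) {
        memset(blks[i].data, 'a' + i, BLKSIZE);
        blks[i].checksum = cksum(blks[i].data, BLKSIZE);
        copies[i] = blks[i];        /* the redundant ("ditto") copy */
    }
    blks[2].data[0] ^= 0x40;        /* simulate silent bit-rot */
    scrub(blks, copies, NBLOCKS);
    return 0;
}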

Since ZFS-style scrubbing detects even otherwise-undetectable 'silent 
corruption' missed by the disk's own ECC mechanisms, that lower-probability 
event is also covered (though my impression is that the probability of even a 
single such sector may be significantly lower than that of whole-disk failure, 
especially in laptop environments).
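
For a rough sense of scale (back-of-envelope only, using an illustrative
vendor-style figure of one unrecoverable error per 10^14 bits read;
truly silent corruption that slips past the ECC should be rarer still):

\[
E[\text{bad sectors per full read of a }100\,\text{GB disk}]
  \approx 8\times10^{11}\ \text{bits} \times 10^{-14}\ \tfrac{\text{errors}}{\text{bit}}
  \approx 0.008
\]

which is small next to annual whole-disk failure rates commonly put at a
few percent, consistent with the impression above.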

All that being said, keeping multiple copies on a single disk of most metadata 
(the loss of which could lead to widespread data loss) definitely makes sense 
(especially given its typically negligible size), and it probably makes sense 
for some files as well.

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data

2006-09-15 Thread Bill Moore
On Fri, Sep 15, 2006 at 01:23:31AM -0700, can you guess? wrote:
 Implementing it at the directory and file levels would be even more
 flexible:  redundancy strategy would no longer be tightly tied to path
 location, but directories and files could themselves still inherit
 defaults from the filesystem and pool when appropriate (but could be
 individually handled when desirable).

The problem boils down to not having a way to express your intent that
works over NFS (where you're basically limited by POSIX) and that you can
use from any platform (especially ones where ZFS isn't installed).  If you
have some ideas, this is something we'd love to hear about.

 I've never understood why redundancy was a pool characteristic in ZFS
 - and the addition of 'ditto blocks' and now this new proposal (both
 of which introduce completely new forms of redundancy to compensate
 for the fact that pool-level redundancy doesn't satisfy some needs)
 just makes me more skeptical about it.

We have thought long and hard about this problem and even know how to
implement it (the name we've been using is "Metaslab Grids", which isn't
terribly descriptive, or as Matt put it, a "bag o' disks").  There are
two main problems with it, though.  One is failures.  The problem is
that you want the set of disks implementing redundancy (mirror, RAID-Z,
etc.) to be spread across fault domains (controller, cable, fans, power
supplies, geographic sites) as much as possible.  There is no generic
mechanism to obtain this information and act upon it.  We could ask the
administrator to supply it somehow, but such a description takes effort,
is not easy to get right, and is prone to error.  That's why we have the
model right now where the administrator specifies how they want the
disks spread out across fault groups (vdevs).

The second problem comes back to accounting.  If you can specify, on a
per-file or per-directory basis, what kind of replication you want, how
do you answer the statvfs() question?  I think the recent discussions
on this list illustrate the complexity and passion on both sides of the
argument.
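
A trivial sketch of why that question is awkward (illustrative numbers,
nothing more): the same raw free space produces a different
statvfs-style answer depending on what replication you assume future
writes will use.

#include <stdio.h>

int main(void)
{
    long long raw_free = 100LL << 30;   /* 100 GiB of raw free space */

    /* Each logical block a user writes consumes `copies` raw blocks,
     * so "free space" is only well-defined per assumption. */
    for (int copies = 1; copies <= 3; copies++)
        printf("assuming copies=%d: report %lld GiB free?\n",
               copies, (raw_free / copies) >> 30);
    return 0;
}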

 (Not that I intend in any way to minimize the effort it might take to
 change that decision now.)

The effort is not actually that great.  All the hard problems we needed
to solve in order to implement this were basically solved when we did
the RAID-Z code.  As a matter of fact, you can see it in the on-disk
specification as well.  In the DVA, you'll notice an 8-bit field labeled
GRID.  These are the bits that would describe, on a per-block basis,
what kind of redundancy we used.
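
For the curious, the on-disk spec shows that field sitting in the first
DVA word alongside the vdev and allocated-size fields; here is a small
sketch of pulling it out (illustrative macros in the spirit of, but not
copied from, the ZFS headers):

#include <stdint.h>
#include <stdio.h>

typedef struct dva { uint64_t dva_word[2]; } dva_t;

/* Word 0 of a DVA: vdev in the high 32 bits, the 8-bit GRID field
 * at bits 24-31, allocated size in the low 24 bits (per the on-disk
 * specification; macro names here are illustrative). */
#define DVA_GET_VDEV(d)  ((uint32_t)((d)->dva_word[0] >> 32))
#define DVA_GET_GRID(d)  ((uint8_t)(((d)->dva_word[0] >> 24) & 0xff))
#define DVA_GET_ASIZE(d) ((uint32_t)((d)->dva_word[0] & 0xffffff))

int main(void)
{
    dva_t d = { { ((uint64_t)1 << 32) | (0x05ULL << 24) | 0x2000, 0 } };

    printf("vdev=%u grid=0x%02x asize=0x%x\n",
           DVA_GET_VDEV(&d), DVA_GET_GRID(&d), DVA_GET_ASIZE(&d));
    return 0;
}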


--Bill
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data

2006-09-12 Thread Celso
 On 12/09/06, Celso [EMAIL PROTECTED] wrote:
 
  I think it has already been said that in many peoples' experience,
  when a disk fails, it completely fails.  Especially on laptops.  Of
  course ditto blocks wouldn't help you in this situation either!
 
 Exactly.
 
  I still think that silent data corruption is a valid concern, one
  that ditto blocks would solve.  Also, I am not thrilled about losing
  that much space for duplication of unnecessary data (caused by
  partitioning a disk in two).
 
 Well, you'd only be duplicating the data on the mirror.  If you don't
 want to mirror the base OS, no one's saying you have to.
 

Yikes! That sounds like even more partitioning!

 For the sake of argument, let's assume:
 
 1. disk is expensive
 2. someone is keeping valuable files on a non-redundant zpool
 3. they can't scrape enough vdevs to make a redundant zpool
    (remembering you can build vdevs out of *flat files*)
 
 Even then, to my mind:
 
 to the user, the *file* (screenplay, movie of child's birth, civ3
 saved game, etc.) is the logical entity to have a 'duplication level'
 attached to it, and the only person who can score that is the author
 of the file.
 
 This proposal says the filesystem creator/admin scores the filesystem.
 Your argument against unnecessary data duplication applies to all
 'non-special' files in the 'special' filesystem.  They're wasting
 space too.
 
 If the user wants to make sure the file is 'safer' than others, he can
 just make multiple copies.  Either to a USB disk/flashdrive, cdrw, dvd,
 ftp server, whatever.
 
 The redundancy you're talking about is what you'd get from
 'cp /foo/bar.jpg /foo/bar.jpg.ok', except it's hidden from the user
 and causing headaches for anyone trying to comprehend, port or extend
 the codebase in the future.

The proposed solution differs in one important aspect: it automatically detects 
data corruption.


  I also echo Darren's comments on zfs performing better when it has
  the whole disk.
 
 Me too, but a lot of laptop users dual-boot, which makes it a moot point.
 
  Hopefully we can agree that you lose nothing by adding this feature,
  even if you personally don't see a need for it.
 
 Sorry, I don't think we're going to agree on this one :)


No worries, that's cool. 
 All the best
 Dick.
 
 -- 
 Rasputin :: Jack of All Trades - Master of Nuns
 http://number9.hellooperator.net/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

Celso
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data

2006-09-12 Thread Chad Lewis


On Sep 12, 2006, at 4:39 PM, Celso wrote:


On 12/09/06, Celso [EMAIL PROTECTED] wrote:

I think it has already been said that in many peoples' experience, when
a disk fails, it completely fails.  Especially on laptops.  Of course
ditto blocks wouldn't help you in this situation either!

Exactly.

I still think that silent data corruption is a valid concern, one that
ditto blocks would solve.  Also, I am not thrilled about losing that
much space for duplication of unnecessary data (caused by partitioning
a disk in two).

Well, you'd only be duplicating the data on the mirror.  If you don't
want to mirror the base OS, no one's saying you have to.


Yikes! That sounds like even more partitioning!



The redundancy you're talking about is what you'd get from
'cp /foo/bar.jpg /foo/bar.jpg.ok', except it's hidden from the user and
causing headaches for anyone trying to comprehend, port or extend the
codebase in the future.


The proposed solution differs in one important aspect: it automatically
detects data corruption.





Detecting data corruption is a function of the ZFS checksumming
feature.  The proposed solution has _nothing_ to do with detecting
corruption.  The difference is in what happens when/if such bad data is
detected.  Without a duplicate copy, via some RAID level or the
proposed ditto block copies, the file is corrupted.
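
A toy sketch of that distinction (hypothetical structures standing in
for a block pointer and its copies; not ZFS source): the checksum is
what detects the bad read, but only the presence of a second copy turns
detection into recovery.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NCOPIES 2
#define BLKSIZE 8

struct blkptr {
    uint8_t  copy[NCOPIES][BLKSIZE];  /* stand-ins for the DVAs */
    uint32_t checksum;
};

/* FNV-1a hash standing in for the real checksum. */
static uint32_t cksum(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    while (n--) { h ^= *p++; h *= 16777619u; }
    return h;
}

/* Try each copy in turn.  With NCOPIES == 1 a mismatch would be
 * detected just the same, but the read would simply fail. */
static int read_block(const struct blkptr *bp, uint8_t *out)
{
    for (int c = 0; c < NCOPIES; c++) {
        if (cksum(bp->copy[c], BLKSIZE) == bp->checksum) {
            memcpy(out, bp->copy[c], BLKSIZE);
            return 0;                 /* healthy copy found */
        }
        fprintf(stderr, "copy %d failed checksum\n", c);
    }
    return -1;                        /* all copies bad: EIO */
}

int main(void)
{
    struct blkptr bp;
    uint8_t buf[BLKSIZE];

    memset(bp.copy[0], 'x', BLKSIZE);
    bp.checksum = cksum(bp.copy[0], BLKSIZE);
    memcpy(bp.copy[1], bp.copy[0], BLKSIZE);
    bp.copy[0][3] ^= 1;               /* silently corrupt copy 0 */

    printf("read %s\n", read_block(&bp, buf) == 0 ? "ok" : "failed");
    return 0;
}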

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data

2006-09-12 Thread Jeff Victor

Chad Lewis wrote:


On Sep 12, 2006, at 4:39 PM, Celso wrote:


The proposed solution differs in one important aspect: it automatically
detects data corruption.


Detecting data corruption is a function of the ZFS checksumming feature.  The
proposed solution has _nothing_ to do with detecting corruption.  The difference
is in what happens when/if such bad data is detected.  Without a duplicate copy,
via some RAID level or the proposed ditto block copies, the file is corrupted.



With a mirrored ZFS pool, what are the odds of losing all copies of the 
[meta]data, for N disks (where N = 1, 2, etc.)?  I thought we understood this 
pretty well, and that the answer was extremely small.
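
A back-of-envelope model (illustrative only, assuming each disk
independently corrupts a given block with probability q between scrubs):

\[
P(\text{all copies lost, 2-way mirror}) \approx q^{2},
\qquad
P(\text{all copies lost, mirror with two ditto copies}) \approx q^{4}
\]

With q already tiny, both figures are vanishingly small, the second far
more so, which matches the "extremely small" answer.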


--
Jeff VICTOR            Sun Microsystems            jeff.victor @ sun.com
OS Ambassador          Sr. Technical Specialist
Solaris 10 Zones FAQ:  http://www.opensolaris.org/os/community/zones/faq
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data

2006-09-12 Thread David Dyer-Bennet

On 9/12/06, Celso [EMAIL PROTECTED] wrote:


 Whether it's hard to understand is debatable, but this feature
 integrates very smoothly with the existing infrastructure and wouldn't
 cause any trouble when extending or porting ZFS.


OK, given this statement...


 Just for the record, these changes are pretty trivial to implement;
 less than 50 lines of code changed.

and this statement, I can't see any reason not to include it. If the changes 
are easy to do, don't require any more of the zfs team's valuable time, and 
don't hinder other things, I would plead with you to include them, as I think 
they are genuinely valuable and would make zfs not only the best enterprise 
level filesystem, but also the best filesystem for laptops/home computers.


While I'm not a big fan of this feature, if the work is that well
understood and that small, I have no objection to it.  (Boy, that
sounds snotty; apologies, not what I intend here.  Those of you
reading this know how much you care about my opinion; that's up to
you.)

I do pity the people who count on the ZFS redundancy to protect their
presentation on an important sales trip -- and then have their laptop
stolen.  But those people might well be the same ones who would have
*no* redundancy otherwise.  And nothing about this feature prevents
the paranoids like me from still making our backup CD and carrying it
separately.

I'm not prepared to go so far as to argue that it's bad to make them
feel safer :-).  At least, to make them feel safer *by making them
actually safer*.
--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: Re: Proposal: multiple copies of user data

2006-09-12 Thread Torrey McMahon

David Dyer-Bennet wrote:


While I'm not a big fan of this feature, if the work is that well
understood and that small, I have no objection to it.  (Boy, that
sounds snotty; apologies, not what I intend here.  Those of you
reading this know how much you care about my opinion; that's up to
you.)


One could make the argument that the feature could cause enough 
confusion to not warrant its inclusion. If I'm a typical user and I 
write a file to a filesystem where the admin set three copies but 
didn't tell me, it might throw me into a tizzy trying to figure out why 
my quota usage is 3X what I expect it to be.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss