Re: [zfs-discuss] Deduplication - deleting the original

2009-12-08 Thread Michael Schuster

Colin Raven wrote:

> What happens if, once dedup is on, I (or someone else with delete 
> rights) open a photo management app containing that collection, and 
> start deleting dupes - AND - happen to delete the original that all 
> other references are pointing to. I know, I know, it doesn't matter - 
> snapshots save the day - but in this instance that's not the point 
> because I'm trying to properly understand the underlying dedup concept.
>
> Logically, if you delete what everything is pointing at, all the 
> pointers are now null values, they are - in effect - pointing at 
> nothing...an empty hole.
>
> I have the feeling the answer to this is; no they don't, there is no 
> spoon (original) you're still OK. I suspect that, only because the 
> people who thought this up couldn't possibly have missed such an 
> obvious point. The problem I have is in trying to mentally frame this 
> in such a way that I can subsequently explain it, if asked to do so 
> (which I see coming for sure).
>
> Help in understanding this would be hugely helpful - anyone?


I mentally compare deduplication to links to files (hard, not soft) - as I 
understand it, there is no original and copy; rather, every directory 
entry points to the data (the inode, in ufs-speak), and if one directory 
entry of several is deleted, only the reference count changes.
It's probably a little more complicated with dedup, but I think the 
parallel is valid.
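
The hard-link parallel can be tried directly. The following is a minimal sketch, assuming a POSIX-style filesystem that supports hard links; the file names and the function name hard_link_demo are made up for illustration, and nothing here touches ZFS dedup itself.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it, delete the first name, read via the second."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "photo.jpg")         # hypothetical file name
        second = os.path.join(d, "photo-copy.jpg")   # hypothetical file name
        with open(first, "wb") as f:
            f.write(b"pixels")
        os.link(first, second)            # second directory entry, same inode
        links_before = os.stat(second).st_nlink
        os.remove(first)                  # drop one name; the data stays
        links_after = os.stat(second).st_nlink
        with open(second, "rb") as f:
            data = f.read()
    return links_before, links_after, data


if __name__ == "__main__":
    print(hard_link_demo())
```

Neither name is "the original": removing the first one only lowers the link count, and the contents stay readable through the remaining name.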


HTH
Michael
--
Michael Schuster        http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Deduplication - deleting the original

2009-12-08 Thread Ed Jobs
On Tuesday 08 December 2009 14:00, Colin Raven wrote:
> Help in understanding this would be hugely helpful - anyone?
 
i am no pro in zfs, but to my understanding there is no original.
All the files have pointers to blocks on disk. Even if there is no other file 
that shares the same block on the disk, there is a pointer to that. 
(of course if there are two or more files sharing the same block there would 
be more pointers)
So on any delete action, all that needs to be done is to delete the 
correct pointer. If there are no more pointers to that block, the block is 
then free.

But this is just how i understand it. Feel free to correct me.

-- 
Real programmers don't document. If it was hard to write, it should be hard to 
understand.




Re: [zfs-discuss] Deduplication - deleting the original

2009-12-08 Thread Thomas Uebermeier

Colin,

I think you are mixing up the filesystem layer (where the individual files
are maintained) and the block layer, where the actual data is stored.

The analogue of deduplication on the filesystem layer would be to create hard
links of the files, where deleting one file does not remove the other link.

Block layer deduplication is a black box; think of it simply as a form of
compression that works in the background.

Thomas

* Colin Raven [Tue Dec 08, 2009 at 01:00:54PM +0100]:

  In reading this blog post:
  [1]http://blogs.sun.com/bobn/entry/taking_zfs_deduplication_for_a
  a question came to mind.
  To understand the context of the question, consider the opening paragraph
  from the above post;



Here is my test case: I have 2 directories of photos, totaling about
90MB each. And here's the trick - they are almost complete duplicates of
each other. I downloaded all of the photos from the same camera on 2
different days. How many of you do that? Yeah, me too.



  OK, I consider myself in that category most certainly. Through just plain
  'ol sloppiness I must have multiple copies of some images. Sad self
  indictment...but anyway
  What happens if, once dedup is on, I (or someone else with delete rights)
  open a photo management app containing that collection, and start deleting
  dupes - AND - happen to delete the original that all other references are
  pointing to. I know, I know, it doesn't matter - snapshots save the day -
  but in this instance that's not the point because I'm trying to properly
  understand the underlying dedup concept.
  Logically, if you delete what everything is pointing at, all the pointers
  are now null values, they are - in effect - pointing at nothing...an empty
  hole.
  I have the feeling the answer to this is; no they don't, there is no
  spoon (original) you're still OK. I suspect that, only because the
  people who thought this up couldn't possibly have missed such an obvious
  point. The problem I have is in trying to mentally frame this in such a
  way that I can subsequently explain it, if asked to do so (which I see
  coming for sure).
  Help in understanding this would be hugely helpful - anyone?
  Regards & TIA,
  -Me



Re: [zfs-discuss] Deduplication - deleting the original

2009-12-08 Thread Jeff Bonwick
> i am no pro in zfs, but to my understanding there is no original.

That is correct.  From a semantic perspective, there is no change
in behavior between dedup=off and dedup=on.  Even the accounting
remains the same: each reference to a block is charged to the dataset
making the reference.  The only place you see the effect of dedup
is at the pool level, which can now have more logical than physical
data.  You may also see a difference in performance, which can be
either positive or negative depending on a whole bunch of factors.

At the implementation level, all that's really happening with dedup
is that when you write a block whose contents are identical to an
existing block, instead of allocating new disk space we just increment
a reference count on the existing block.  When you free the block
(from the dataset's perspective), the storage pool decrements the
reference count, but the block remains allocated at the pool level.
When the reference count goes to zero, the storage pool frees the
block for real (returns it to the storage pool's free space map).
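
The reference counting described above can be mimicked with a toy model. This is only a sketch under stated assumptions: the class ToyDedupPool and its methods are hypothetical names, blocks are keyed by a SHA-256 digest of their contents purely for illustration, and none of it reflects the actual ZFS data structures.

```python
import hashlib


class ToyDedupPool:
    """Toy model: blocks keyed by content hash, each with a reference count."""

    def __init__(self):
        self.blocks = {}     # content key -> block contents (stored once)
        self.refcount = {}   # content key -> number of references

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key in self.blocks:
            self.refcount[key] += 1      # identical contents: no new allocation
        else:
            self.blocks[key] = data      # first copy: allocate the block
            self.refcount[key] = 1
        return key

    def free(self, key: str) -> None:
        self.refcount[key] -= 1
        if self.refcount[key] == 0:      # last reference gone: free for real
            del self.blocks[key]
            del self.refcount[key]

    def read(self, key: str) -> bytes:
        return self.blocks[key]


if __name__ == "__main__":
    pool = ToyDedupPool()
    a = pool.write(b"same photo")
    b = pool.write(b"same photo")        # second reference, same block
    pool.free(a)                         # delete one of the "copies"
    print(pool.read(b))                  # the other reference still reads fine
```

In this model the first write has no special status: freeing either reference only decrements the count, and the block goes away only when the count reaches zero.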

But, to reiterate, none of this is visible semantically.  The only
way you can even tell dedup is happening is to observe that the
total space used by all datasets exceeds the space allocated from
the pool -- i.e. that the pool's dedup ratio is greater than 1.0.
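
As a worked example of that ratio, here is a small sketch. The function name dedup_ratio and the figures are assumptions for illustration: it takes the two 90 MB photo directories from the original question and pretends they are exact duplicates, which the blog post does not claim.

```python
def dedup_ratio(logical_bytes: int, physical_bytes: int) -> float:
    """Space referenced by all datasets divided by space allocated in the pool."""
    if physical_bytes <= 0:
        raise ValueError("physical_bytes must be positive")
    return logical_bytes / physical_bytes


MB = 1024 * 1024

# Hypothetical figures: two 90 MB directories with identical contents,
# so the datasets reference 180 MB while the pool allocates only 90 MB.
ratio = dedup_ratio(logical_bytes=2 * 90 * MB, physical_bytes=90 * MB)

if __name__ == "__main__":
    print(ratio)
```

With no duplicate blocks at all, logical and physical space are equal and the ratio is exactly 1.0.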

Jeff


Re: [zfs-discuss] Deduplication - deleting the original

2009-12-08 Thread Colin Raven
On Tue, Dec 8, 2009 at 22:54, Jeff Bonwick jeff.bonw...@sun.com wrote:

>> i am no pro in zfs, but to my understanding there is no original.
>
> That is correct.  From a semantic perspective, there is no change
> in behavior between dedup=off and dedup=on.  Even the accounting
> remains the same: each reference to a block is charged to the dataset
> making the reference.  The only place you see the effect of dedup
> is at the pool level, which can now have more logical than physical
> data.  You may also see a difference in performance, which can be
> either positive or negative depending on a whole bunch of factors.
>
> At the implementation level, all that's really happening with dedup
> is that when you write a block whose contents are identical to an
> existing block, instead of allocating new disk space we just increment
> a reference count on the existing block.  When you free the block
> (from the dataset's perspective), the storage pool decrements the
> reference count, but the block remains allocated at the pool level.
> When the reference count goes to zero, the storage pool frees the
> block for real (returns it to the storage pool's free space map).
>
> But, to reiterate, none of this is visible semantically.  The only
> way you can even tell dedup is happening is to observe that the
> total space used by all datasets exceeds the space allocated from
> the pool -- i.e. that the pool's dedup ratio is greater than 1.0.


Jeff, Thomas, Ed & Michael;

Thank you all for assisting in the education of a n00bie in this most
important ZFS feature. I *think* I have a better overall understanding now.

This list is a resource treasure trove! I hope I'm able to acquire
sufficient knowledge over time to eventually be able to contribute help to
other newcomers.

Regards & Thanks for all the help,
-Me