Re: [zfs-discuss] zfs send and dedupe

2011-09-07 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Freddie Cash
> 
> Will be interesting to see whether or not -D works with ZFSv28 in FreeBSD
> 8-STABLE/9-BETA.  And whether or not "zfs send" is faster/better/easier/more
> reliable than rsyncing snapshots (which is what we do currently).

Holy crap, quit wasting time on the -D thing and/or dedup, and yes, start using 
zfs send | zfs receive instead of using rsync!  (Presuming you want to 
replicate your whole filesystem, and no exclusions and stuff like that.)  

rsync needs to walk the whole directory tree, and perform all sorts of 
comparison operations at a level above the filesystem to determine what changed 
and so forth...  And then needs to calculate diffs...  ZFS instantly knows which 
blocks changed incrementally so it doesn't need to do any of that work.  ZFS 
just instantly starts streaming all the changed blocks, with magnificent 
efficiency.  Typically people abandoning rsync in favor of zfs send | receive 
will experience a couple orders of magnitude performance gain.  Depends on your 
data usage patterns, but that's typical.
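
To make the "ZFS already knows which blocks changed" point concrete, here is a
toy Python model, not ZFS code. It assumes each block records the transaction
group (txg) in which it was last written, and that an incremental send only
has to pick blocks born after the "from" snapshot. The Block class, the field
names, and the txg numbers are all made up for illustration; the real
traversal works on the on-disk block tree rather than a flat list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    offset: int      # position of the block in the toy dataset
    birth_txg: int   # transaction group in which the block was last written


def blocks_to_send(blocks, from_snap_txg, to_snap_txg):
    """Pick blocks written after from_snap_txg, up to and including to_snap_txg."""
    return [b for b in blocks if from_snap_txg < b.birth_txg <= to_snap_txg]


# Hypothetical dataset: blocks 2 and 4 were rewritten after txg 50.
dataset = [Block(0, 10), Block(1, 10), Block(2, 57), Block(3, 12), Block(4, 90)]
changed = blocks_to_send(dataset, from_snap_txg=50, to_snap_txg=100)
print([b.offset for b in changed])
```

No file contents are compared and no directory tree is walked in this model;
the selection is a simple filter on metadata, which is the contrast with rsync
drawn above.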

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send and dedupe

2011-09-07 Thread Lori Alt

On 09/ 7/11 02:20 PM, Daniel Carosone wrote:
> On Wed, Sep 07, 2011 at 08:47:36AM -0600, Lori Alt wrote:
>> On 09/ 6/11 11:45 PM, Daniel Carosone wrote:
>>> My understanding was that 'zfs send -D' would use the pool's DDT in
>>> building its own, if present.
>> It does not use the pool's DDT, but it does use the SHA-256 checksums
>> that have already been calculated for on-disk dedup, thus speeding the
>> generation of the send stream.
>
> Ah, thanks for the clarification.  Presumably the same is true if the
> pool is using checksum=sha256, without dedup?

Yes, I think so.

> Still a moot point for now :)
>
> --
> Dan.


Re: [zfs-discuss] zfs send and dedupe

2011-09-07 Thread Daniel Carosone
On Wed, Sep 07, 2011 at 08:47:36AM -0600, Lori Alt wrote:
> On 09/ 6/11 11:45 PM, Daniel Carosone wrote:
>> My understanding was that 'zfs send -D' would use the pool's DDT in
>> building its own, if present.
> It does not use the pool's DDT, but it does use the SHA-256 checksums  
> that have already been calculated for on-disk dedup, thus speeding the  
> generation of the send stream.

Ah, thanks for the clarification.  Presumably the same is true if the
pool is using checksum=sha256, without dedup? 

Still a moot point for now :)

--
Dan.


Re: [zfs-discuss] zfs send and dedupe

2011-09-07 Thread Freddie Cash
Thanks for the replies everyone.  That was along the lines of what I was
thinking (-D is a "win" for network usage savings, if it works) but wanted
to double-check before I started playing with our new boxes.

Will be interesting to see whether or not -D works with ZFSv28 in FreeBSD
8-STABLE/9-BETA.  And whether or not "zfs send" is faster/better/easier/more
reliable than rsyncing snapshots (which is what we do currently).

Thanks for the info.

-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] zfs send and dedupe

2011-09-07 Thread Lori Alt

On 09/ 6/11 11:45 PM, Daniel Carosone wrote:
> On Tue, Sep 06, 2011 at 10:05:54PM -0700, Richard Elling wrote:
>> On Sep 6, 2011, at 9:01 PM, Freddie Cash wrote:
>>
>>> For example, does 'zfs send -D' use the same DDT as the pool?
>>
>> No.
>
> My understanding was that 'zfs send -D' would use the pool's DDT in
> building its own, if present.

It does not use the pool's DDT, but it does use the SHA-256 checksums
that have already been calculated for on-disk dedup, thus speeding the
generation of the send stream.

> If blocks were known by the filesystem
> to be duplicate, it would use that knowledge to skip some work seeding
> its own ddt and stream back-references. This doesn't change the stream
> contents vs what it would have generated without these hints, so "No"
> still works as a short answer :)
>
> That understanding was based on discussions and blog posts at the
> time, not looking at code. At least in theory, it should help avoid
> reading and checksumming extra data blocks if this knowledge can be
> used, so less work regardless of measurable impact on send throughput.
> (It's more about diminished impact to other concurrent activities)
>
> The point has mostly been moot in practice, though, because I've found
> "zfs send -D" just plain doesn't work and often generates invalid
> streams, as you note. Good to know there are fixes.
>
> --
> Dan.


Re: [zfs-discuss] zfs send and dedupe

2011-09-07 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Richard Elling
> 
> For example, does 'zfs send -D' use the same DDT as the pool?
> 
> No.
> 
> 
> Or does it require more memory for its own DDT, thus impacting
> performance of both?
> 
> Yes, no.

How can this be?  
If zfs send -D does not use the same DDT as the pool, then it must require
memory for its own DDT.  But Richard, the second half of your answer seems
to contradict this.  Perhaps you are denying that the extra memory usage
impacts performance of the system?


> If you have a deduped pool on both ends of the send, does -D make any
> difference?
> 
> If neither pool is deduped, does -D make a difference?

Yes.  If the originating pool is dedup'd on disk, then it's just dedup'd on
disk.  And if the recipient pool is dedup'd on disk, then it's just dedup'd
on disk.  In either case, traditionally the data would not be dedup'd in
transit (zfs send.)

zfs send -D only causes the data to be dedup'd in the data stream from the
sender to the receiver.  This presumably saves network bandwidth and
accelerates the network traffic.
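
As a rough illustration of "dedup'd in the data stream", here is a toy Python
model, not the real stream format. It assumes the sender keeps an in-memory
table keyed by SHA-256 digest and replaces each repeated block with a small
back-reference to the record that already carried the data. The record tags
"data" and "ref" and both function names are invented for this sketch.

```python
import hashlib


def dedup_stream(blocks):
    """Turn a list of bytes blocks into toy stream records.

    The first copy of a block is sent in full; later identical blocks
    become back-references to the earlier record.
    """
    seen = {}      # SHA-256 digest -> index of the record that carried the data
    records = []
    for data in blocks:
        digest = hashlib.sha256(data).digest()
        if digest in seen:
            records.append(("ref", seen[digest]))
        else:
            seen[digest] = len(records)
            records.append(("data", data))
    return records


def restore(records):
    """Receiver side: expand back-references into full blocks again."""
    out = []
    for kind, payload in records:
        out.append(payload if kind == "data" else out[payload])
    return out


# Four 4 KiB blocks, three of them identical.
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
records = dedup_stream(blocks)
payload_bytes = sum(len(p) for kind, p in records if kind == "data")
print(payload_bytes, "of", sum(len(b) for b in blocks), "bytes sent as data")
```

The "seen" table exists only for the duration of the send and is separate from
anything stored in a pool, which is consistent with the answers above: the
stream gets its own table, and the saving depends entirely on how much of the
data repeats.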


Re: [zfs-discuss] zfs send and dedupe

2011-09-06 Thread Daniel Carosone
On Tue, Sep 06, 2011 at 10:05:54PM -0700, Richard Elling wrote:
> On Sep 6, 2011, at 9:01 PM, Freddie Cash wrote:
> 
> > For example, does 'zfs send -D' use the same DDT as the pool?
> 
> No.

My understanding was that 'zfs send -D' would use the pool's DDT in 
building its own, if present. If blocks were known by the filesystem
to be duplicate, it would use that knowledge to skip some work seeding
its own ddt and stream back-references. This doesn't change the stream
contents vs what it would have generated without these hints, so "No"
still works as a short answer :) 

That understanding was based on discussions and blog posts at the
time, not looking at code. At least in theory, it should help avoid
reading and checksumming extra data blocks if this knowledge can be
used, so less work regardless of measurable impact on send throughput.
(It's more about diminished impact to other concurrent activities)
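
A minimal sketch of the "avoid checksumming extra data blocks" idea, written
as toy Python rather than anything taken from the ZFS source. Following Lori
Alt's reply elsewhere in this thread, it assumes the send path can reuse a
SHA-256 checksum that was already stored with the block and only hashes when
none is available. The function name, its parameters, and the counter are
hypothetical.

```python
import hashlib


def block_digest(data, stored_sha256=None, counter=None):
    """Return the SHA-256 digest for a block, reusing a stored one if present."""
    if stored_sha256 is not None:
        return stored_sha256          # computed earlier, when the block was written
    if counter is not None:
        counter["hashed"] += 1        # count the hashing work done during the send
    return hashlib.sha256(data).digest()


data = b"x" * 4096
stored = hashlib.sha256(data).digest()   # stands in for the on-disk checksum
counter = {"hashed": 0}

reused = block_digest(data, stored, counter)      # no hashing needed
computed = block_digest(data, None, counter)      # has to hash the block
print(reused == computed, counter["hashed"])
```

Both calls yield the same digest, so the stream contents do not change; only
the amount of hashing done at send time differs.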

The point has mostly been moot in practice, though, because I've found
"zfs send -D" just plain doesn't work and often generates invalid
streams, as you note. Good to know there are fixes.

--
Dan.


Re: [zfs-discuss] zfs send and dedupe

2011-09-06 Thread Richard Elling
On Sep 6, 2011, at 9:01 PM, Freddie Cash wrote:
> Just curious if anyone has looked into the relationship between zpool dedupe, 
> zfs send dedupe, memory use, and network throughput.
> 

Yes.

> For example, does 'zfs send -D' use the same DDT as the pool?
> 

No.

> Or does it require more memory for its own DDT, thus impacting performance 
> of both?
> 

Yes, no.

> If you have a deduped pool on both ends of the send, does -D make any 
> difference?
> 

Yes, if the data is deduplicable.

> If neither pool is deduped, does -D make a difference?
> 

Yes, if the data is deduplicable.

> We're waiting on a replacement backplane for our newest zfs-based storage 
> box, so won't be able to look into this ourselves until next week at the 
> earliest. Thought I'd check if anyone else has already done some comparisons 
> or benchmarks.
> 

I'm not aware of any benchmarks, and I'd be surprised if they could be applied
to real-world cases. zfs send deduplication is very, very, very dependent on
the data being sent. It is also dependent on the release, since it is broken
in many OpenSolaris and derived builds. Fixes have recently been submitted
into the illumos source tree. Recent Nexenta distributions also have the
fixes.
 -- richard


[zfs-discuss] zfs send and dedupe

2011-09-06 Thread Freddie Cash
Just curious if anyone has looked into the relationship between zpool
dedupe, zfs send dedupe, memory use, and network throughput.

For example, does 'zfs send -D' use the same DDT as the pool? Or does it
require more memory for its own DDT, thus impacting performance of both?

If you have a deduped pool on both ends of the send, does -D make any
difference?

If neither pool is deduped, does -D make a difference?

We're waiting on a replacement backplane for our newest zfs-based storage
box, so won't be able to look into this ourselves until next week at the
earliest. Thought I'd check if anyone else has already done some comparisons
or benchmarks.

Cheers,
Freddie
fjwc...@gmail.com