Re: [zfs-discuss] zfs send and ARC

2010-03-26 Thread Edward Ned Harvey
> In the "Thoughts on ZFS Pool Backup Strategies" thread it was stated
> that zfs send sends uncompressed data and uses the ARC.
>
> If zfs send sends uncompressed data which has already been compressed,
> this is not very efficient, and it would be *nice* to see it send the
> original compressed data (or have an option to do it).

You've got 2 questions in your post.  The one above first ...

It's true that zfs send sends uncompressed data.  So I've heard.  I haven't 
tested it personally.

I seem to remember there's some work to improve this, but it's not available
yet.  It was easier to implement the uncompressed send, and that is already
super-fast compared to all the alternatives.


> I thought I would ask a true or false type question, mainly for
> curiosity's sake.
>
> If zfs send uses the standard ARC cache (when something is not already in
> the ARC) I would expect this to hurt (to some degree?) the performance
> of the system. (i.e. I assume it has the effect of replacing
> current/useful data in the cache with not very useful/old data, depending
> on how large the zfs send is.)

And this is a separate question.

I can't say first-hand what ZFS does, but I have an educated guess.  I would
say, for every block the zfs send needs to read ... if the block is in the ARC
or L2ARC, then it won't be fetched again from disk.  But it is not obliterating
the ARC or L2ARC with old data, because it's smart enough to work at a lower
level than a user-space process and tell the kernel (or whatever) something
like "I'm only reading this block once; don't bother caching it for my sake."



Re: [zfs-discuss] zfs send and ARC

2010-03-26 Thread David Dyer-Bennet

On Fri, March 26, 2010 07:06, Edward Ned Harvey wrote:
>> In the "Thoughts on ZFS Pool Backup Strategies" thread it was stated
>> that zfs send sends uncompressed data and uses the ARC.
>>
>> If zfs send sends uncompressed data which has already been compressed,
>> this is not very efficient, and it would be *nice* to see it send the
>> original compressed data (or have an option to do it).

> You've got 2 questions in your post.  The one above first ...
>
> It's true that zfs send sends uncompressed data.  So I've heard.  I
> haven't tested it personally.
>
> I seem to remember there's some work to improve this, but it's not
> available yet.  It was easier to implement the uncompressed send, and
> that is already super-fast compared to all the alternatives.

I don't know that it makes sense to.  There are lots of existing filter
packages that do compression; so if you want compression, just put them in
your pipeline.  That way you're not limited by what zfs send has
implemented, either.  When they implement bzip98 with a new compression
technology breakthrough, you can just use it :-) .
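
For example (the pool, snapshot, dataset, and host names below are just
placeholders), the pipeline version looks something like:

  # compress locally in transit, decompress on the receiver before zfs receive
  zfs send tank/data@today | gzip -c | \
      ssh backuphost 'gunzip -c | zfs receive backup/data'

zfs receive never knows the stream was compressed along the way, so you can
swap gzip for any other filter without waiting on new zfs send features.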

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] zfs send and ARC

2010-03-26 Thread David Magda
On Fri, March 26, 2010 09:46, David Dyer-Bennet wrote:

> I don't know that it makes sense to.  There are lots of existing filter
> packages that do compression; so if you want compression, just put them in
> your pipeline.  That way you're not limited by what zfs send has
> implemented, either.  When they implement bzip98 with a new compression
> technology breakthrough, you can just use it :-) .

Actually a better example may be using parallel implementations of popular
algorithms:

http://www.zlib.net/pigz/
http://www.google.com/search?q=parallel+bzip

Given the number of cores we have nowadays (especially on the Niagara-based
CPUs), we might as well use them. There are also better algorithms out there
(some of which assume parallelism):

http://en.wikipedia.org/wiki/Xz
http://en.wikipedia.org/wiki/7z

If you're using OpenSSH, there are also some third-party patches that may
help in performance:

http://www.psc.edu/networking/projects/hpn-ssh/
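
As a rough sketch (the host, pool, snapshot name, and thread count are made
up), a parallel-compression pipeline might look like:

  # pigz spreads the gzip work across 8 local cores; the output is still
  # ordinary gzip, so the receiver only needs gzip or pigz to unpack it
  zfs send -R tank@nightly | pigz -p 8 | \
      ssh backuphost 'pigz -d | zfs receive -d backup'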

However, if the data is already compressed (and/or deduped), there's no
sense in doing it again. If ZFS does have to go to disk, might as well
send the data as-is.




[zfs-discuss] zfs send and ARC

2010-03-25 Thread Damon Atkins
In the "Thoughts on ZFS Pool Backup Strategies" thread it was stated that zfs
send sends uncompressed data and uses the ARC.

If zfs send sends uncompressed data which has already been compressed, this is
not very efficient, and it would be *nice* to see it send the original
compressed data (or have an option to do it).

I thought I would ask a true or false type question, mainly for curiosity's sake.

If zfs send uses the standard ARC cache (when something is not already in the
ARC) I would expect this to hurt (to some degree?) the performance of the
system. (i.e. I assume it has the effect of replacing current/useful data in
the cache with not very useful/old data, depending on how large the zfs send is.)


If the above is true, zfs send and "zfs backup" (if such a command existed to
back up and restore a file or set of files with all ZFS attributes) would
improve the performance of normal reads/writes by avoiding the ARC cache (or,
if easier to implement, by having their own private ARC cache).

Or does it use the same sort of code as setting "primarycache=none" on a file
system?

Has anyone monitored ARC hit rates while doing a large zfs send?

Cheers
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] zfs send and ARC

2010-03-25 Thread Richard Elling
On Mar 25, 2010, at 6:13 AM, Damon Atkins wrote:
> In the "Thoughts on ZFS Pool Backup Strategies" thread it was stated that
> zfs send sends uncompressed data and uses the ARC.
>
> If zfs send sends uncompressed data which has already been compressed, this
> is not very efficient, and it would be *nice* to see it send the original
> compressed data (or have an option to do it).
>
> I thought I would ask a true or false type question, mainly for curiosity's
> sake.
>
> If zfs send uses the standard ARC cache (when something is not already in
> the ARC) I would expect this to hurt (to some degree?) the performance of
> the system. (i.e. I assume it has the effect of replacing current/useful
> data in the cache with not very useful/old data, depending on how large the
> zfs send is.)

If you restrict answers to true/false then the answer is false :-)
Actually, the answer is mostly false. The ARC is divided into a most 
frequently used cache and a most recently used cache. The send
data should stick to the most recently used side.

> If the above is true, zfs send and "zfs backup" (if such a command existed
> to back up and restore a file or set of files with all ZFS attributes) would
> improve the performance of normal reads/writes by avoiding the ARC cache
> (or, if easier to implement, by having their own private ARC cache).

The zio pipeline can, in theory, be tapped between the checksum and
decompression stages, but I think you will find that this defeats both piped
compression and receive-side compression.

 
> Or does it use the same sort of code as setting "primarycache=none" on a
> file system?
>
> Has anyone monitored ARC hit rates while doing a large zfs send?

Yes.  I see very good ARC hit rates when I send from a high transaction
system. This is a good thing because recently written data is likely to be
in the ARC.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 







Re: [zfs-discuss] zfs send and ARC

2010-03-25 Thread Darren J Moffat
Whether it is efficient or not to send the compressed or uncompressed data
depends on a lot of factors.


If the data is already in the ARC for some other reason then it is likely much
more efficient to use that, because sending the compressed blocks involves
doing I/O to disk.  Reading the version from the in-memory ARC does not.


If the data is in the L2ARC that is still better than going out to the 
main pool disks to get the compressed version.


Reading from disk is always slower than reading from memory.

Depending on what your working set of data in the ARC is and the size of the
dataset you are sending, it is possible that the 'zfs send' will cause data
that was in the ARC to be evicted to make room for the blocks that 'zfs send'
needs.  This is a perfect use case for having a large L2ARC if you can't fit
both your working set and the blocks for the 'zfs send' into the ARC.


If you are using incremental 'zfs send' streams, the chances of thrashing the
ARC are probably reduced, particularly if you do them frequently enough so
that they aren't too big.
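
For instance (dataset and snapshot names here are invented), a frequent small
incremental would be along the lines of:

  # send only the blocks that changed between the two snapshots
  zfs send -i tank/home@0800 tank/home@0900 | \
      ssh backuphost zfs receive tank/home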


I know people have monitored the ARC hit rates when doing large zfs 
sends.  Using the DTrace Analytics in an SS7000 makes this very easy.
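
Outside of an SS7000 you can get a rough picture from the raw kstat counters
(the 5-second interval below is arbitrary):

  # ARC hit and miss counts, printed every 5 seconds
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses 5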


It really comes down to the size of your working set in the ARC, the size of
your L2ARC, and your pattern of data access, all combined with the volume of
data you are 'zfs send'ing.


--
Darren J Moffat


Re: [zfs-discuss] zfs send and ARC

2010-03-25 Thread Nicolas Williams
On Thu, Mar 25, 2010 at 04:23:38PM +, Darren J Moffat wrote:
> If the data is in the L2ARC that is still better than going out to
> the main pool disks to get the compressed version.

<advocate customer='devil'>

Well, one could just compress it...  If you'd otherwise put compression
in the ssh pipe (or elsewhere) then you could stop doing that.

</advocate customer='devil'>

Nico
-- 