Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread David Magda
On Tue, March 5, 2013 11:17, Russ Poyner wrote:
> Your idea to use zfs diff to limit the need to stat the entire
> filesystem tree intrigues me. My current rsync backups are normally
> limited by this very factor. It takes longer to walk the filesystem tree
> than it does to transfer the new data.
>
> Would you be willing to provide an example of what you mean when you say
> parse/feed the output of zfs diff to rsync?

Don't have anything readily available, or a ZFS system handy to hack
something up. The output of "zfs diff" is roughly:

  M   /myfiles/
  M   /myfiles/link_to_me   (+1)
  R   /myfiles/rename_me -> /myfiles/renamed
  -   /myfiles/delete_me
  +   /myfiles/new_file
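
For illustration, a minimal Python sketch of "take the second column": it parses plain zfs diff lines like the ones above, drops deletions, uses the new name for renames, and strips the "(+1)" link-count note. It assumes whitespace-separated fields exactly as in this example; real output is tab-separated and may escape unusual characters in names, which this sketch does not decode. The function and type names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Trailing link-count annotation such as "(+1)" or "(-1)".
_LINK_DELTA = re.compile(r"\s+\([+-]\d+\)$")


class DiffEntry(NamedTuple):
    change: str              # "M", "R", "-" or "+"
    path: str                # path as printed (old name for renames)
    new_path: Optional[str]  # rename target, None otherwise


def parse_zfs_diff_line(line: str) -> DiffEntry:
    """Parse one line of plain (no -F) zfs diff output."""
    change, rest = line.strip().split(None, 1)
    rest = _LINK_DELTA.sub("", rest)
    if change == "R":
        old, new = rest.split(" -> ", 1)
        return DiffEntry(change, old, new)
    return DiffEntry(change, rest, None)


def paths_to_check(lines):
    """Paths rsync should look at: everything except deletions,
    using the new name for renames."""
    paths = []
    for line in lines:
        if not line.strip():
            continue
        entry = parse_zfs_diff_line(line)
        if entry.change == "-":
            continue  # nothing left to copy; deletions need separate handling
        paths.append(entry.new_path or entry.path)
    return paths


sample = """\
M   /myfiles/
M   /myfiles/link_to_me   (+1)
R   /myfiles/rename_me -> /myfiles/renamed
-   /myfiles/delete_me
+   /myfiles/new_file
"""
print(paths_to_check(sample.splitlines()))
```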

Take the second column and use that as the list of files to check. Solaris'
zfs(1M) has an "-F" option which would output something like:

   M   /   /myfiles/
   M   F   /myfiles/link_to_me  (+1)
   R   F   /myfiles/rename_me -> /myfiles/renamed
   -   F   /myfiles/delete_me
   +   F   /myfiles/new_file
   +   |   /myfiles/new_pipe

So the second column now has a type, and the path is pushed over to the
third column. This way you can simply choose the regular files ("F") and
tell rsync to check only those.
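A sketch of the -F variant, assuming the three-column layout above with a type column on every line (including renames, as in the zfs(1M) man page example). It keeps only regular files ("F") that still exist and makes each path relative to the filesystem's mountpoint, because rsync's --files-from option reads paths relative to the source directory. The function name and the /myfiles mountpoint are illustrative.

```python
import re

# Trailing link-count annotation such as "(+1)".
_LINK_DELTA = re.compile(r"\s+\([+-]\d+\)$")


def changed_regular_files(lines, mountpoint):
    """From `zfs diff -F` output, return regular files ("F") that still
    exist, relative to `mountpoint`, for use with rsync --files-from."""
    prefix = mountpoint.rstrip("/") + "/"
    result = []
    for line in lines:
        if not line.strip():
            continue
        change, ftype, rest = line.strip().split(None, 2)
        if ftype != "F" or change == "-":
            continue  # skip directories, pipes, etc. and deleted files
        rest = _LINK_DELTA.sub("", rest)
        if change == "R":
            rest = rest.split(" -> ", 1)[1]  # copy under the new name
        if rest.startswith(prefix):
            result.append(rest[len(prefix):])
    return result


sample = """\
M   /   /myfiles/
M   F   /myfiles/link_to_me  (+1)
R   F   /myfiles/rename_me -> /myfiles/renamed
-   F   /myfiles/delete_me
+   F   /myfiles/new_file
+   |   /myfiles/new_pipe
"""
print(changed_regular_files(sample.splitlines(), "/myfiles"))
```

Written one path per line to a file, the list could then be handed to something like `rsync -a --files-from=changed.txt /myfiles/ dest/`, so rsync only stats the listed files instead of walking the whole tree. Deletions and directory changes would still need separate handling.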


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Russ Poyner

On 3/5/2013 10:27 AM, Bob Friesenhahn wrote:

> On Tue, 5 Mar 2013, David Magda wrote:
> > It's also possible to reduce the amount that rsync has to walk the entire
> > file tree.
> >
> > Most folks simply do a "rsync --options /my/source/ /the/dest/", but if
> > you use "zfs diff", and parse/feed the output of that to rsync, then the
> > amount of thrashing can probably be minimized. Especially useful for file
> > hierarchies that have very many individual files, so you don't have to
> > stat() every single one.
>
> Zfs diff only works for zfs filesystems.  If one is using zfs
> filesystems then rsync may not be the best option.  In the real world,
> data may be sourced from many types of systems and filesystems.
>
> Bob

Bob,

Good point. Clearly this wouldn't work for my current linux fileserver. 
I'm building a replacement that will run FreeBSD 9.1 with a zfs storage 
pool. My backups are to a thumper running solaris 10 and zfs in another 
department. I have an arm's-length collaboration with the department 
that runs the thumper, which likely precludes a direct zfs send.


Rsync has allowed us to transfer data without getting too deep into each
other's system administration. I run an rsync daemon with read-only
access to my filesystem that accepts connections from the thumper. They
serve the backups to me via a read-only NFS export. The only problem has
been the IOPS load generated by my users' millions of small files.
That's why the zfs diff idea excited me, but perhaps I'm missing some
simpler approach.


Russ


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Bob Friesenhahn

On Tue, 5 Mar 2013, David Magda wrote:

> It's also possible to reduce the amount that rsync has to walk the entire
> file tree.
>
> Most folks simply do a "rsync --options /my/source/ /the/dest/", but if
> you use "zfs diff", and parse/feed the output of that to rsync, then the
> amount of thrashing can probably be minimized. Especially useful for file
> hierarchies that have very many individual files, so you don't have to
> stat() every single one.


Zfs diff only works for zfs filesystems.  If one is using zfs 
filesystems then rsync may not be the best option.  In the real world, 
data may be sourced from many types of systems and filesystems.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Russ Poyner

On 3/5/2013 9:40 AM, David Magda wrote:

> On Tue, March 5, 2013 10:02, Bob Friesenhahn wrote:
> > Rsync does need to read files on the destination filesystem to see if
> > they have changed.  If the system has sufficient RAM (and/or L2ARC)
> > then files may still be cached from the previous day's run.  In most
> > cases only a small subset of the total files are updated (at least on
> > my systems) so the caching requirements are small.  Files updated on
> > one day are more likely to be the ones updated on subsequent days.
>
> It's also possible to reduce the amount that rsync has to walk the entire
> file tree.
>
> Most folks simply do a "rsync --options /my/source/ /the/dest/", but if
> you use "zfs diff", and parse/feed the output of that to rsync, then the
> amount of thrashing can probably be minimized. Especially useful for file
> hierarchies that have very many individual files, so you don't have to
> stat() every single one.




David,

Your idea to use zfs diff to limit the need to stat the entire 
filesystem tree intrigues me. My current rsync backups are normally 
limited by this very factor. It takes longer to walk the filesystem tree 
than it does to transfer the new data.


Would you be willing to provide an example of what you mean when you say 
parse/feed the output of zfs diff to rsync?


Russ Poyner


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread David Magda
On Tue, March 5, 2013 10:02, Bob Friesenhahn wrote:

> Rsync does need to read files on the destination filesystem to see if
> they have changed.  If the system has sufficient RAM (and/or L2ARC)
> then files may still be cached from the previous day's run.  In most
> cases only a small subset of the total files are updated (at least on
> my systems) so the caching requirements are small.  Files updated on
> one day are more likely to be the ones updated on subsequent days.

It's also possible to reduce the amount that rsync has to walk the entire
file tree.

Most folks simply do a "rsync --options /my/source/ /the/dest/", but if
you use "zfs diff", and parse/feed the output of that to rsync, then the
amount of thrashing can probably be minimized. Especially useful for file
hierarchies that have very many individual files, so you don't have to
stat() every single one.




Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Bob Friesenhahn

On Mon, 4 Mar 2013, Matthew Ahrens wrote:


> > Magic rsync options used:
> >
> >   -a --inplace --no-whole-file --delete-excluded
> >
> > This causes rsync to overwrite the file blocks in place rather than writing
> > to a new temporary file first.  As a result, zfs COW produces primitive
> > "deduplication" of at least the unchanged blocks (by writing nothing) while
> > writing new COW blocks for the changed blocks.
>
> If I understand your use case correctly (the application overwrites
> some blocks with the same exact contents), ZFS will ignore these
> "no-op" writes only on recent Open ZFS (illumos / FreeBSD / Linux)
> builds with checksum=sha256 and compression!=off.  AFAIK, Solaris ZFS
> will COW the blocks even if their content is identical to what's
> already there, causing the snapshots to diverge.
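A toy Python model of the rule described above, for illustration only: an overwrite is skipped when the no-op write conditions hold (checksum=sha256, compression not off) and the new data has the same SHA-256 as the block already there; otherwise a new block is allocated, which is what makes a snapshot diverge. The class and its fields are hypothetical and are not ZFS code.

```python
import hashlib


class ToyDataset:
    """Toy model of ZFS no-op writes; `checksum` and `compression`
    only mimic the ZFS property values of the same names."""

    def __init__(self, checksum: str = "fletcher4", compression: str = "off"):
        self.checksum = checksum
        self.compression = compression
        self.blocks = {}     # block number -> current contents
        self.cow_writes = 0  # blocks newly allocated (snapshot diverges)

    def _noop_writes_enabled(self) -> bool:
        return self.checksum == "sha256" and self.compression != "off"

    def write(self, blkno: int, data: bytes) -> bool:
        """Return True if a new block was written, False for a no-op write."""
        old = self.blocks.get(blkno)
        if (self._noop_writes_enabled() and old is not None
                and hashlib.sha256(old).digest() == hashlib.sha256(data).digest()):
            # Real ZFS compares against the checksum already stored in the
            # block pointer; recomputing it here just keeps the toy short.
            return False
        self.blocks[blkno] = data
        self.cow_writes += 1
        return True


for props in [("fletcher4", "off"), ("sha256", "on")]:
    ds = ToyDataset(*props)
    ds.write(0, b"unchanged block")  # initial copy
    ds.write(0, b"unchanged block")  # same bytes rewritten in place
    print(props, ds.cow_writes)
```

With the default settings both writes allocate blocks; with sha256 and compression on, the identical rewrite is skipped and the snapshot keeps sharing the block.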


With these rsync options, rsync will only overwrite a "block" if the 
contents of the block have changed.  Rsync's notion of a block is 
different from zfs's, so there is not a perfect overlap.
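To make the mismatch concrete, a simplified Python model: it compares aligned fixed-size blocks of the old and new file (real rsync uses rolling checksums and can match data at any offset), treats the differing ones as rewritten in place, and reports which ZFS records those writes land in. The 1000-byte block size is an arbitrary assumption; 128 KiB is the default ZFS recordsize. Since a partial overwrite makes ZFS write the whole record as a new block, one changed rsync block that straddles a record boundary costs two new records.

```python
RECORDSIZE = 128 * 1024  # default ZFS recordsize


def changed_blocks(old: bytes, new: bytes, block: int) -> list:
    """Offsets of aligned `block`-byte chunks whose contents differ.
    Simplification: real rsync finds matches with rolling checksums."""
    return [off for off in range(0, max(len(old), len(new)), block)
            if old[off:off + block] != new[off:off + block]]


def records_touched(offsets, block: int, recordsize: int, file_len: int) -> list:
    """Indices of ZFS records that in-place writes at `offsets` land in.
    Each touched record is rewritten (COW) as a whole new block."""
    records = set()
    for off in offsets:
        if off >= file_len:
            continue  # chunk lies past the new EOF: a truncation, not a write
        last = min(off + block, file_len) - 1  # last byte rewritten
        records.update(range(off // recordsize, last // recordsize + 1))
    return sorted(records)


old = bytes(300_000)  # 300,000 zero bytes
new = bytearray(old)
new[131_500] = 1      # change a single byte
offs = changed_blocks(old, bytes(new), block=1_000)
print(offs, records_touched(offs, 1_000, RECORDSIZE, len(new)))  # [131000] [0, 1]
```

The single rewritten chunk spans bytes 131000 to 131999, crossing the 131072-byte record boundary, so two records get new copies while any snapshot keeps the old ones.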


Rsync does need to read files on the destination filesystem to see if 
they have changed.  If the system has sufficient RAM (and/or L2ARC) 
then files may still be cached from the previous day's run.  In most 
cases only a small subset of the total files are updated (at least on 
my systems) so the caching requirements are small.  Files updated on 
one day are more likely to be the ones updated on subsequent days.


Bob


Re: [zfs-discuss] ZFS Distro Advice

2013-03-05 Thread Robert Milkowski
> >> We do the same for all of our "legacy" operating system backups. Take
> >> a snapshot then do an rsync and an excellent way of maintaining
> >> incremental backups for those.
> >> incremental backups for those.
> >
> >
> > Magic rsync options used:
> >
> >   -a --inplace --no-whole-file --delete-excluded
> >
> > This causes rsync to overwrite the file blocks in place rather than
> > writing to a new temporary file first.  As a result, zfs COW produces
> > primitive "deduplication" of at least the unchanged blocks (by writing
> > nothing) while writing new COW blocks for the changed blocks.
> 
> If I understand your use case correctly (the application overwrites
> some blocks with the same exact contents), ZFS will ignore these

I think he meant to rely on rsync here to do in-place updates of files, and
only for changed blocks, with the above parameters (by using rsync's own
delta mechanism). So if you have a file and only one block changed, rsync
will overwrite only that single block on the destination.


> "no-op" writes only on recent Open ZFS (illumos / FreeBSD / Linux) builds
> with checksum=sha256 and compression!=off.  AFAIK, Solaris ZFS will COW
> the blocks even if their content is identical to what's already there,
> causing the snapshots to diverge.
> 
> See https://www.illumos.org/issues/3236 for details.
> 

This is interesting. I didn't know about it.
Is there an option similar to verify in dedup (dedup=verify), or does it
just assume that "checksum is your data"?

-- 
Robert Milkowski
http://milek.blogspot.com


