Re: [zfs-discuss] ZFS Distro Advice
On Tue, March 5, 2013 11:17, Russ Poyner wrote:
> Your idea to use zfs diff to limit the need to stat the entire
> filesystem tree intrigues me. My current rsync backups are normally
> limited by this very factor. It takes longer to walk the filesystem tree
> than it does to transfer the new data.
>
> Would you be willing to provide an example of what you mean when you say
> parse/feed the output of zfs diff to rsync?

I don't have anything readily available, or a ZFS system handy to hack
something up. The output of "zfs diff" is roughly:

    M       /myfiles/
    M       /myfiles/link_to_me   (+1)
    R       /myfiles/rename_me -> /myfiles/renamed
    -       /myfiles/delete_me
    +       /myfiles/new_file

Take the second column and use that as the list of files to check.
Solaris' zfs(1M) has an "-F" option for "zfs diff", which would output
something like:

    M       /       /myfiles/
    M       F       /myfiles/link_to_me   (+1)
    R       /myfiles/rename_me -> /myfiles/renamed
    -       F       /myfiles/delete_me
    +       F       /myfiles/new_file
    +       |       /myfiles/new_pipe

So the second column now has a type, and the path is pushed over to the
third column. This way you can simply choose the regular files ("F") and
tell rsync to check only those.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
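The column-picking idea described in this message could be sketched roughly as follows. This is only an illustration, not a tested backup tool: the function name `files_to_sync`, the `root` argument, and the assumption that columns are separated by whitespace and paths are absolute are all choices made for the sketch. It keeps added, modified, and renamed entries, and filters on the "F" type only when a type column is present.

```python
import posixpath
import re

# One line of `zfs diff` output: change code, optional one-character type
# column (as printed with -F), then an absolute path. The whitespace-separated
# layout is an assumption based on the sample output in this thread.
LINE = re.compile(r"^([-+MR])\s+(?:(\S)\s+)?(/.*)$")
LINK_COUNT = re.compile(r"\s+\([+-]\d+\)$")  # trailing "(+1)" link-count note


def files_to_sync(diff_output, root):
    """Return paths relative to `root` for added, modified, or renamed
    entries, keeping only regular files when a type column is present."""
    selected = []
    for line in diff_output.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        change, ftype, path = match.groups()
        if change == "-":
            continue  # deletions need separate handling on the destination
        if ftype not in (None, "F"):
            continue  # skip directories, pipes, and other non-regular files
        path = LINK_COUNT.sub("", path)
        if change == "R":
            path = path.split(" -> ", 1)[-1]  # keep the new name
        selected.append(posixpath.relpath(path, root))
    return selected
```

The resulting list, written one path per line to a file (say, a hypothetical changed.txt), could then be handed to rsync with something like "rsync -a --files-from=changed.txt /myfiles/ /the/dest/", since --files-from takes paths relative to the source directory. Deleted and renamed-away files would still have to be removed on the destination by some other means.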
Re: [zfs-discuss] ZFS Distro Advice
On 3/5/2013 10:27 AM, Bob Friesenhahn wrote:
> On Tue, 5 Mar 2013, David Magda wrote:
>> It's also possible to reduce the amount that rsync has to walk the
>> entire file tree. Most folks simply do a "rsync --options /my/source/
>> /the/dest/", but if you use "zfs diff", and parse/feed the output of
>> that to rsync, then the amount of thrashing can probably be minimized.
>> Especially useful for file hierarchies that have very many individual
>> files, so you don't have to stat() every single one.
>
> Zfs diff only works for zfs filesystems. If one is using zfs
> filesystems then rsync may not be the best option. In the real world,
> data may be sourced from many types of systems and filesystems.
>
> Bob

Bob,

Good point. Clearly this wouldn't work for my current linux fileserver.
I'm building a replacement that will run FreeBSD 9.1 with a zfs storage
pool. My backups are to a thumper running solaris 10 and zfs in another
department. I have an arm's-length collaboration with the department
that runs the thumper, which likely precludes a direct zfs send.

Rsync has allowed us to transfer data without getting too deep into each
other's system administration. I run an rsync daemon with read-only
access to my filesystem that accepts connections from the thumper. They
serve the backups to me via a read-only nfs export.

The only problem has been the iops load generated by my users' millions
of small files. That's why the zfs diff idea excited me, but perhaps I'm
missing some simpler approach.

Russ
Re: [zfs-discuss] ZFS Distro Advice
On Tue, 5 Mar 2013, David Magda wrote:
> It's also possible to reduce the amount that rsync has to walk the
> entire file tree. Most folks simply do a "rsync --options /my/source/
> /the/dest/", but if you use "zfs diff", and parse/feed the output of
> that to rsync, then the amount of thrashing can probably be minimized.
> Especially useful for file hierarchies that have very many individual
> files, so you don't have to stat() every single one.

Zfs diff only works for zfs filesystems. If one is using zfs filesystems
then rsync may not be the best option. In the real world, data may be
sourced from many types of systems and filesystems.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS Distro Advice
On 3/5/2013 9:40 AM, David Magda wrote:
> On Tue, March 5, 2013 10:02, Bob Friesenhahn wrote:
>> Rsync does need to read files on the destination filesystem to see if
>> they have changed. If the system has sufficient RAM (and/or L2ARC)
>> then files may still be cached from the previous day's run. In most
>> cases only a small subset of the total files are updated (at least on
>> my systems) so the caching requirements are small. Files updated on
>> one day are more likely to be the ones updated on subsequent days.
>
> It's also possible to reduce the amount that rsync has to walk the
> entire file tree. Most folks simply do a "rsync --options /my/source/
> /the/dest/", but if you use "zfs diff", and parse/feed the output of
> that to rsync, then the amount of thrashing can probably be minimized.
> Especially useful for file hierarchies that have very many individual
> files, so you don't have to stat() every single one.

David,

Your idea to use zfs diff to limit the need to stat the entire
filesystem tree intrigues me. My current rsync backups are normally
limited by this very factor. It takes longer to walk the filesystem tree
than it does to transfer the new data.

Would you be willing to provide an example of what you mean when you say
parse/feed the output of zfs diff to rsync?

Russ Poyner
Re: [zfs-discuss] ZFS Distro Advice
On Tue, March 5, 2013 10:02, Bob Friesenhahn wrote:
> Rsync does need to read files on the destination filesystem to see if
> they have changed. If the system has sufficient RAM (and/or L2ARC)
> then files may still be cached from the previous day's run. In most
> cases only a small subset of the total files are updated (at least on
> my systems) so the caching requirements are small. Files updated on
> one day are more likely to be the ones updated on subsequent days.

It's also possible to reduce the amount that rsync has to walk the
entire file tree. Most folks simply do a "rsync --options /my/source/
/the/dest/", but if you use "zfs diff", and parse/feed the output of
that to rsync, then the amount of thrashing can probably be minimized.
Especially useful for file hierarchies that have very many individual
files, so you don't have to stat() every single one.
Re: [zfs-discuss] ZFS Distro Advice
On Mon, 4 Mar 2013, Matthew Ahrens wrote:
>> Magic rsync options used:
>>
>>   -a --inplace --no-whole-file --delete-excluded
>>
>> This causes rsync to overwrite the file blocks in place rather than
>> writing to a new temporary file first. As a result, zfs COW produces
>> primitive "deduplication" of at least the unchanged blocks (by writing
>> nothing) while writing new COW blocks for the changed blocks.
>
> If I understand your use case correctly (the application overwrites
> some blocks with the same exact contents), ZFS will ignore these
> "no-op" writes only on recent Open ZFS (illumos / FreeBSD / Linux)
> builds with checksum=sha256 and compression!=off. AFAIK, Solaris ZFS
> will COW the blocks even if their content is identical to what's
> already there, causing the snapshots to diverge.

With these rsync options, rsync will only overwrite a "block" if the
contents of the block have changed. Rsync's notion of a block is
different from zfs's, so there is not a perfect overlap.

Rsync does need to read files on the destination filesystem to see if
they have changed. If the system has sufficient RAM (and/or L2ARC) then
files may still be cached from the previous day's run. In most cases
only a small subset of the total files are updated (at least on my
systems) so the caching requirements are small. Files updated on one day
are more likely to be the ones updated on subsequent days.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
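To make the "only overwrite a block if its contents have changed" behaviour concrete, here is a deliberately simplified sketch. It is not rsync's delta algorithm (rsync uses rolling checksums and can match data at shifted offsets); it only compares fixed-size blocks at the same offsets and rewrites the ones that differ. The function name `update_in_place` and the 128 KiB default block size are assumptions made for illustration, and the destination file is assumed to exist already.

```python
def update_in_place(src_path, dst_path, block_size=128 * 1024):
    """Make dst identical to src, rewriting only the blocks that differ.

    Returns the number of blocks actually written. Unchanged blocks are
    read but never rewritten, so a copy-on-write filesystem underneath
    has nothing new to allocate for them.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while True:
            new = src.read(block_size)
            if not new:
                break
            dst.seek(offset)
            old = dst.read(len(new))
            if old != new:
                dst.seek(offset)
                dst.write(new)  # overwrite in place, no temporary file
                written += 1
            offset += len(new)
        dst.truncate(offset)  # drop any tail left over from a longer old file
    return written
```

With a three-block file in which only the middle block differs, this writes one block and leaves the other two untouched on disk, which is the effect the --inplace and --no-whole-file options are being used for here.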
Re: [zfs-discuss] ZFS Distro Advice
> >> We do the same for all of our "legacy" operating system backups.
> >> Take a snapshot, then do an rsync: an excellent way of maintaining
> >> incremental backups for those.
> >
> > Magic rsync options used:
> >
> >   -a --inplace --no-whole-file --delete-excluded
> >
> > This causes rsync to overwrite the file blocks in place rather than
> > writing to a new temporary file first. As a result, zfs COW produces
> > primitive "deduplication" of at least the unchanged blocks (by
> > writing nothing) while writing new COW blocks for the changed blocks.
>
> If I understand your use case correctly (the application overwrites
> some blocks with the same exact contents), ZFS will ignore these
> "no-op" writes

I think he meant to rely on rsync here to do in-place updates of files,
and only for changed blocks, with the above parameters (by using rsync's
own delta mechanism). So if you have a file and only one block has
changed, rsync will overwrite only that single block on the destination.

> only on recent Open ZFS (illumos / FreeBSD / Linux) builds
> with checksum=sha256 and compression!=off. AFAIK, Solaris ZFS will COW
> the blocks even if their content is identical to what's already there,
> causing the snapshots to diverge.
>
> See https://www.illumos.org/issues/3236 for details.

This is interesting. I didn't know about it. Is there an option similar
to verify=on in dedup, or does it just assume that "checksum is your
data"?

--
Robert Milkowski
http://milek.blogspot.com
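As a toy model of the "no-op write" idea quoted above, the sketch below skips a write when the SHA-256 digest of the incoming block equals the digest already recorded for that block. This is not ZFS code and says nothing about how the real implementation behaves; the class name `ToyBlockStore` and its fields are invented for illustration. It compares digests only, so it does not answer the question of whether any byte-for-byte verification happens in practice.

```python
import hashlib


class ToyBlockStore:
    """Hypothetical in-memory block store that ignores no-op writes."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes
        self.checksums = {}  # block number -> SHA-256 digest of those bytes
        self.writes = 0      # number of writes that actually stored data

    def write(self, blkno, data):
        """Store `data` at `blkno` unless its digest matches the stored one.

        Returns True if the block was written, False if it was skipped.
        """
        digest = hashlib.sha256(data).digest()
        if self.checksums.get(blkno) == digest:
            return False  # same digest as what is stored: treat as a no-op
        self.blocks[blkno] = data
        self.checksums[blkno] = digest
        self.writes += 1
        return True
```

In this model, rewriting a block with identical contents leaves the write counter unchanged, which is the property that would keep successive snapshots from diverging.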