Re: Fwd: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-16 Thread Andrew Gideon
On Mon, 13 Jul 2015 17:38:35 -0400, Selva Nair wrote: As with any dedup solution, performance does take a hit and it's often not worth it unless you have a lot of duplication in the data. This is so only in some volumes in our case, but it appears that zfs permits this to be enabled/disabled

Re: Fwd: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-16 Thread Ken Chase
yeah, i read somewhere that zfs DOES have separate tuning for metadata and data cache, but i need to read up on that more. as for heavy block duplication: daily backups of the whole system = a lot of dupe. /kc On Thu, Jul 16, 2015 at 05:42:32PM +, Andrew Gideon said: On Mon, 13 Jul

Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-16 Thread Andrew Gideon
On Tue, 14 Jul 2015 08:59:25 +0200, Paul Slootman wrote: btrfs has support for this: you make a backup, then create a btrfs snapshot of the filesystem (or directory), then the next time you make a new backup with rsync, use --inplace so that just changed parts of the file are written to the

Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-16 Thread Simon Hobson
Andrew Gideon c182driv...@gideon.org wrote: btrfs has support for this: you make a backup, then create a btrfs snapshot of the filesystem (or directory), then the next time you make a new backup with rsync, use --inplace so that just changed parts of the file are written to the same blocks
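
The snapshot-then-update cycle quoted above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `plan_backup` and all paths are made up for illustration, the backup target is assumed to be a btrfs subvolume, and the function only builds the two command lines rather than running them.

```python
from datetime import date


def plan_backup(source, subvol, snapshot_dir, day=None):
    """Return the commands for one backup cycle as argv lists.

    All paths are hypothetical; `subvol` is assumed to be a btrfs subvolume
    holding the most recent backup.
    """
    day = day or date.today()
    snapshot = f"{snapshot_dir.rstrip('/')}/{day.isoformat()}"
    return [
        # 1. Freeze the existing backup as a read-only snapshot.
        ["btrfs", "subvolume", "snapshot", "-r", subvol, snapshot],
        # 2. Update the live copy in place, so that changed parts of a file
        #    are written into the existing file instead of a fresh copy.
        ["rsync", "-a", "--inplace",
         source.rstrip("/") + "/", subvol.rstrip("/") + "/"],
    ]
```

One caveat: when source and destination are both local paths, rsync defaults to `--whole-file`, so `--no-whole-file` would probably be needed as well for only the changed parts to be rewritten.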

Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-14 Thread Paul Slootman
On Mon 13 Jul 2015, Andrew Gideon wrote: On the other hand, I do confess that I am sometimes miffed at the waste involved in a small change to a very large file. Rsync is smart about moving minimal data, but it still stores an entire new copy of the file. What's needed is a file system

Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-14 Thread Ken Chase
And what's performance like? I've heard that the performance of lots of COW systems drops through the floor when there are many snapshots. /kc On Tue, Jul 14, 2015 at 08:59:25AM +0200, Paul Slootman said: On Mon 13 Jul 2015, Andrew Gideon wrote: On the other hand, I do confess that I am sometimes

Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-14 Thread Simon Hobson
Ken Chase rsync-list-m...@sizone.org wrote: And what's performance like? I've heard that the performance of lots of COW systems drops through the floor when there are many snapshots. For BTRFS I'd suspect the performance penalty to be fairly small. Snapshots can be done in different ways, and the way BTRFS

rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-13 Thread Andrew Gideon
On Mon, 13 Jul 2015 02:19:23 +, Andrew Gideon wrote: Look at tools like inotifywait, auditd, or kfsmd to see what's easily available to you and what best fits your needs. [Though I'd also be surprised if nobody has fed audit information into rsync before; your need doesn't seem all that

Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-13 Thread Simon Hobson
Andrew Gideon c182driv...@gideon.org wrote: These both bring me to the idea of using some file system auditing mechanism to drive - perhaps with an --include-from or --files-from - what rsync moves. Where I get stuck is that I cannot envision how I can provide rsync with a limited list
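
As a rough illustration of driving rsync from a change list, the sketch below turns a series of changed paths into the relative, de-duplicated form that `--files-from` expects. Where the changed paths come from (an audit log, inotify events, something else) is left open; the function name `changes_to_files_from` is hypothetical, and all paths are assumed to be absolute.

```python
import os


def changes_to_files_from(changed_paths, source_root):
    """Return sorted, unique paths relative to source_root.

    Paths outside source_root, and source_root itself, are dropped.
    All inputs are assumed to be absolute paths.
    """
    root = os.path.normpath(source_root)
    relative = set()
    for path in changed_paths:
        path = os.path.normpath(path)
        if path == root or os.path.commonpath([root, path]) != root:
            continue  # not inside the tree being backed up
        relative.add(os.path.relpath(path, root))
    return sorted(relative)
```

The result, written one path per line, could be passed as `rsync -a --files-from=LIST SRC/ DEST/`. It does not by itself address how the unchanged files end up in a new `--link-dest` backup directory.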

Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-13 Thread Ken Chase
inotifywatch or equiv, there's FSM stuff (filesystem monitor) as well. constantData had a product we used years ago - a kernel module that dumped a list of any changed files out through some /proc or /dev/* device and they had a whole toolset that ate the list (into some db) and played it out as it

Re: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-13 Thread Simon Hobson
Andrew Gideon c182driv...@gideon.org wrote: However, you've made me a little apprehensive about storebackup. I like the lack of a need for a restore tool. This permits all the standard UNIX tools to be applied to whatever I might want to do over the backup, which is often *very*

Fwd: rsync --link-dest and --files-from lead by a change list from some file system audit tool (Was: Re: cut-off time for rsync ?)

2015-07-13 Thread Selva Nair
On Mon, Jul 13, 2015 at 5:19 PM, Simon Hobson li...@thehobsons.co.uk wrote: What's needed is a file system that can do what hard links do, but at the file page level. I imagine that this would work using the same Copy On Write logic used in managing memory pages after a fork(). Well some

Re: cut-off time for rsync ?

2015-07-12 Thread Andrew Gideon
On Thu, 02 Jul 2015 20:57:06 +1200, Mark wrote: You could use find to build a filter to use with rsync, then update the filter every few days if it takes too long to create. If you're going to do something of that sort, you might want instead to consider truly tracking changes. This catches

Re: cut-off time for rsync ?

2015-07-03 Thread Simon Hobson
Ken Chase rsync-list-m...@sizone.org wrote: You have NO IDEA how long it takes to scan 100M files on a 7200 rpm disk. Actually I do have some idea! Additionally, I don't know if Linux (or FreeBSD or any unix) can be told to cache metadata more aggressively than data That had gone through

Re: cut-off time for rsync ?

2015-07-02 Thread Dirk van Deun
What is taking time, scanning inodes on the destination, or recopying the entire backup because of either source read speed, target write speed or a slow interconnect between them? It takes hours to traverse all these directories with loads of small files on the backup server. That is the

Re: cut-off time for rsync ?

2015-07-02 Thread Mark
You could use find to build a filter to use with rsync, then update the filter every few days if it takes too long to create. I have used a script to build a filter on the source server to exclude anything over 5 days old, invoked when the sync starts, but it only parses around 2000 files per
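
A find-style cut-off like the one described above might look roughly like this in Python. It is a sketch, not the script from the message: the function names and the five-day default are illustrative assumptions, and it produces a `--files-from` list rather than an exclude filter.

```python
import os
import stat
import time


def recent_files(root, max_age_days=5, now=None):
    """Return relative paths of regular files modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # file vanished while walking
            if stat.S_ISREG(st.st_mode) and st.st_mtime >= cutoff:
                found.append(os.path.relpath(full, root))
    return sorted(found)


def write_files_from(paths, list_path):
    """Write one relative path per line, as rsync --files-from expects."""
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
```

The list could then be used as `rsync -a --files-from=LIST SRC/ DEST/`. Note that a cut-off based on modification time cannot see deletions, and it still has to walk the whole source tree.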

Re: cut-off time for rsync ?

2015-07-02 Thread Ken Chase
Yes if rsync could keep a 'last state file' that'd be great, which would require the target be unchanged by any other process/usage - this is however the case with many of our uses here - as a backup only target. Then it could just load the target statefile, and only scan the source for changes
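
rsync has no such state file, but the idea can be approximated outside rsync. The sketch below is purely hypothetical: it records size and modification time per file in a JSON file after each run, then compares a fresh scan of the source against that record so the target never has to be traversed. It assumes, as stated above, that nothing else modifies the target.

```python
import json
import os


def scan(root):
    """Map each file's relative path to [size, mtime in nanoseconds]."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            state[os.path.relpath(full, root)] = [st.st_size, st.st_mtime_ns]
    return state


def load_state(path):
    """Load the previous state, or an empty one on the first run."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def save_state(state, path):
    with open(path, "w") as handle:
        json.dump(state, handle)


def diff(old, new):
    """Return (changed_or_new, deleted) relative paths."""
    changed = sorted(p for p, meta in new.items() if old.get(p) != meta)
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted
```

The `changed` list could feed `--files-from`; the `deleted` list would have to be handled separately, and the state should only be saved after rsync has finished successfully.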

Re: cut-off time for rsync ?

2015-07-02 Thread Ken Chase
On Wed, Jul 01, 2015 at 02:05:50PM +0100, Simon Hobson said: As I read this, the default is to look at the file size/timestamp and if they match then do nothing as they are assumed to be identical. So unless you have specified this, then files which have already been copied should be
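
The size/timestamp comparison described in the quoted text can be approximated as follows. This is a simplified sketch, not rsync's actual code: the function name `quick_check_same` is made up, and timestamps are compared in whole seconds as a simplification.

```python
import os


def quick_check_same(src, dst):
    """True if dst exists and matches src in size and modification time."""
    try:
        dst_stat = os.stat(dst)
    except FileNotFoundError:
        return False  # nothing on the destination yet, so it must be copied
    src_stat = os.stat(src)
    return (src_stat.st_size == dst_stat.st_size
            and int(src_stat.st_mtime) == int(dst_stat.st_mtime))
```

Even when every file passes this check and nothing is transferred, both sides still have to stat every file, which is where the scanning time goes.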

Re: cut-off time for rsync ?

2015-07-01 Thread Ken Chase
What is taking time, scanning inodes on the destination, or recopying the entire backup because of either source read speed, target write speed or a slow interconnect between them? Do you keep a full new backup every day, or are you just overwriting the target directory? /kc On Wed, Jul 01,

Re: cut-off time for rsync ?

2015-07-01 Thread Dirk van Deun
I used to rsync a /home with thousands of home directories every night, although only a hundred or so would be used on a typical day, and many of them have not been used for ages. This became too large a burden on the poor old destination server, so I switched to a script that uses find

Re: cut-off time for rsync ?

2015-07-01 Thread Dirk van Deun
If your goal is to reduce storage, and scanning inodes doesn't matter, use --link-dest for targets. However, that'll keep a backup for every time that you run it, by link-desting yesterday's copy. The goal was not to reduce storage, it was to reduce work. A full rsync takes more than the

Re: cut-off time for rsync ?

2015-06-30 Thread Fabian Cenedese
At 10:32 30.06.2015, Dirk van Deun wrote: Hi, I used to rsync a /home with thousands of home directories every night, although only a hundred or so would be used on a typical day, and many of them have not been used for ages. This became too large a burden on the poor old destination server, so

Re: cut-off time for rsync ?

2015-06-30 Thread Ken Chase
If your goal is to reduce storage, and scanning inodes doesn't matter, use --link-dest for targets. However, that'll keep a backup for every time that you run it, by link-desting yesterday's copy. You end up with a backup tree dir per day, with files hardlinked against all other backup dirs. My (and
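
A one-directory-per-day layout with `--link-dest` could be driven by something like the sketch below. The directory naming by ISO date and the function name `link_dest_command` are assumptions for illustration; the function only builds the command line and does not run it.

```python
from datetime import date, timedelta


def link_dest_command(source, backup_root, day):
    """Build an rsync argv that hard-links unchanged files to yesterday's tree."""
    root = backup_root.rstrip("/")
    today_dir = f"{root}/{day.isoformat()}"
    prev_dir = f"{root}/{(day - timedelta(days=1)).isoformat()}"
    return [
        "rsync", "-a",
        f"--link-dest={prev_dir}",   # unchanged files become hard links
        source.rstrip("/") + "/",    # trailing slash: copy the contents
        today_dir + "/",
    ]
```

An absolute `backup_root` is the safer choice here, since a relative `--link-dest` path is interpreted relative to the destination directory.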