Re: Estimating backup usage with dir-merge filter
On Thu, Oct 6, 2011 at 6:49 PM, Henri Shustak henri.shus...@gmail.com wrote:

>>> It sounds like you missed the point of Kevin's message (in the other fork of this thread). The point wasn't to use `du`, it was that you can run your stats against the backed-up files, not the source. Then you're only running stats against the results of running the backup using the filters, so you don't need to filter them again.
>>
>> I got that but neglected to respond to the whole group. My mistake.
>>
>> The backups are being performed using BackupPC to a central server where compression and de-duplication are done. While it's true that the actual storage consumed on the backup server by each user is less because of these, I don't have any problem hiding this from them and instead telling them what their uncompressed, un-deduplicated usage is. It has more of an effect that way, if you know what I mean.
>>
>>> If that doesn't make sense or isn't possible (backups are on some remote server), then just use your rsync command with '--list-only', and post-process that list.
>>
>> I've been tinkering with using --verbose and --dry-run, then parsing the total size out of the last line of the output, and I think I'm close. Curiously, when I don't include the --filter option as a baseline, I'm not getting the same results as du.
>>
>>     $ du -sb . | awk '{print $1}'
>>     508625653
>>     $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
>>     506037893
>>
>> The difference is minimal and probably negligible for this purpose, but I'm still curious where it's coming from. Maybe there are some sparse files in there somewhere.
>
> Do you have the same discrepancy if you use the --stats option?

Yes. With --stats, the last line of the output is unchanged, and the earlier "Total file size:" line in the additional output reports the same number.

Paul

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Estimating backup usage with dir-merge filter
I appreciate the suggestions so far, but I know how to measure usage with 'du' et al. The hitch here is that I want to exclude the files that --filter='dir-merge .rsync-filter' excludes. Hence the thought to use rsync itself.

On Oct 6, 2011 11:02 AM, K S Braunsdorf k...@sac.fedex.com wrote:

>> that processes any filter files into --exclude parameters for du but recently, I've been wondering if there's an easier way that would use
>
> If your backups are all on a single partition you might try quot(8) (quot -- display disk space occupied by each user). I wrote a very simple perl script to munge quot output to create a diskhogs report about 20 years ago, and I still use it today.
>
> I suggest you take the output of
>
>     quot -kvf $BACKUP_DEVICE
>
> and filter it to fit your needs. If you can't find a quot for your OS I might have a C program that works as a replacement.
>
> --ksb at_host sac.fedex.com
Re: Estimating backup usage with dir-merge filter
On Thu, 6 Oct 2011, Paul Dugas wrote:

> I appreciate the suggestions so far, but I know how to measure usage with 'du' et al. The hitch here is that I want to exclude the files that --filter='dir-merge .rsync-filter' excludes. Hence the thought to use rsync itself.

It sounds like you missed the point of Kevin's message (in the other fork of this thread). The point wasn't to use `du`, it was that you can run your stats against the backed-up files, not the source. Then you're only running stats against the results of running the backup using the filters, so you don't need to filter them again.

If that doesn't make sense or isn't possible (backups are on some remote server), then just use your rsync command with '--list-only', and post-process that list.

E.g., if your command is:

    rsync -a --filter='dir-merge .rsync-filter' /source /dest

it becomes, with a post-processing command that just counts bytes from files (not dirs/sockets/etc.):

(all one command line, wrapped here for emailing)

    rsync --list-only -a --filter='dir-merge .rsync-filter' /source /dest |
      awk '/^-/ { total += $2 } END { print total }'

Post-processing is made simpler by the fact that rsync escapes special characters already. (So you don't have to worry about null bytes or newlines or anything in the filenames.)

--
Best,
Ben
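A minimal sketch of the --list-only post-processing described above, wrapped in a shell function so it can be reused. The function name `sum_listed_bytes` is hypothetical. Two details are assumptions beyond the thread: rsync 3.1.0 and later print sizes with digit-grouping commas by default, so the sketch strips them, and it prints with `%.0f` so that large totals come out as plain integers on any awk.

```shell
# Sum the sizes of regular files from `rsync --list-only` output on stdin.
# Lines starting with "-" are regular files; field 2 is the size in bytes.
sum_listed_bytes() {
    awk '/^-/ { size = $2; gsub(/,/, "", size); total += size }
         END  { printf "%.0f\n", total }'
}

# Usage (paths are the placeholders from the message above):
#   rsync --list-only -a --filter='dir-merge .rsync-filter' /source /dest | sum_listed_bytes
```

Directories, symlinks, and other non-regular entries are skipped, matching the `/^-/` test in the original one-liner.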
Re: Estimating backup usage with dir-merge filter
On Thu, Oct 6, 2011 at 1:01 PM, Benjamin R. Haskell rs...@benizi.com wrote:

> use your rsync command with '--list-only', and post-process that list.

Even easier, just make a note of the verbose output from the copy (get better stats via --stats with or w/o --verbose). Or, if you need a special run, --dry-run (-n) will tell you the file-size totals w/o transferring anything.

..wayne..
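The --stats suggestion above could be scripted roughly as follows. This is a sketch, not a tested recipe from the thread: the function name `total_file_size` is hypothetical, and it assumes the "Total file size: N bytes" line that --stats prints (the line Paul refers to elsewhere in this thread). Commas are stripped because rsync 3.1.0 and later group digits by default.

```shell
# Extract the byte count from the "Total file size:" line of
# `rsync --stats` output read on stdin.
total_file_size() {
    awk -F': ' '/^Total file size:/ {
                    size = $2
                    sub(/ bytes.*/, "", size)   # drop the trailing unit
                    gsub(/,/, "", size)         # drop digit grouping
                    print size
                }'
}

# Usage (paths are placeholders):
#   rsync -a --dry-run --stats --filter='dir-merge .rsync-filter' /source /dest | total_file_size
```

If rsync is run with -h, the size is printed with a unit suffix instead of a plain byte count, which this sketch does not handle.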
Re: Estimating backup usage with dir-merge filter
On Thu, 6 Oct 2011, Wayne Davison wrote:

> On Thu, Oct 6, 2011 at 1:01 PM, Benjamin R. Haskell wrote:
>
>> use your rsync command with '--list-only', and post-process that list.
>
> Even easier, just make a note of the verbose output from the copy (get better stats via --stats with or w/o --verbose). Or, if you need a special run, --dry-run (-n) will tell you the file-size totals w/o transferring anything.

Depends on what stats are needed. If you just need total bytes, yeah, that's easier. My example didn't do it, but it sounded like Paul wanted some kind of per-user statistics.

Important bits, if you need more granularity:

  First column is an `ls -l` style mode display (first character = 'd' for dirs, '-' for normal files, etc.)
  Second column is the size in bytes.
  Third is date.
  Fourth is time.
  Fifth-through-rest is the path relative to the transfer root. (Spaces aren't escaped, but other special chars are listed as \#NNN, where the N's are octal digits.)

--
Best,
Ben
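Using the column layout described above, per-user totals could be sketched like this. The function name `per_user_bytes` is hypothetical, and the sketch assumes the transfer root is the parent of the home directories (for example a source of /home/ with a trailing slash), so that the first path component is the user name. Files sitting directly in the transfer root are reported under ".".

```shell
# Group regular-file sizes from `rsync --list-only` output (stdin) by the
# first component of the path, printing "name<TAB>bytes" sorted by name.
per_user_bytes() {
    awk '/^-/ {
             size = $2; gsub(/,/, "", size)
             # Remove mode, size, date and time; the rest is the path,
             # which may itself contain spaces.
             path = $0
             sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ /, "", path)
             slash = index(path, "/")
             user = (slash > 0) ? substr(path, 1, slash - 1) : "."
             total[user] += size
         }
         END { for (user in total) printf "%s\t%.0f\n", user, total[user] }' |
    LC_ALL=C sort
}

# Usage (source path is a placeholder):
#   rsync --list-only -a --filter='dir-merge .rsync-filter' /home/ /dest | per_user_bytes
```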
Re: Estimating backup usage with dir-merge filter
On Thu, Oct 6, 2011 at 4:01 PM, Benjamin R. Haskell rs...@benizi.com wrote:

> It sounds like you missed the point of Kevin's message (in the other fork of this thread). The point wasn't to use `du`, it was that you can run your stats against the backed-up files, not the source. Then you're only running stats against the results of running the backup using the filters, so you don't need to filter them again.

I got that but neglected to respond to the whole group. My mistake.

The backups are being performed using BackupPC to a central server where compression and de-duplication are done. While it's true that the actual storage consumed on the backup server by each user is less because of these, I don't have any problem hiding this from them and instead telling them what their uncompressed, un-deduplicated usage is. It has more of an effect that way, if you know what I mean.

> If that doesn't make sense or isn't possible (backups are on some remote server), then just use your rsync command with '--list-only', and post-process that list.

I've been tinkering with using --verbose and --dry-run, then parsing the total size out of the last line of the output, and I think I'm close. Curiously, when I don't include the --filter option as a baseline, I'm not getting the same results as du.

    $ du -sb . | awk '{print $1}'
    508625653
    $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
    506037893

The difference is minimal and probably negligible for this purpose, but I'm still curious where it's coming from. Maybe there are some sparse files in there somewhere.

Paul
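One way to narrow down a difference like this, offered as a sketch and not as the confirmed cause (the thread never resolves it): `du -sb` reports apparent sizes, so sparse files alone would not make its figure larger, but depending on the coreutils version it may also add the sizes of the directory entries themselves. Summing regular files and directories separately shows how much each contributes. The function names are hypothetical, and `-printf` assumes GNU find.

```shell
# Apparent size in bytes of all regular files under a directory.
files_only_bytes() {
    find "$1" -type f -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Size in bytes of the directory entries themselves under a directory.
dirs_only_bytes() {
    find "$1" -type d -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Usage:
#   files_only_bytes .    # compare with the rsync total
#   dirs_only_bytes .     # compare with the gap between du and rsync
```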
Re: Estimating backup usage with dir-merge filter
>> It sounds like you missed the point of Kevin's message (in the other fork of this thread). The point wasn't to use `du`, it was that you can run your stats against the backed-up files, not the source. Then you're only running stats against the results of running the backup using the filters, so you don't need to filter them again.
>
> I got that but neglected to respond to the whole group. My mistake.
>
> The backups are being performed using BackupPC to a central server where compression and de-duplication are done. While it's true that the actual storage consumed on the backup server by each user is less because of these, I don't have any problem hiding this from them and instead telling them what their uncompressed, un-deduplicated usage is. It has more of an effect that way, if you know what I mean.
>
>> If that doesn't make sense or isn't possible (backups are on some remote server), then just use your rsync command with '--list-only', and post-process that list.
>
> I've been tinkering with using --verbose and --dry-run, then parsing the total size out of the last line of the output, and I think I'm close. Curiously, when I don't include the --filter option as a baseline, I'm not getting the same results as du.
>
>     $ du -sb . | awk '{print $1}'
>     508625653
>     $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
>     506037893
>
> The difference is minimal and probably negligible for this purpose, but I'm still curious where it's coming from. Maybe there are some sparse files in there somewhere.

Do you have the same discrepancy if you use the --stats option?

This email is protected by LBackup http://www.lbackup.org
Estimating backup usage with dir-merge filter
I use --filter='dir-merge .backup-filter' to allow my users to designate portions of their home directories that should be excluded from my rsync-based backup system. I'm looking for a way to periodically generate a report that shows the amount of backup space being used by each user.

I've tinkered with writing my own script that processes any filter files into --exclude parameters for du, but recently I've been wondering if there's an easier way that would use rsync itself, the --filter argument, and --dry-run.

Anyone ever run into something like this?

P
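A rough sketch of the kind of report asked about here, combining --filter, --dry-run, and --stats in a loop over home directories. Everything named in it is an assumption for illustration: the function `backup_usage_report`, the layout of one directory per user under a common root, and the placeholder destination /tmp/does_not_exist (borrowed from later in the thread; nothing is written there during a dry run). It also assumes the "Total file size:" line of --stats output.

```shell
# Print "user<TAB>bytes" for each directory under the given root, counting
# only what rsync would transfer after applying each per-directory
# .backup-filter file.
backup_usage_report() {
    home_root=$1
    dest=${2:-/tmp/does_not_exist}      # placeholder; never written in a dry run
    for dir in "$home_root"/*/; do
        [ -d "$dir" ] || continue
        user=$(basename "$dir")
        bytes=$(rsync -a --dry-run --stats \
                      --filter='dir-merge .backup-filter' \
                      "$dir" "$dest" |
                awk -F': ' '/^Total file size:/ {
                                s = $2
                                sub(/ bytes.*/, "", s)
                                gsub(/,/, "", s)
                                print s
                            }')
        printf '%s\t%s\n' "$user" "$bytes"
    done
}

# Usage:
#   backup_usage_report /home
```

Each user's .backup-filter file is itself part of the transfer, so its size is included in that user's total.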
Re: Estimating backup usage with dir-merge filter
Why not do the du on the backup where the excluded files aren't there?

On 10/05/11 12:57, Paul Dugas wrote:

> I use --filter='dir-merge .backup-filter' to allow my users to designate portions of their home directories that should be excluded from my rsync-based backup system. I'm looking for a way to periodically generate a report that shows the amount of backup space being used by each user.
>
> I've tinkered with writing my own script that processes any filter files into --exclude parameters for du, but recently I've been wondering if there's an easier way that would use rsync itself, the --filter argument, and --dry-run.
>
> Anyone ever run into something like this?
>
> P

--
Kevin Korb                  Phone:    (407) 252-6853
Systems Administrator       Internet:
FutureQuest, Inc.           ke...@futurequest.net (work)
Orlando, Florida            k...@sanitarium.net (personal)
Web page:                   http://www.sanitarium.net/
PGP public key available on web site.