Re: Estimating backup usage with dir-merge filter

2011-10-07 Thread Paul Dugas
On Thu, Oct 6, 2011 at 6:49 PM, Henri Shustak henri.shus...@gmail.com wrote:
>>> It sounds like you missed the point of Kevin's message (in the other
>>> fork of this thread).  The point wasn't to use `du`, it was that you
>>> can run your stats against the backed-up files, not the source.  Then
>>> you're only running stats against the results of running the backup
>>> using the filters, so you don't need to filter them again.
>>
>> I got that but neglected to respond to the whole group.  My mistake.
>> The backups are being performed using BackupPC to a central server
>> where compression and de-duplication are done.  While it's true that
>> the actual storage on the backup server being consumed by each user is
>> less because of these, I don't have any problem hiding this from them
>> and instead telling them what their usage is before compression and
>> de-duplication.  It has more of an effect that way, if you know what I
>> mean.
>>
>>> If that doesn't make sense or isn't possible (backups are on some
>>> remote server), then just use your rsync command with '--list-only',
>>> and post-process that list.
>>
>> I've been tinkering with using --verbose and --dry-run, then parsing
>> the total size out of the last line of the output, and I think I'm
>> close.  Curiously, when I don't include the --filter option as a
>> baseline, I'm not getting the same results as du.
>>
>> $ du -sb . | awk '{print $1}'
>> 508625653
>>
>> $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
>> 506037893
>>
>> The difference is minimal and probably negligible for this purpose, but
>> I'm still curious where it's coming from.  Maybe there are some sparse
>> files in there somewhere.
>
> Do you have the same discrepancy if you use the --stats option?

Yes.  Using --stats, the total on the last line of the output is the
same as the earlier "Total file size:" line in the additional output.
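One way to narrow down the gap (508625653 - 506037893 = 2587760 bytes): GNU `du -b` implies `--apparent-size`, so sparse files should report the same length to both tools, while `du` may also add in the size of each directory, which rsync's total does not appear to include. The sketch below prints those pieces side by side. `size_breakdown` and `sum_numbers` are hypothetical helper names, and the script assumes GNU du and GNU find (for `-printf`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print du's apparent-size total next to the summed
# sizes of regular files and of directories under one tree, so the pieces
# can be compared with rsync's "total size".
# Assumes GNU du and GNU find (-printf is a GNU extension).
# Note: du counts a hard-linked file once, while find lists every link.

# Sum one number per line from stdin; %.0f avoids exponent output in old awks.
sum_numbers() {
    awk '{ t += $1 } END { printf "%.0f\n", t }'
}

size_breakdown() {
    local dir=${1:-.}
    printf 'du -sb total:  %s\n' "$(du -sb -- "$dir" | awk '{ print $1 }')"
    printf 'regular files: %s\n' "$(find "$dir" -type f -printf '%s\n' | sum_numbers)"
    printf 'directories:   %s\n' "$(find "$dir" -type d -printf '%s\n' | sum_numbers)"
}
```

Run it as `size_breakdown .` in the same directory.  If "regular files" lands close to rsync's figure and "directories" is close to the difference, directory sizes are the likely explanation.  Symlinks are not broken out separately here.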

Paul
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Estimating backup usage with dir-merge filter

2011-10-06 Thread Paul Dugas
I appreciate the suggestions so far, but I know how to measure usage with
'du' et al.  The hitch here is that I want to exclude the files that
--filter='dir-merge .rsync-filter' excludes.  Hence the thought to use rsync
itself.
On Oct 6, 2011 11:02 AM, K S Braunsdorf k...@sac.fedex.com wrote:
>> that processes any filter files into --exclude parameters for du but
>> recently, I've been wondering if there's an easier way that would use
>
> If your backups are all on a single partition you might try quot(8)
> (quot -- display disk space occupied by each user).  I wrote a
> very simple perl script to munge quot output to create a diskhogs
> report about 20 years ago, and I still use it today.  I suggest you
> take the output of
>     quot -kvf $BACKUP_DEVICE
>
> and filter it to fit your needs.  If you can't find a quot for your
> OS I might have a C program that works as a replacement.
>
> --ksb at_host sac.fedex.com

Re: Estimating backup usage with dir-merge filter

2011-10-06 Thread Benjamin R. Haskell

On Thu, 6 Oct 2011, Paul Dugas wrote:

> I appreciate the suggestions so far but I know how to measure usage
> with 'du' et al. The hitch here is that I want to exclude files the
> --filter='dir-merge .rsync-filter' excludes. Hence the thought to use
> rsync itself.


It sounds like you missed the point of Kevin's message (in the other 
fork of this thread).  The point wasn't to use `du`, it was that you can 
run your stats against the backed-up files, not the source.  Then you're 
only running stats against the results of running the backup using the 
filters, so you don't need to filter them again.


If that doesn't make sense or isn't possible (backups are on some remote 
server), then just use your rsync command with '--list-only', and 
post-process that list.


E.g., if your command is:

rsync -a --filter='dir-merge .rsync-filter' /source /dest

It becomes, with a post-processing command that just counts bytes from
files (not dirs/sockets/etc.), as one command line continued after the pipe:

rsync --list-only -a --filter='dir-merge .rsync-filter' /source /dest |
    awk '/^-/ { total += $2 } END { print total }'

Post-processing is made simpler by the fact that rsync escapes special 
characters already.  (So, you don't have to worry about null bytes or 
newlines or anything in the filenames.)
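Ben's one-liner can also be wrapped as a small reusable function.  This is a sketch under one assumption: recent rsync releases may print `--list-only` sizes with digit-grouping commas (releases from the time of this thread print plain digits), so commas are stripped before summing.  `total_listed_bytes` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: total bytes of regular files in `rsync --list-only` output on stdin.
# Lines whose mode string starts with '-' are regular files; column 2 is
# the size. Commas are removed in case the rsync in use groups digits.
total_listed_bytes() {
    awk '/^-/ { size = $2; gsub(/,/, "", size); total += size }
         END  { printf "%.0f\n", total }'
}
```

Usage, with the same placeholder paths as above:

    rsync --list-only -a --filter='dir-merge .rsync-filter' /source /dest | total_listed_bytes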


--
Best,
Ben


Re: Estimating backup usage with dir-merge filter

2011-10-06 Thread Wayne Davison
On Thu, Oct 6, 2011 at 1:01 PM, Benjamin R. Haskell rs...@benizi.com wrote:

> use your rsync command with '--list-only', and post-process that list.


Even easier, just make a note of the verbose output from the copy (get
better stats via --stats with or w/o --verbose).  Or, if you need a special
run, --dry-run (-n) will tell you the file-size totals w/o transferring
anything.
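The --stats route can be post-processed in the same spirit.  A minimal sketch that pulls the byte count out of the "Total file size:" line of rsync's output on stdin; it strips digit-grouping commas in case your rsync prints them, and `stats_total_file_size` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: extract the byte count from the "Total file size:" line that
# rsync --stats prints, reading rsync's output on stdin.
stats_total_file_size() {
    awk -F': *' '/^Total file size:/ {
        v = $2
        sub(/ .*/, "", v)      # drop the trailing " bytes"
        gsub(/,/, "", v)       # drop digit-grouping commas, if any
        print v
    }'
}
```

Usage, with placeholder paths:

    rsync -a --dry-run --stats --filter='dir-merge .rsync-filter' /source /tmp/does_not_exist | stats_total_file_size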

..wayne..

Re: Estimating backup usage with dir-merge filter

2011-10-06 Thread Benjamin R. Haskell

On Thu, 6 Oct 2011, Wayne Davison wrote:


> On Thu, Oct 6, 2011 at 1:01 PM, Benjamin R. Haskell wrote:
>
>> use your rsync command with '--list-only', and post-process that list.
>
> Even easier, just make a note of the verbose output from the copy (get
> better stats via --stats with or w/o --verbose).  Or, if you need a
> special run, --dry-run (-n) will tell you the file-size totals w/o
> transferring anything.


Depends on what stats are needed.  If you just need total bytes, yeah, 
that's easier.  My example didn't do it, but it sounded like Paul wanted 
some kind of per-user statistics.  Important bits, if you need more 
granularity:


First column is an `ls -l` style mode display (first character = 'd' for 
dirs, '-' for normal files, etc.)


Second column is the size in bytes.

Third is date.

Fourth is time.

Fifth-through-rest is the path relative to the transfer root.  (Spaces
aren't escaped, but other special chars are listed as \#NNN, where the
N's are octal digits.)
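Using those columns, per-user totals can come from a single listing.  A sketch, assuming the listing was made from /home/ (trailing slash) so that the first path component is a user's directory, that only regular files count, and that user directory names need no \#NNN escapes; `per_user_bytes` is a hypothetical name.  The sizes are apparent file lengths, i.e. the uncompressed, pre-de-duplication figure.

```bash
#!/usr/bin/env bash
# Sketch: per-user byte totals from `rsync --list-only` output on stdin.
# Assumes the listing was taken from /home/ (note the trailing slash), so
# the first component of each path is a user's directory. Only regular
# files (mode string starting with '-') are counted.
per_user_bytes() {
    awk '/^-/ {
            size = $2
            gsub(/,/, "", size)            # drop digit-grouping commas, if any
            path = $0
            # Strip mode, padded size, date and time; the remainder is the
            # path, which may contain unescaped spaces.
            sub(/^[^ ]+ +[^ ]+ +[^ ]+ [^ ]+ /, "", path)
            user = path
            sub(/\/.*/, "", user)          # keep the first path component
            bytes[user] += size
        }
        END {
            for (u in bytes) printf "%s\t%.0f\n", u, bytes[u]
        }' | sort
}
```

Usage, with a placeholder destination:

    rsync --list-only -a --filter='dir-merge .rsync-filter' /home/ /dest | per_user_bytes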


--
Best,
Ben


Re: Estimating backup usage with dir-merge filter

2011-10-06 Thread Paul Dugas
On Thu, Oct 6, 2011 at 4:01 PM, Benjamin R. Haskell rs...@benizi.com wrote:
> It sounds like you missed the point of Kevin's message (in the other
> fork of this thread).  The point wasn't to use `du`, it was that you
> can run your stats against the backed-up files, not the source.  Then
> you're only running stats against the results of running the backup
> using the filters, so you don't need to filter them again.

I got that but neglected to respond to the whole group.  My mistake.
The backups are being performed using BackupPC to a central server
where compression and de-duplication are done.  While it's true that
the actual storage on the backup server being consumed by each user is
less because of these, I don't have any problem hiding this from them
and instead telling them what their usage is before compression and
de-duplication.  It has more of an effect that way, if you know what I
mean.

> If that doesn't make sense or isn't possible (backups are on some
> remote server), then just use your rsync command with '--list-only',
> and post-process that list.

I've been tinkering with using --verbose and --dry-run, then parsing
the total size out of the last line of the output, and I think I'm
close.  Curiously, when I don't include the --filter option as a
baseline, I'm not getting the same results as du.

$ du -sb . | awk '{print $1}'
508625653

$ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
506037893

The difference is minimal and probably negligible for this purpose, but
I'm still curious where it's coming from.  Maybe there are some sparse
files in there somewhere.

Paul

Re: Estimating backup usage with dir-merge filter

2011-10-06 Thread Henri Shustak
>> It sounds like you missed the point of Kevin's message (in the other
>> fork of this thread).  The point wasn't to use `du`, it was that you
>> can run your stats against the backed-up files, not the source.  Then
>> you're only running stats against the results of running the backup
>> using the filters, so you don't need to filter them again.
>
> I got that but neglected to respond to the whole group.  My mistake.
> The backups are being performed using BackupPC to a central server
> where compression and de-duplication are done.  While it's true that
> the actual storage on the backup server being consumed by each user is
> less because of these, I don't have any problem hiding this from them
> and instead telling them what their usage is before compression and
> de-duplication.  It has more of an effect that way, if you know what I
> mean.
>
>> If that doesn't make sense or isn't possible (backups are on some
>> remote server), then just use your rsync command with '--list-only',
>> and post-process that list.
>
> I've been tinkering with using --verbose and --dry-run, then parsing
> the total size out of the last line of the output, and I think I'm
> close.  Curiously, when I don't include the --filter option as a
> baseline, I'm not getting the same results as du.
>
> $ du -sb . | awk '{print $1}'
> 508625653
>
> $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
> 506037893
>
> The difference is minimal and probably negligible for this purpose, but
> I'm still curious where it's coming from.  Maybe there are some sparse
> files in there somewhere.

Do you have the same discrepancy if you use the --stats option?



 This email is protected by LBackup
 http://www.lbackup.org




Estimating backup usage with dir-merge filter

2011-10-05 Thread Paul Dugas
I use --filter='dir-merge .backup-filter' to allow my users to
designate portions of their home directories that should be excluded
from my rsync-based backup system.  I'm looking for a way to
periodically generate a report that shows the amount of backup space
being used by each user.  I've tinkered with writing my own script
that processes any filter files into --exclude parameters for du, but
recently I've been wondering if there's an easier way that would use
rsync itself, the --filter argument, and --dry-run.  Anyone ever run
into something like this?

P


Re: Estimating backup usage with dir-merge filter

2011-10-05 Thread Kevin Korb
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Why not do the du on the backup where the excluded files aren't there?
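For a backup tree laid out with one directory per user, this suggestion might look like the sketch below.  The layout and the `report_backup_usage` name are assumptions, and GNU du is assumed for `-b`.  Note that with BackupPC's compressed, pooled storage this would measure what is actually stored, not the uncompressed usage Paul later says he wants to report.

```bash
#!/usr/bin/env bash
# Sketch: per-user usage measured on the backup side, where the filters
# have already been applied. Assumes a hypothetical layout of one
# directory per user directly under the given root, and GNU du (-b).
report_backup_usage() {
    local root=$1 d
    for d in "$root"/*/; do
        [ -d "$d" ] || continue          # glob matched nothing
        du -sb -- "$d"                    # apparent size in bytes
    done | sort -rn                       # largest users first
}
```

Usage: `report_backup_usage /path/to/backups` prints one "bytes<TAB>directory" line per user, largest first.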

On 10/05/11 12:57, Paul Dugas wrote:
> I use --filter='dir-merge .backup-filter' to allow my users to
> designate portions of their home directories that should be excluded
> from my rsync-based backup system.  I'm looking for a way to
> periodically generate a report that shows the amount of backup space
> being used by each user.  I've tinkered with writing my own script
> that processes any filter files into --exclude parameters for du but
> recently, I've been wondering if there's an easier way that would use
> rsync itself, the --filter argument, and --dry-run.  Anyone ever run
> into something like this?
>
> P

- -- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
Kevin Korb  Phone:(407) 252-6853
Systems Administrator   Internet:
FutureQuest, Inc.   ke...@futurequest.net  (work)
Orlando, Florida    k...@sanitarium.net (personal)
Web page:   http://www.sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6MjwcACgkQVKC1jlbQAQdXmwCg4svGXZBq0uUFfbRdkJW7gvWe
LDcAnj1ZbtjppnU2wh84LL+ps7Q5iT78
=7t6m
-----END PGP SIGNATURE-----