RE: Incremental backups and batch mode.

2002-03-30 Thread Diego Liziero

>You're using the wrong tool -- you want a binary diff program instead.
>Run that on your files, then rsync/tar/cp/whatever the diffs.

Not exactly. I need the rsync algorithm to check the new version
of a file against the checksums calculated for that file
when the previous backup was made, and to produce the delta
needed to bring the previous backup up to the current state
(much as rsync's batch mode saves its delta file).

>I don't use it myself, but a Google search for "binary diff" lands
>XDelta.

I don't need to keep different versions of
the same file in the same filesystem.
I just need to back up a filesystem incrementally, so that
as much tape space as possible is saved.

I think the package closest to what I would like to write
is rdiff (a sample program of librsync, inside the rproxy project).

Again, what I asked at the beginning was for advice on whether
anyone thinks it would be useful to modify rsync's
batch mode to work like rdiff; otherwise I'll start writing something
new.

BTW
 rdiff is an example program that uses librsync. It lets you
 do 3 different things:

   1) from a file calculate its checksums.

   rdiff [options] signature old-file signature-file

   2) from a modified file and previous checksums calculate the delta.

   rdiff [options] delta signature-file new-file delta-file

   3) from an unmodified file and a delta file obtain the new file.

   rdiff [options] patch basis-file delta-file new-file
   
If I decide to start from this program, what I need is
to make it work for a filesystem and not just a file.
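
One way to picture "rdiff for a filesystem" is a thin wrapper that walks a directory tree and runs the rdiff commands listed above on each regular file. The following is only a sketch under stated assumptions, not a finished backup tool: the function names sign_tree and delta_tree, the RDIFF variable, and the ".sig" / ".delta" suffixes are all hypothetical, and only the "rdiff signature" and "rdiff delta" invocations come from the usage lines above. It ignores deleted files, ownership, and file names containing newlines, and the signature directory is assumed to live outside the tree being backed up.

```shell
#!/bin/sh
# Sketch only: apply rdiff file by file to a whole directory tree.
# RDIFF, sign_tree and delta_tree are hypothetical names for illustration.
RDIFF=${RDIFF:-rdiff}

# sign_tree SRC SIGDIR
# Store one signature file per regular file found under SRC.
sign_tree() {
    src=$1
    sigdir=$2
    (cd "$src" && find . -type f) | (
        while IFS= read -r f; do
            f=${f#./}
            mkdir -p "$(dirname "$sigdir/$f")" || exit 1
            "$RDIFF" signature "$src/$f" "$sigdir/$f.sig" || exit 1
        done
    )
}

# delta_tree SRC SIGDIR OUTDIR
# For each file with a stored signature, write a delta into OUTDIR.
# Files that have no signature yet (new files) are copied whole.
delta_tree() {
    src=$1
    sigdir=$2
    out=$3
    (cd "$src" && find . -type f) | (
        while IFS= read -r f; do
            f=${f#./}
            mkdir -p "$(dirname "$out/$f")" || exit 1
            if [ -f "$sigdir/$f.sig" ]; then
                "$RDIFF" delta "$sigdir/$f.sig" "$src/$f" "$out/$f.delta" || exit 1
            else
                cp -p "$src/$f" "$out/$f" || exit 1
            fi
        done
    )
}
```

With these definitions, a level 0 run would call something like "sign_tree /home /var/backup/sigs" after the full backup, and a later run would call "delta_tree /home /var/backup/sigs /var/backup/level1" and write that directory to tape; the paths here are placeholders.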

I don't know which of the two options is easier (modifying rsync, or
modifying or rewriting rdiff). Is there a developer on this list
who can suggest the right path to choose?
Thanks.

Diego Liziero.


-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



RE: Incremental backups and batch mode.

2002-03-30 Thread Diego Liziero

>Ah... now I see.  Unfortunately, this one's over my head.  Can anyone else
>help here?  Can rsync deal explicitly with parts of files?

The rsync program can deal with delta files, but only in batch mode,
and unfortunately that is not exactly what I need.

The rsync algorithm, on the other hand, is exactly what I need.

If rsync had no batch mode, I think I would have started writing
new software from scratch.

What I asked in my first posting was whether it is worth modifying
rsync so that batch mode also becomes useful for the kind of thing I need,
or whether I should instead start writing something new.

Any developer here?

Diego Liziero





Re: Incremental backups and batch mode.

2002-03-30 Thread Adrian Ho

On Thu, Mar 28, 2002 at 09:06:59PM +, Diego Liziero wrote:
> So at every backup the whole 2Gbyte file is saved.

That's exactly what rsync's supposed to do, AIUI.  I would be /very/
upset if it didn't make perfect copies.  8-)

> So I would like to use the rsync algorithm to calculate the differences
> (delta files) for the levels n>0 in the same order dump and tar work
> but saving much more tape space.

You're using the wrong tool -- you want a binary diff program instead.
Run that on your files, then rsync/tar/cp/whatever the diffs.

I don't use it myself, but a Google search for "binary diff" lands
XDelta.

- Adrian




RE: Incremental backups and batch mode.

2002-03-28 Thread Mike Rubel


Diego wrote:

> Right, wonderful, but let's consider a big database file, let's say
> a 2Gbyte file, that is slightly changed every day of about a 10%
...
> So at every backup the whole 2Gbyte file is saved.
...
> So I would like to use the rsync algorithm to calculate the differences
> (delta files) for the levels n>0 in the same order dump and tar work
> but saving much more tape space.

Ah... now I see.  Unfortunately, this one's over my head.  Can anyone else
help here?  Can rsync deal explicitly with parts of files?

Mike





RE: Incremental backups and batch mode.

2002-03-28 Thread Diego Liziero

Thanks, now I know how rsync's backup option works.

But I wasn't very clear about what I would like to do.

>> I would like to have a first snapshot (level 0) that is a complete copy,
>> and then other incremental backups that are just delta files
>> (just the differences from the level 0 snapshot).
>
>The "normal" utilities for this job would be dump and tar, especially if
>you're dumping to tape.  You can also use rsync, but it's somewhat
>indirect if you're dumping to tape!  :)

Right, wonderful, but consider a big database file, say
a 2 Gbyte file, about 10% of which changes every day.

With those tools, the next-level backup consists of checking
the modification time of each file; if a file has changed since the last
backup, it is saved again in full.

So at every backup the whole 2Gbyte file is saved.

If a backup is needed twice a day, 28 Gbytes are used in a week, even though
only about one tenth of the file changes each time.
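
The arithmetic behind that figure, and a rough idea of what delta-based levels might save, can be sketched as below. This is a back-of-the-envelope estimate only: it assumes each incremental delta is about 10% of the file and ignores the overhead of checksums and delta framing. The variable names are made up for illustration.

```shell
#!/bin/sh
# Sizes are in tenths of a Gbyte so that plain integer arithmetic works.
full=20          # the 2.0 Gbyte database file from the example
runs=14          # two backups a day for seven days
changed_pct=10   # assumed share of the file that changes per backup

# dump/tar style: the whole file is saved at every run
whole_file=$((full * runs))

# delta style: one full level 0 copy, then (runs - 1) deltas
delta=$((full * changed_pct / 100))
with_deltas=$((full + (runs - 1) * delta))

# print a value given in tenths of a Gbyte as "N.N"
fmt() { echo "$(($1 / 10)).$(($1 % 10))"; }

echo "whole file every run: $(fmt "$whole_file") Gbytes"
echo "level 0 plus deltas:  $(fmt "$with_deltas") Gbytes"
```

Under those assumptions the script reports 28.0 Gbytes for the whole-file approach and 4.6 Gbytes for one full copy plus thirteen deltas.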

So I would like to use the rsync algorithm to calculate the differences
(delta files) for the levels n>0, in the same order that dump and tar work,
but saving much more tape space.

I hope this is a bit clearer now...

Diego Liziero





RE: Incremental backups and batch mode.

2002-03-28 Thread Mike Rubel


> Something similar:
> I would like to have a first snapshot (level 0) that is a complete copy,
> and then other incremental backups that are just delta files
> (just the differences from the level 0 snapshot).

The "normal" utilities for this job would be dump and tar, especially if
you're dumping to tape.  You can also use rsync, but it's somewhat
indirect if you're dumping to tape!  :)

> That should be done saving the checksums of the level 0 backup locally
> and then checking the current files against those checksums to calculate the
> delta files to be saved as level 1 backup, and so on.

Okay, one thing that takes a little getting used to here is that if you
use rsync, the backup order is reversed.  Let's see if I can explain it
here.

Using dump or tar, the 0 backup is large; it contains the whole filesystem
at the time it was made.  Then the 1 backup is smaller; it contains only
the changes made between t_0 and t_1.  The 2 backup would also be small,
consisting only of changes between t_1 and t_2.  And so on.

Using rsync, the process is reversed.  The *most recent* backup is the big
one, and *earlier* backups contain only the files that changed.  So the 0
backup is the most recent one, the 1 backup contains only those files that
changed between t_1 and the most recent backup; the 2 backup contains only
those files that changed between t_2 and t_1, and so on back in time.
It's counterintuitive, but it's vastly more efficient for remote backups
since you only need to do a full dump once, then never again.

Now, how would you implement it?

For simplicity's sake, I'm going to say that you're backing up /home into
the directory /home-backup.  Extending that to backup on a remote machine
is a separate (albeit easy) issue, so I won't cover that here.

Under /home-backup, you make folders like so (you'll probably find your
own names for these folders):

/home-backup/current/
/home-backup/current-1day/
/home-backup/current-2day/

The idea is that current/ would contain the current image (most of the
files), current-1day/ would contain only the files that changed since
yesterday, and current-2day/ would have anything that changed between two
days ago and yesterday.

You can have as many of these as you want, and they don't have to be
evenly spaced; this is just for example.

Now, to make it work, run something like this once a day:

# delete the oldest incremental backup
rm -rf /home-backup/current-2day

# shift the intermediate incremental backups back by one
mv /home-backup/current-1day /home-backup/current-2day

# rsync into /home-backup/current, copying any changed files into the
# folder current-1day first
rsync -vab --delete --backup-dir=/home-backup/current-1day /home/ \
    /home-backup/current

You can also use exclude lists and all that other stuff.
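
Since the number of incremental folders is arbitrary, the delete-and-shift steps above can be wrapped in a small helper. This is a sketch rather than a tested backup script: the function name rotate_increments is hypothetical, and it assumes the current-Nday naming used in this example. It only shifts directories; the rsync call itself stays as shown above.

```shell
#!/bin/sh
# rotate_increments ROOT N   (hypothetical helper)
# Deletes ROOT/current-Nday, then renames current-(N-1)day to current-Nday,
# current-(N-2)day to current-(N-1)day, and so on down to current-1day.
# Missing directories are skipped, so it also works on the first few runs.
rotate_increments() {
    root=$1
    n=$2
    rm -rf "$root/current-${n}day"
    k=$((n - 1))
    while [ "$k" -ge 1 ]; do
        if [ -d "$root/current-${k}day" ]; then
            mv "$root/current-${k}day" "$root/current-$((k + 1))day"
        fi
        k=$((k - 1))
    done
}

# Typical use, followed by the same rsync call as above:
#   rotate_increments /home-backup 7
#   rsync -ab --delete --backup-dir=/home-backup/current-1day /home/ \
#       /home-backup/current
```

After the shift, current-1day no longer exists, so rsync is left to create it through --backup-dir when it has changed files to put there.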

Is this clear?
Mike





Incremental backups and batch mode.

2002-03-28 Thread Diego Liziero

I'm trying to use the rsync algorithm for incremental backups.

After a quick look at rsync I saw the batch mode operations,
and I thought that maybe I can modify them for incremental backups.

What is needed is to add an option to save the checksums of all
the files of the level 0 backup and a second option to use the level n
checksum to calculate the delta batch files for the level n+1 backup.

After a deeper look into the batch mode I saw that it is
too specific to the kind of application for which it
was written.

The contents of the *.rsync_csums files seem to be always
1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 5 ...
instead of the real checksums.

I think that the batch mode would be more useful (at least for
incremental backups) if it worked like rdiff:
one option to save the checksums, one to calculate and save the deltas
using the previously saved checksums and the current files,
and a third option to read the delta files and update
the old files.

As regards the incremental backup program I would like to write,
I don't know if it is worth modifying rsync or instead starting
to write something new based on librsync.

Any suggestions?

Thanks for any help.

Diego Liziero.

(Please cc your answer to me, as I'm not subscribed to this mailing list.)

