> Is there any current file system or software for the OSs in question
> that maintains a list of what blocks in a sparse file were modified and
> when?  If not, there's no real way to do what you want, as some program
> is going to have to walk the entire file to find any changes that have
> occurred since the last backup.

There are two answers to this question.

#1  Yes.  I forget what the underlying function call or API is called, but
there is *some* method available to monitor filesystem activity and notice
which blocks change in a given file.  I presume this is what CrashPlan is
using, because they claim to be able to notice in real time when blocks
change, and then back up using a byte differential.  Again, CrashPlan seems
to do a good job of creating incremental backups of sparse files, but it
has no option to restore them sparsely.  I am conversing with their support
team, hoping they'll somehow rectify this, but who knows.

#2  Even with something less intelligent, an acceptable incremental
improvement could be made over the backup solutions I'm currently aware of.
Today, the only option I know of is to take a full image every time.  For
example, tar and rsync can both efficiently create full images of sparse
files and then restore them sparsely, but neither has a way to do
incrementals on subsequent runs.

Suppose there's a tool which works like this (a minimal sketch follows the
list):

- On the first run, the whole file is sent.  Meanwhile, a checksum is
calculated for lots of little chunks, and stored somewhere.

- On a subsequent run, the whole file must be read locally and all the
chunks checksummed again, but the unchanged chunks don't need to be sent.
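
Here is a minimal sketch of that scheme in Python.  The 4 MiB chunk size,
the state-file layout, and the idea of writing changed chunks into a local
directory are all my own assumptions; a real tool would transmit the
changed chunks to the remote side instead of writing them next to the
source.

import hashlib
import json
import os

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size; a real tool would tune this

def backup(src, dest_dir, state_file):
    """Write only the chunks of src whose checksum changed since the last run."""
    # Load the checksums recorded by the previous run; an empty dict
    # means this is the first run and every chunk gets sent.
    try:
        with open(state_file) as f:
            old_sums = json.load(f)  # maps chunk index -> hex digest
    except FileNotFoundError:
        old_sums = {}

    new_sums = {}
    os.makedirs(dest_dir, exist_ok=True)
    with open(src, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.md5(chunk).hexdigest()
            new_sums[str(index)] = digest
            if old_sums.get(str(index)) != digest:
                # Changed or new chunk: this is the only data that gets "sent".
                with open(os.path.join(dest_dir, "chunk.%d" % index), "wb") as out:
                    out.write(chunk)
            index += 1

    # Record this run's checksums for the next incremental.
    with open(state_file, "w") as f:
        json.dump(new_sums, f)

On the first run every chunk is written; on subsequent runs the whole file
is still read and checksummed, but only the changed chunks are written,
which is exactly the behavior described in the list above.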


Reading and checksumming the file locally is much faster than sending the
whole file to the destination every time.  This leaves obvious room for
improvement, but it's still a huge improvement over anything I'm currently
able to find.


I benchmarked this, because I was curious.  On my Mac, I have a 40 GB
virtual machine image, of which 18 GB is used.  It took about 30 minutes to
back up the whole image across the LAN.  It took about 6 minutes to md5sum
it.  If I could create an incremental in 6-7 minutes, I would do it
regularly, once every couple of days.  But when it takes half an hour ...
I'll only do it once every 2-4 weeks, at most.

Actually, this makes perfect sense.  The SATA disk reads at about
500 Mbit/s, 5x faster than the 100 Mbit LAN, so the local read ends up
about 5x faster, and my file reads in 6 minutes instead of 30.
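
A quick back-of-the-envelope check in Python, using the numbers above (the
500 Mbit/s figure is my rough estimate of sequential disk throughput):

# Rough check of the 5x claim, using the numbers from the post.
used_bytes = 18 * 1024**3        # 18 GB of the image actually in use
lan_bps    = 100e6 / 8           # 100 Mbit/s LAN, in bytes per second
disk_bps   = 500e6 / 8           # ~500 Mbit/s sequential disk read

print("send over LAN: %.0f min" % (used_bytes / lan_bps / 60))   # ~26 min
print("local read:    %.0f min" % (used_bytes / disk_bps / 60))  # ~5 min

Those come out close enough to the measured 30 and 6 minutes.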
