> Is there any current file system or software for the OSs in question
> that maintains a list of what blocks in a sparse file were modified and
> when? If not, there's no real way to do what you want, as some program
> is going to have to walk the entire file to find any changes that have
> occurred since the last backup.

There are two answers to this question.

#1 Yes. I forget what the underlying function call or API is named, but there is *some* method available to monitor filesystem activity and notice which blocks change in a file or files. I presume this is what CrashPlan is using, because they claim to be able to notice in real time when blocks are changing, and then back up using a byte differential.

Again, CrashPlan seems to do a good job of creating incremental backups of sparse files, but they have no option to restore them sparsely. I am conversing with their support team, hoping they'll somehow rectify this, but who knows.
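For what it's worth, on Linux the inotify API is one mechanism in this family (the Mac analogue is FSEvents), though it only reports *that* a file was written, not which blocks changed. A minimal sketch, using nothing but the standard library via ctypes; the image path is made up:

    #!/usr/bin/env python
    # Sketch only: watch a disk image for writes via Linux inotify.
    # IN_MODIFY fires on every write(), so a busy VM will flood this,
    # and it says nothing about which blocks were touched.
    import ctypes, os, struct

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    IN_MODIFY = 0x00000002                   # from <sys/inotify.h>

    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    if libc.inotify_add_watch(fd, b"/vms/disk.img", IN_MODIFY) < 0:
        raise OSError(ctypes.get_errno(), "inotify_add_watch failed")

    EVENT = struct.Struct("iIII")            # wd, mask, cookie, name length
    while True:
        buf = os.read(fd, 4096)              # blocks until an event arrives
        pos = 0
        while pos < len(buf):
            wd, mask, cookie, namelen = EVENT.unpack_from(buf, pos)
            pos += EVENT.size + namelen
            if mask & IN_MODIFY:
                print("disk.img was written to")

True block-level change tracking would need support below the filesystem, which is presumably what CrashPlan's agent hooks into.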
#2 Even with something less intelligent, a worthwhile incremental improvement could be made over the backup solutions I'm currently aware of. Today, the only backup option I know of is a full image, every time. For example, tar and rsync can both efficiently create full images of sparse files and then restore them sparsely, but they have no way to do incrementals on subsequent runs.

Suppose there were a tool that worked like this (a rough sketch is at the end of this message):

1. On the first run, the whole file is sent. Meanwhile, a checksum is calculated for lots of little chunks and stored somewhere.

2. On a subsequent run, the whole file must be read locally and all the chunks checksummed again, but the unchanged chunks don't need to be sent.

Reading and checksumming the file is much faster than sending the whole file to the destination every time. Although this leaves obvious room for improvement, it is a huge improvement over what I'm currently able to find.

I benchmarked this, because I was curious. On my Mac, I have a 40G virtual machine image, of which 18G is used. It took about 30 minutes to back up the whole image across the LAN, and about 6 minutes to md5sum it. If I were able to create an incremental in 6-7 minutes, I would do it regularly, once every couple of days. But when it takes half an hour, I'll only do it once every 2-4 weeks, at most.

Actually, this makes perfect sense. The SATA disk reads at about 500 Mbit/s, 5x the speed of the 100 Mbit LAN, so the local read finishes 5x faster, and my file reads in 6 minutes instead of 30.
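The arithmetic checks out roughly, treating 18G as 18,000 megabytes:

    # Quick sanity check of the numbers above (rates in Mbit/s).
    used_mbit = 18 * 8 * 1000               # 18 GB expressed in megabits

    def minutes(rate_mbit):
        return used_mbit / rate_mbit / 60.0

    print("full send over 100 Mbit LAN: %.0f min" % minutes(100))   # ~24
    print("local read and checksum:     %.0f min" % minutes(500))   # ~5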
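Here is the rough sketch I mentioned above, assuming a fixed 1 MiB chunk size and an md5-per-chunk manifest stored as JSON (all names and paths are made up, and "sending" is modeled as writing changed chunks into a destination file). It keeps the destination sparse on the first pass by skipping all-zero chunks; a real tool would also punch holes (e.g. FALLOC_FL_PUNCH_HOLE on Linux) when a chunk later returns to zeros:

    #!/usr/bin/env python
    # Sketch of the two-step tool described above -- not an existing
    # program.  Chunk size, manifest format and all paths are made up.
    import hashlib, json, os

    CHUNK = 1 << 20                          # 1 MiB per chunk
    ZERO = hashlib.md5(b"\0" * CHUNK).hexdigest()

    def backup(src, dst, manifest):
        old = {}
        if os.path.exists(manifest):
            with open(manifest) as f:
                old = json.load(f)           # chunk index -> md5 of last run
        first_run = not os.path.exists(dst)
        new, sent, size, idx = {}, 0, 0, 0
        with open(src, "rb") as s, open(dst, "wb" if first_run else "r+b") as d:
            while True:
                data = s.read(CHUNK)
                if not data:
                    break
                size += len(data)
                digest = hashlib.md5(data).hexdigest()
                new[str(idx)] = digest
                if old.get(str(idx)) != digest:     # changed, or never sent
                    if first_run and digest == ZERO:
                        pass                 # leave a hole: stays sparse
                    else:
                        d.seek(idx * CHUNK)  # "send" only this chunk
                        d.write(data)
                        sent += 1
                idx += 1
            d.truncate(size)                 # extends sparsely if needed
        with open(manifest, "w") as f:
            json.dump(new, f)
        print("sent %d of %d chunks" % (sent, idx))

    backup("/vms/disk.img", "/backups/disk.img", "/backups/disk.img.sums")

The point is just the read-checksum-skip loop: every run still reads the whole file, but only changed chunks cross the wire, which is exactly the 30-minute-versus-6-minute difference above.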
