On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have been looking through the current delta implementation in
> libalpm and have put some thought into changing makepkg/repo-add to
> support delta creation. However, I'm running into some problems,
> mostly due to md5sums and gzip.
>
> The current implementation works as follows. On a sync operation it is
> checked, whether a valid delta path exists and if the summed filesize
> of the deltas is smaller than the filesize of the whole download. When
> this is the case the deltas are downloaded and applied to the old
> file. After that the patched file is treated as if it was downloaded
> normally, this includes a check of the md5sum. Gzip files have a
> header, that has a timestamp, which will screw with this md5sum. When
> a patch is applied to a gzipped file by xdelta, xdelta will unzip the
> file, apply the patch and then rezip the file. The author of xdelta
> was obviously aware of the problems with the timestamp, because he
> decided to leave it empty. The same can be achieved by the -n option
> of gzip. But there comes the next problem, xdelta uses zlib for
> compression, gzip implements compression itself. And files created by
> gzip can differ from files created by zlib. Bsdtar uses zlib as well,
> but writes the timestamp and there is no option to prevent this (at
> least none that I can see).
>
> There are four ways around this, that I can think of:
>
> 1. create the package, then create the delta, apply the delta to the
> old version, remove the original new package and present the patched
> package as output
>
> I think this sucks, this ties delta creation to makepkg (more about
> that later) and has an incredibly huge and useless overhead (countless
> unzips and rezips and applying the patch).
>
> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.
>
> Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
> sure if this is a good thing, especially for libalpm.
>
> 3. save the md5sums of the unzipped tars in the sync db and change
> libalpm to check those
>
> Seems reasonable, but I don't see a way to do this with libarchive, so
> this would require using zlib directly and pacman would lose the
> ability to handle tar.bz2
>
> 4. Skip checking the md5sum for deltas
>
> OK during the initial sync, as long as we trust xdelta to do its job
> (the md5sums of both the old and the new file are in the delta file).
> But the created package will have the wrong md5sum and can't be used
> to reinstall, etc. which makes this look like a bad idea.
>
>
> In a previous mail Xavier toyed with the idea to put delta creation
> into repo-add, I have given this some thought, as it seems nice in
> principle, but there are drawbacks. For Arch this would mean creating
> deltas on Gerolde, which seems to be fairly strained already,
> according to the dev list. Furthermore this introduces some new
> variables to repo-add (at least repo location and an output location)
> this would be manageable, but doesn't look very nice.
>
> Delta creation in makepkg seems somehow ok (it's already in there after
> all). But what I would really like is a separate tool for delta
> creation, which would allow the separation of building packages and
> creating deltas and setting up a separated delta server. This leaves
> us with options 2 and 3 and I am not really sure, which way to go.
>
>
> looking forward to your comments

I am very glad you looked into this. You seem to have a very good
understanding of the situation, possibly better than mine, so it would
be great if you could fix and maintain this part.

I would just go with option 2. When deltas are used, libalpm already
relies on xdelta, so why not on gzip as well?
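
To make the timestamp problem from the quoted mail concrete, here is a minimal sketch in Python (not pacman or libalpm code). It uses the standard library's gzip module, which is built on zlib, so it only illustrates the header timestamp and says nothing about the separate gzip-versus-zlib output differences Henning mentions. The helper names, the payload, and the timestamps are made up for illustration.

```python
import gzip
import hashlib
import io


def gzip_bytes(payload: bytes, mtime: int) -> bytes:
    """Compress payload into a gzip stream with an explicit header timestamp."""
    buf = io.BytesIO()
    # An empty filename keeps the name field out of the header, and mtime=0
    # leaves the timestamp field empty, similar in spirit to `gzip -n`.
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def md5_hex(data: bytes) -> str:
    """Return the hex md5sum of a byte string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for an uncompressed package tarball.
payload = b"stand-in for an uncompressed package tarball\n" * 100

# Same payload, header timestamps one second apart (arbitrary values).
stamped_a = gzip_bytes(payload, mtime=1226101620)
stamped_b = gzip_bytes(payload, mtime=1226101621)

# Same payload, timestamp field left empty both times.
zeroed_a = gzip_bytes(payload, mtime=0)
zeroed_b = gzip_bytes(payload, mtime=0)

print("timestamped md5sums equal:", md5_hex(stamped_a) == md5_hex(stamped_b))
print("zeroed md5sums equal:     ", md5_hex(zeroed_a) == md5_hex(zeroed_b))
print("payload md5sums equal:    ",
      md5_hex(gzip.decompress(stamped_a)) == md5_hex(gzip.decompress(stamped_b)))
```

With the same compressor on both sides, an empty timestamp makes the compressed md5sum reproducible, which is the idea behind option 2. The last comparison corresponds to option 3: the md5sum of the uncompressed data does not depend on the gzip header at all.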
