On Sat, Nov 8, 2008 at 12:47 AM, Henning Garus <[email protected]> wrote:
> Hi,
>
> I have been looking through the current delta implementation in
> libalpm and have put some thought into changing makepkg/repo-add to
> support delta creation. However, I'm running into some problems,
> mostly due to md5sums and gzip.
>
> The current implementation works as follows. On a sync operation it is
> checked whether a valid delta path exists and if the summed filesize
> of the deltas is smaller than the filesize of the whole download. When
> this is the case the deltas are downloaded and applied to the old
> file. After that the patched file is treated as if it was downloaded
> normally, this includes a check of the md5sum. Gzip files have a
> header that has a timestamp, which will screw with this md5sum. When
> a patch is applied to a gzipped file by xdelta, xdelta will unzip the
> file, apply the patch and then rezip the file. The author of xdelta
> was obviously aware of the problems with the timestamp, because he
> decided to leave it empty. The same can be achieved by the -n option
> of gzip. But there comes the next problem, xdelta uses zlib for
> compression, gzip implements compression itself. And files created by
> gzip can differ from files created by zlib. Bsdtar uses zlib as well,
> but writes the timestamp and there is no option to prevent this (at
> least none that I can see).
>
> There are four ways around this that I can think of:
>
> 1. create the package, then create the delta, apply the delta to the
> old version, remove the original new package and present the patched
> package as output
>
> I think this sucks, this ties delta creation to makepkg (more about
> that later) and has an incredibly huge and useless overhead (countless
> unzips and rezips and applying the patch).
>
> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.
>
> Seems better than 1, but makes makepkg and libalpm rely on gzip. Not
> sure if this is a good thing, especially for libalpm.
>
> 3. save the md5sums of the unzipped tars in the sync db and change
> libalpm to check those
>
> Seems reasonable, but I don't see a way to do this with libarchive, so
> this would require using zlib directly and pacman would lose the
> ability to handle tar.bz2
>
> 4. Skip checking the md5sum for deltas
>
> OK during the initial sync, as long as we trust xdelta to do its job
> (the md5sums of both the old and the new file are in the delta file).
> But the created package will have the wrong md5sum and can't be used
> to reinstall, etc. which makes this look like a bad idea.
>
>
> In a previous mail Xavier toyed with the idea to put delta creation
> into repo-add, I have given this some thought, as it seems nice in
> principle, but there are drawbacks. For Arch this would mean creating
> deltas on Gerolde, which seems to be fairly strained already,
> according to the dev list. Furthermore this introduces some new
> variables to repo-add (at least repo location and an output location)
> this would be manageable, but doesn't look very nice.
>
> Delta creation in makepkg seems somehow ok (it's already in there after
> all). But what I would really like is a separate tool for delta
> creation, which would allow the separation of building packages and
> creating deltas and setting up a separated delta server. This leaves
> us with options 2 and 3 and I am not really sure which way to go.
>
>
> looking forward to your comments
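
The timestamp problem described above is easy to reproduce. Here is a minimal sketch using Python's gzip module, which is zlib-backed; it is neither bsdtar nor xdelta, so it only illustrates the header effect and says nothing about their exact output. The helper names and the sample payload are made up for the illustration. Setting mtime=0 leaves the header timestamp empty, which is the timestamp part of what gzip -n does, and hashing the decompressed data corresponds to option 3.

```python
import gzip
import hashlib
import io


def gzip_bytes(payload, mtime):
    """Compress payload into a gzip stream with the given header timestamp."""
    buf = io.BytesIO()
    # filename="" keeps the optional file name field out of the header.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def md5_hex(data):
    """Return the hex md5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-in for an uncompressed package tarball.
payload = b"stand-in for an uncompressed package tarball\n" * 100

# Same payload, header timestamps one second apart.
stamped_a = gzip_bytes(payload, mtime=1226101620)
stamped_b = gzip_bytes(payload, mtime=1226101621)

# Same payload, empty header timestamp.
blank_a = gzip_bytes(payload, mtime=0)
blank_b = gzip_bytes(payload, mtime=0)

# The compressed files differ only in the header, yet the md5sums differ.
print(md5_hex(stamped_a) == md5_hex(stamped_b))  # False

# With the timestamp left empty, the md5sum is reproducible.
print(md5_hex(blank_a) == md5_hex(blank_b))  # True

# The md5sum of the decompressed data (option 3) ignores the header entirely.
print(md5_hex(gzip.decompress(stamped_a)) == md5_hex(gzip.decompress(stamped_b)))  # True
```

This only covers the timestamp. It does not address the second problem from the mail, that gzip and zlib can produce different compressed bytes for the same input.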
A very small bump on this :)

1) gzip -n usage

But first, in the last discussion we had, which started with the above
mail, it seems we were more in favor of option 2):

> 2. create the package, but don't compress it with bsdtar, use gzip -n
> instead. This means we have to use gzip again, in libalpm, when we
> apply the delta.

In fact, Nathan already made a patch for that. I think this patch looks fine:
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6427986

2) repo-add vs makepkg support

Nathan even made one to add support to repo-add too, but this patch
looked a bit more scary:
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6427987
It was more complex than I hoped.

But the simpler way I was thinking about was to get delta support only
in repo-add, instead of both makepkg and repo-add:
http://archive.netbsd.se/?ml=pacman-dev&a=2008-02&m=6601225

Dan seemed to think it was better in repo-add, and Henning seems to
think it is better in makepkg.
We need more discussion on this and then finally make a decision :)

2.1) About Nathan's patch to support both

If we do want to have the functionality in both makepkg and repo-add,
it would be cool to try to clean up the code a bit, for example this:

+# create_xdelta_file - will create a delta for the package filename given.
+#
+# params:
+# $1 - the filename of the package
+# $2 - the arch of the package
+# $3 - the version and release of the package
+# $4 - the directory where the package is located
+# $5 - the extension of packages
+# $6 - 0 if an existing delta file should not be overwritten
+# $7 - the filename of the previous package (blank if not known)
+# $8 - the version of the previous package (blank if not known)

That's a lot of params :)

3) format of delta in the database

However I don't think there is any repo-add / makepkg patch to support
the new format.

Henning also made a comment about the format:
http://bugs.archlinux.org/task/12000#comment34162
"So basically the current delta implementation is working. Only the
support in makepkg/repo-add is wrong. I am not exactly sure though, why
libalpm expects the md5sums of the old and the new package. I am not
sure if these are even used anywhere. I would feel safe enough with
xdelta checking those and then libalpm checking the md5sum of the final
patched package."

I guess Dan added these two md5sums for safety but yes, they might not
be needed. I would also be fine with dropping them, even if they don't
hurt.
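
To make the last point concrete, here is a rough sketch of what "libalpm checking the md5sum of the final patched package" amounts to once the per-delta md5sums are dropped. It is a Python illustration only, not libalpm code; file_md5, patched_package_ok, the file names and the sample bytes are all hypothetical.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=65536):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def patched_package_ok(path, expected_md5):
    """Accept the patched package only if it matches the expected md5sum."""
    return file_md5(path) == expected_md5.lower()


# Hypothetical stand-in for a package rebuilt from an old package plus deltas.
package_bytes = b"stand-in for a patched package\n"
# Stand-in for the md5sum that the sync database would carry.
expected = hashlib.md5(package_bytes).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pkg.tar.gz")
    bad = os.path.join(tmp, "bad.pkg.tar.gz")
    with open(good, "wb") as fh:
        fh.write(package_bytes)
    with open(bad, "wb") as fh:
        fh.write(package_bytes + b"corruption")
    good_ok = patched_package_ok(good, expected)
    bad_ok = patched_package_ok(bad, expected)

print(good_ok, bad_ok)  # True False
```

With this as the only check on the libalpm side, a bad intermediate step in a delta chain is still caught, just later than it would be with per-delta md5sums.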
