Remy Maucherat wrote: >>I have the first prototype ready for a java based implementation of Josh >>MacDonalds xdelta binary diff/delta algorithm. See >>http://sourceforge.net/projects/xdelta for more information on this >>algorithm. >> >>It is available for download here; >> > http://www.vertech.no/java-xdelta.tar.gz > >>This implementation uses the GDIFF file format for storing binary >>deltas. There is also a patcher that takes a source file and a patch >>file in this format and produces the original file. >> >>When this implementation is mature it might be usefull if someone wants >>to implement a versioned store implementation for slide with reduced >>physical storage requirements. >> >>I am seeking feedback on this. Warning; this is pre-alpha quality code. >> > > Obviously, that's a very useful library. I'll try to play with it :)
Some notes; as you'll see, the source 'file' needs to be seekable, so I use a RandomAccessFile in this implementation. This is a property of it being a copy-insert algorithm, and not a delete-insert like vanilla diff. It is possible to create an implementation that doesn't the source doesn't need to be seekable. But it *has* to be readable twice. There are implication eg. if one wants to store the last revision as the full one, and the previous as a delta; since they last revision might be available as an input stream from a http put. One way of doing it would be to write to the store first, and at the same time generate the hash fingerprints, then do the delta run on the source from the store. Reading the source from the store would probably imply that it is seekable to? When patching, the source needs to be seekable as well, due to the same reason as above. It is not very space optimized yet, as it uses a hashtable. I will use an array as the hashtable in the next release, as the algorithm really was designed for this. The GDIFFWriter could really be a special outputstream. The same goes for the patcher; it could be an inputstream that took the source input and the delta input as parameters. I have tried the algorithm on different input data, but it obviously needs a lot of testing. It would also be nice if someone could proof read my io code; I'm not shure I got all the signed / unsigned stuff right. -- -Torgeir -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
