ERSEK Laszlo wrote:
> ** 4. Thanassis Tsiodras' offline reader, available under
>
> http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html
>
> uses, according to section "Seeking in the dump file", bzip2recover to
> split the bzip2 blocks out of the single bzip2 stream. The page states
>
> This process is fast (since it involves almost no CPU calculations
>
> While this may be true relative to other dump-processing operations,
> bzip2recover is, in fact, not much more than a huge single-threaded
> bit-shifter, which even makes two passes over the dump. (IIRC, the first
> pass shifts over the whole dump to find bzip2 block delimiters, then the
> second pass shifts the blocks found previously into byte-aligned, separate
> bzip2 streams.)

Hmm? Admittedly, I don't know the bzip2 format very well, but as far as I
understand it, there should be no bit-shifting involved: each block in the
stream is a completely independent, self-contained sequence of bytes.

-- 
Ilmari Karonen
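
For anyone who wants to check the alignment question empirically, here is a minimal sketch in Python. It relies on two facts about the bzip2 format: every compressed block begins with the 48-bit marker 0x314159265359, and blocks are packed back to back at the bit level, with padding to a byte boundary only at the very end of the stream. The function name find_block_bit_offsets and the random test data are hypothetical, chosen for illustration; this is not how bzip2recover itself is written, only a slow bit-by-bit scan in the spirit of the first pass described in the quoted message.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359   # 48-bit marker that starts every compressed block
MASK48 = (1 << 48) - 1


def find_block_bit_offsets(stream):
    """Return the bit offset at which each block marker begins.

    The stream is scanned one bit at a time, because nothing in the
    format forces a block to start on a byte boundary.
    """
    offsets = []
    window = 0       # the most recent 48 bits seen
    bits_seen = 0
    for byte in stream:
        for shift in range(7, -1, -1):   # bzip2 packs bits most significant first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK48
            bits_seen += 1
            if bits_seen >= 48 and window == BLOCK_MAGIC:
                offsets.append(bits_seen - 48)
    return offsets


if __name__ == "__main__":
    # Hypothetical test input: 250 kB of pseudo-random bytes, compressed
    # with 100 kB blocks (compresslevel=1) so the stream has several blocks.
    rng = random.Random(0)
    data = bytes(rng.getrandbits(8) for _ in range(250_000))
    stream = bz2.compress(data, compresslevel=1)

    for offset in find_block_bit_offsets(stream):
        alignment = "byte-aligned" if offset % 8 == 0 else "not byte-aligned"
        print(offset, alignment)
```

The first block always starts at bit 32, right after the 4-byte stream header. Whether the later blocks land on byte boundaries depends on the data, so running the script on a real dump fragment is the simplest way to see it. A 48-bit pattern can in principle also occur by chance inside compressed data, so a scan like this can report false positives on very large inputs.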
