ERSEK Laszlo wrote:
> ** 4. Thanassis Tsiodras' offline reader, available under
> 
> http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html
> 
> uses, according to section "Seeking in the dump file", bzip2recover to 
> split the bzip2 blocks out of the single bzip2 stream. The page states
> 
>       This process is fast (since it involves almost no CPU calculations
> 
> While this may be true relative to other dump-processing operations, 
> bzip2recover is, in fact, not much more than a huge single-threaded 
> bit-shifter, which even makes two passes over the dump. (IIRC, the first 
> pass shifts over the whole dump to find bzip2 block delimiters, then the 
> second pass shifts the blocks found previously into byte-aligned, separate 
> bzip2 streams.)

Hmm?  Admittedly, I don't know the bzip2 format very well, but as far as 
I understand it, there should be no bit-shifting involved: each block in 
the stream is a completely independent, self-contained sequence of bytes.
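
One way to check this would be to look for the block marker at every bit 
offset and see where the matches land. Below is a minimal sketch in Python, 
assuming only that each bzip2 block starts with the 48-bit marker 
0x314159265359; the function name find_block_bit_offsets is made up for 
illustration and is not part of bzip2 or bzip2recover.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit marker at the start of each bzip2 block
MAGIC_BITS = 48
MASK = (1 << MAGIC_BITS) - 1


def find_block_bit_offsets(data: bytes) -> list:
    """Return the bit offset of every occurrence of the block marker.

    Hypothetical helper: slides a 48-bit window over the input one bit
    at a time, so it also finds markers that are not byte-aligned.
    """
    offsets = []
    window = 0
    bits_seen = 0
    for byte in data:
        # bzip2 packs bits most-significant bit first within each byte.
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            bits_seen += 1
            if bits_seen >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(bits_seen - MAGIC_BITS)
    return offsets


if __name__ == "__main__":
    # Compression level 1 uses the smallest block size, so this input
    # is split into several blocks.
    sample = bz2.compress(bytes(range(256)) * 2000, 1)
    for off in find_block_bit_offsets(sample):
        print(off, "byte-aligned" if off % 8 == 0 else "not byte-aligned")
```

The first block follows the 4-byte stream header, so it shows up at bit 
offset 32; the offsets reported for the later blocks are what would settle 
the question either way. A random 48-bit match inside compressed data is 
possible in principle, so treat the output as a strong hint rather than proof.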

-- 
Ilmari Karonen

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
