https://bugzilla.wikimedia.org/show_bug.cgi?id=26499
--- Comment #12 from Ariel T. Glenn <[email protected]> 2011-08-29 18:07:24 UTC ---

(In response to comment 11) No, they aren't, but I have a C library that could be used to build such an index for bzip2 files without a ton of work; specifically, it includes a utility to find the offset of the block containing a specific pageID. Since 7z and gzip aren't block-oriented, it isn't possible to generate such an index for those files.

However, this feature is not as useful as you might think. For dump files that contain all revisions, it can take quite a while to locate a given pageID. That's because a few pages are ginormous (up to 163 GB), and if the guesser happens to land in the middle of one of them, reading through it can take up to an hour. If one prebuilt an index mapping revision IDs to page IDs and kept it in memory, things could be sped up a fair amount; alternatively, one could work only with the current revisions.

(In response to comment 9) Moving to xz would mean rewriting my bz2 library and utilities and all the bits that rely on them, so that's not likely to happen until Dumps 2.0.

(In response to comment 8) The easiest way to provide metadata of this nature is, like the md5 sums, to provide it in a separate file.
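The block orientation of bzip2 that the comment relies on can be illustrated with a small sketch. This is not the C library Ariel mentions; it is a minimal, assumed implementation in Python that locates candidate bzip2 block boundaries by scanning the compressed bit stream for the 48-bit block magic 0x314159265359 (blocks are bit-aligned, not byte-aligned). A real indexer would seek to such an offset and decompress from there to find the pageIDs the block contains.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # 48-bit bzip2 block header (bit-aligned)

def find_block_bit_offsets(data: bytes) -> list[int]:
    """Return bit offsets of candidate bzip2 block starts in `data`.

    Naive bit-by-bit scan with a 48-bit rolling window. Note: the magic
    can in principle occur by chance inside compressed payload, so a
    robust indexer must verify each candidate by decompressing from it.
    """
    offsets = []
    window = 0
    mask = (1 << 48) - 1
    nbits = 0
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            nbits += 1
            if nbits >= 48 and window == BLOCK_MAGIC:
                offsets.append(nbits - 48)
    return offsets

# In a single-stream file, the first block magic follows the 4-byte
# "BZh<level>" file header, i.e. it sits at bit offset 32.
compressed = bz2.compress(b"some dump text")
print(find_block_bit_offsets(compressed)[0])  # → 32
```

Since gzip and 7z (LZMA) compress as one continuous stream with no such per-block resync points, the same trick has no analogue there, which is the limitation the comment points out.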
