Thank you as always for this work. It is enormously helpful, for casual analysis as well as deep research.

SJ

On Feb 6, 2015 12:37 AM, "Federico Leva (Nemo)" <nemow...@gmail.com> wrote:
> I just published https://archive.org/details/wikia_dump_20141219 :
>
> ----
>
> Snapshot of all the known Wikia dumps. Where a Wikia public dump was
> missing, we produced one ourselves. 9 broken wikis, as well as lyricswikia
> and some wikis for which dumpgenerator.py failed, are still missing; some
> Wikia XML files are incorrectly terminated and probably incomplete.
>
> In detail, this item contains dumps for 268 902 wikis in total, of which
> 21 636 are full dumps produced by Wikia, 247 266 are full XML dumps produced
> by us, and 5610 are image dumps produced by Wikia. Up to 60 752 wikis are
> missing. Nonetheless, this is the most complete Wikia dump ever produced.
>
> ----
>
> We appreciate help to:
> * verify the quality of the data (for Wikia dumps I only checked valid
>   gzipping; for WikiTeam dumps, only XML well-formedness:
>   https://github.com/WikiTeam/wikiteam/issues/214 );
> * figure out what's going on with those 60k missing wikis:
>   https://github.com/WikiTeam/wikiteam/commit/a1921f0919c7b44cfef967f5d07ea4953b0a736d ;
> * improve dumpgenerator.py's management of huge XML files:
>   https://github.com/WikiTeam/wikiteam/issues/8 ;
> * fix anything else! https://github.com/WikiTeam/wikiteam/issues
>
> For all updates on Wikia dumps, please watchlist/subscribe to the feed of:
> http://archiveteam.org/index.php?title=Wikia (notable update: future
> Wikia dumps will be 7z).
>
> Nemo
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
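For anyone who wants to pitch in on the first item, here is a minimal sketch of the two checks Nemo describes: full gzip decompression for the Wikia-produced dumps, and a streaming well-formedness parse for the WikiTeam XML dumps (to catch the "incorrectly terminated" files). The function names and file paths are illustrative, not part of the archive item or the wikiteam repository.

```python
import gzip
import xml.etree.ElementTree as ET

def gzip_is_valid(path):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):  # decompress in 1 MiB chunks
                pass
        return True
    except OSError:  # covers gzip.BadGzipFile and truncated streams
        return False

def xml_is_well_formed(path):
    """Return True if the file parses as well-formed XML, i.e. a complete
    export that was not cut off mid-dump."""
    try:
        for _event, elem in ET.iterparse(path):
            elem.clear()  # discard parsed elements to bound memory use
        return True
    except ET.ParseError:
        return False
```

A truncated dump (e.g. one ending inside an unclosed element) fails the second check with a `ParseError`, which is exactly the symptom of the incomplete Wikia XML files mentioned above.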