https://bugzilla.wikimedia.org/show_bug.cgi?id=46912
Web browser: ---
Bug ID: 46912
Summary: Provide one each small multi-part "pages-articles" and
"meta-history" dump for testing purposes.
Product: Datasets
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General/Unknown
Assignee: [email protected]
Reporter: [email protected]
Classification: Unclassified
Mobile Platform: ---
I'm enhancing my dump indexing and extraction tools to work with multi-file
dumps, which are currently used only for the English Wikipedia.
One problem with developing these tools is the enormous size of the files. They
take a lot of time and bandwidth to download and consume a lot of hard drive
space. It also takes a long time for a tool to run over an entire dump of this
size, which is a problem during testing and development.
It would be great if we could have one of each type of multi-file XML dump
provided specifically for testing purposes.
These could be dumps of one of our actual smallest wikis, or dumps of a test
wiki set up specifically for this purpose. It would be an advantage if the
contents were in Latin script.
- One "pages-articles" multi-part dump, which has exactly 27 parts numbered
1-27.
- One "meta-history" multi-part dump, which has many more parts with the
numbers 1-27 occurring multiple times.
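To illustrate the structure described above, here is a minimal sketch of how a tool under test might group part numbers by dump type. The filenames are hypothetical, modeled loosely on the English Wikipedia multi-part naming pattern (part number appended to the dump type, followed by a page-id range); they are illustrative assumptions, not actual dump listings.

```python
import re

# Hypothetical example filenames (assumed naming pattern, not real dumps).
# Note the same meta-history part number appearing more than once.
files = [
    "testwiki-20130401-pages-articles1.xml-p000000010p000002289.bz2",
    "testwiki-20130401-pages-articles2.xml-p000002290p000005000.bz2",
    "testwiki-20130401-pages-meta-history1.xml-p000000010p000000412.bz2",
    "testwiki-20130401-pages-meta-history1.xml-p000000413p000002289.bz2",
]

PART_RE = re.compile(r"-(pages-(?:articles|meta-history))(\d+)\.xml")

def parts_by_type(names):
    """Group part numbers by dump type, keeping duplicates."""
    groups = {}
    for name in names:
        m = PART_RE.search(name)
        if m:
            groups.setdefault(m.group(1), []).append(int(m.group(2)))
    return groups

print(parts_by_type(files))
```

Against a test dump as requested, a tool could then verify that it sees parts 1-27 exactly once for "pages-articles", and the numbers 1-27 with repetitions for "meta-history".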
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l