https://bugzilla.wikimedia.org/show_bug.cgi?id=46912

       Web browser: ---
            Bug ID: 46912
           Summary: Provide one each small multi-part "pages-articles" and
                    "meta-history" dump for testing purposes.
           Product: Datasets
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: General/Unknown
          Assignee: [email protected]
          Reporter: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

I'm enhancing my dump indexing and extraction tools to work with the
multi-file dumps that are currently produced only for the English Wikipedia.

One problem with developing these tools is the enormous size of the files. They
take a lot of time and bandwidth to download and consume a lot of hard drive
space. It also takes a long time for a tool to run over an entire dump of this
size, which is a problem during testing and development.

It would be great if we could have one of each type of multi-file XML dump
provided specifically for testing purposes.

They could be dumps of one of the smallest actual wikis, or dumps of a test
wiki set up specifically for this purpose. It would be an advantage if the
contents were in Latin script.

- One "pages-articles" multi-part dump, which has exactly 27 parts numbered
1-27.
- One "meta-history" multi-part dump, which has many more parts with the
numbers 1-27 occurring multiple times.
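
For illustration, here is a minimal sketch (Python) of how a tool might group
such multi-part files by part number. The file-name pattern it assumes
(<wiki>-<date>-pages-articles<N>.xml-p<start>p<end>.bz2, and the corresponding
"pages-meta-history" names for the history dump) is modelled on the current
enwiki dumps; it is my assumption, not something a test dump would be
guaranteed to follow.

    import re
    from collections import defaultdict

    # Assumed naming scheme, modelled on the current enwiki multi-part dumps:
    #   <wiki>-<date>-pages-articles<part>.xml-p<start>p<end>.bz2
    #   <wiki>-<date>-pages-meta-history<part>.xml-p<start>p<end>.7z
    PART_RE = re.compile(
        r'^(?P<wiki>\w+)-(?P<date>\d{8})-'
        r'(?P<type>pages-articles|pages-meta-history)'
        r'(?P<part>\d+)\.xml-p(?P<start>\d+)p(?P<end>\d+)\.(?:bz2|7z)$'
    )

    def group_parts(filenames):
        """Group multi-part dump file names by (dump type, part number)."""
        groups = defaultdict(list)
        for name in filenames:
            m = PART_RE.match(name)
            if m:
                groups[(m.group('type'), int(m.group('part')))].append(name)
        return groups

With a pages-articles dump each (type, part) key should map to exactly one
file; with a meta-history dump the same part number maps to several files
covering different page-id ranges. That difference is exactly what the two
requested test dumps would let tool authors exercise without downloading the
full enwiki set.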

