https://bugzilla.wikimedia.org/show_bug.cgi?id=19542

           Summary: Dump page titles for other namespaces
           Product: Wikimedia
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: Downloads
        AssignedTo: [email protected]
        ReportedBy: [email protected]


Currently the only page titles available as a separate download are those for
namespace 0:
all-titles-in-ns0.gz

Apart from this, most other titles are available in pages-articles.xml.bz2,
except for User pages and Talk pages, which are available in
pages-meta-current.xml.bz2.

The articles and meta-current dumps are typically a couple of orders of
magnitude larger than the all-titles-in-ns0 dump.

The only ways to get complete lists of page titles are to download and process
these two enormous dump files or to make excessive use of the API.
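
To illustrate the first route, here is a minimal Python sketch of streaming
the titles out of one of the XML dumps without loading the whole file. It
assumes each <page> element has a <title> child and possibly an <ns> child;
that layout, the function names and the file name in the usage note are
assumptions for illustration, not something stated in this report.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(stream):
    """Yield (ns, title) for each <page> in an XML export stream.

    ns is None when a page has no <ns> child (assumed to be optional).
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the first event is the root's start tag
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        ns = None
        title = None
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = int(child.text)
        yield ns, title
        root.clear()  # drop pages already handled so memory stays bounded


def iter_titles_from_dump(path):
    """Same as iter_titles, reading from a .bz2-compressed dump file."""
    with bz2.open(path, "rb") as stream:
        yield from iter_titles(stream)
```

Something like iter_titles_from_dump("pages-meta-current.xml.bz2") would
still have to decompress and parse the entire dump just to list the titles,
which is the cost a separate title list would avoid.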

* We could dump a page title list to accompany each of pages-articles.xml.bz2
and pages-meta-current.xml.bz2.
* We could dump a page title list for all namespaces.
* We could dump a page title list for all pages not already covered by
all-titles-in-ns0.gz.
* We could dump a page title list for each namespace.

For my current purpose I already need to process pages-articles.xml.bz2, so I
only lack page titles for User and Talk pages. A dump of the titles for those
namespaces would be enough for me, but it might not be the best option for
other potential users of the data.


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
