https://bugzilla.wikimedia.org/show_bug.cgi?id=19542
Summary: Dump page titles for other namespaces
Product: Wikimedia
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: Normal
Component: Downloads
AssignedTo: [email protected]
ReportedBy: [email protected]
Currently the only page titles available as a separate download are those in
namespace 0:
all-titles-in-ns0.gz
Apart from this, most other titles are available in pages-articles.xml.bz2,
except for User pages and Talk pages, which are available in
pages-meta-current.xml.bz2
The articles and meta-current dumps are typically a couple of orders of
magnitude larger than the all-titles-in-ns0 dump.
The only ways to get complete lists of page titles are to download and process
these two enormous dump files or to make excessive use of the API.
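
To illustrate the processing this currently requires, here is a minimal Python
sketch that streams titles out of one of the compressed XML dumps. It is not
how the dumps are produced. It assumes every <page> element carries <title>
and <ns> children (true of recent export schema versions, possibly not of
older dumps). The function names iter_titles and titles_from_dump are made up
for this example.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(stream, wanted_ns=None):
    """Yield (ns, title) pairs from a MediaWiki XML export stream.

    wanted_ns is an optional set of namespace numbers to keep.
    Assumption: each <page> has <title> and <ns> children.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = ns = None
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = int(child.text)
        if title is not None and (wanted_ns is None or ns in wanted_ns):
            yield ns, title
        # Drop the page's children (including revision text) to limit memory use.
        elem.clear()


def titles_from_dump(path, wanted_ns=None):
    """Open a .xml.bz2 dump and yield (ns, title) pairs from it."""
    with bz2.open(path, "rb") as stream:
        yield from iter_titles(stream, wanted_ns)
```

For example, titles_from_dump("pages-meta-current.xml.bz2", {1, 2}) would yield
only Talk and User titles, but it still has to decompress and parse the whole
file to do so, which is the cost this request is trying to avoid.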
* We could dump a page title list to accompany each of pages-articles.xml.bz2
and pages-meta-current.xml.bz2.
* We could dump a page title list for all namespaces.
* We could dump a page title list for all pages not already covered by
all-titles-in-ns0.gz.
* We could dump a page title list for each namespace.
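
As a rough illustration of the last option, the following Python sketch writes
one gzip-compressed title list per namespace from (namespace, title) pairs.
The function name write_title_lists and the file naming pattern
all-titles-in-ns<N>.gz are assumptions extrapolated from the existing
all-titles-in-ns0.gz, not an existing dump format. It holds all titles in
memory, so it is only meant to show the shape of the output.

```python
import gzip
import os
from collections import defaultdict


def write_title_lists(pairs, out_dir):
    """Write one gzip file of sorted titles per namespace; return {ns: path}.

    pairs is an iterable of (namespace_number, title) tuples.
    The output file names are hypothetical, modelled on all-titles-in-ns0.gz.
    """
    by_ns = defaultdict(list)
    for ns, title in pairs:
        by_ns[ns].append(title)

    paths = {}
    for ns, titles in sorted(by_ns.items()):
        path = os.path.join(out_dir, f"all-titles-in-ns{ns}.gz")
        with gzip.open(path, "wt", encoding="utf-8") as out:
            for title in sorted(titles):
                out.write(title + "\n")
        paths[ns] = path
    return paths
```

A consumer who only needs User and Talk titles could then fetch just those
files instead of a full dump.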
For my current purpose I already need to process pages-articles.xml.bz2, so I
only lack page titles for User and Talk pages. A dump of the titles for those
namespaces would be enough for me, but might not be the best option for other
potential users of the data.