https://bugzilla.wikimedia.org/show_bug.cgi?id=21200
Summary: dump format could declare which namespaces it covers
Product: Wikimedia
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: Normal
Component: General/Unknown
AssignedTo: [email protected]
ReportedBy: [email protected]
The XML dump files released by Wikimedia contain a <namespaces> section which
declares the namespace names and numbers of the wiki they were dumped from.
But that section does not tell you which of those namespaces are actually
covered by a given dump file.
For instance, the *-*-pages-articles.xml dumps do not contain any "Talk",
"* talk", or "User" entries, not even page title and redirect information.
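
To illustrate the current situation, here is a minimal sketch of what a tool
has to do today: read every page title and compare its prefix against the
declared namespaces. The sample XML is a simplified, made-up fragment (real
dumps carry an XML namespace and many more fields), and the function names are
only illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical dump fragment; not the full export schema.
SAMPLE = """<mediawiki>
  <siteinfo>
    <namespaces>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="2">User</namespace>
      <namespace key="10">Template</namespace>
    </namespaces>
  </siteinfo>
  <page><title>Apple</title></page>
  <page><title>Template:Infobox</title></page>
  <page><title>Banana</title></page>
</mediawiki>"""


def declared_namespaces(root):
    """Map namespace name -> number, as declared in <namespaces>."""
    return {(ns.text or ""): int(ns.get("key")) for ns in root.iter("namespace")}


def covered_namespaces(root):
    """Infer which namespaces actually occur by inspecting every page title."""
    names = declared_namespaces(root)
    seen = set()
    for title in root.iter("title"):
        prefix, sep, _ = (title.text or "").partition(":")
        # A title without a known prefix belongs to the main namespace (0).
        seen.add(names[prefix] if sep and prefix in names else 0)
    return seen


root = ET.fromstring(SAMPLE)
declared = set(declared_namespaces(root).values())
covered = covered_namespaces(root)
print("declared:", sorted(declared))
print("covered: ", sorted(covered))
```

Note that this requires a full pass over the dump, and it still cannot
distinguish "namespace not covered" from "namespace covered but empty".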
This is fine, but wiki dumps are now also being produced in the same format
outside Wikimedia, with different subsets of namespaces covered (for example
http://devtionary.org/w/dump/xmlu/), so the dump format has become an
interchange format of sorts. It would be nice if this information, which is
currently metadata external to the dump files, could be moved into the files
so that they are self-contained. That would be quite useful to tools designed
to process dump files.
Perhaps a new section of the dump files named <dumpinfo> could be added to
complement the <siteinfo> section.
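
As a rough sketch of how a tool might consume such a section: only the
<dumpinfo> name comes from this proposal. The <covers> wrapper, the reuse of
<namespace key="..."> elements inside it, and the function name are
assumptions made up for illustration, not an existing or agreed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: <dumpinfo>/<covers> is NOT part of any real dump schema.
SAMPLE_WITH_INFO = """<mediawiki>
  <siteinfo>
    <namespaces>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="10">Template</namespace>
    </namespaces>
  </siteinfo>
  <dumpinfo>
    <covers>
      <namespace key="0" />
      <namespace key="10" />
    </covers>
  </dumpinfo>
</mediawiki>"""

SAMPLE_WITHOUT_INFO = "<mediawiki><siteinfo /></mediawiki>"


def covered_from_dumpinfo(root):
    """Return the set of covered namespace numbers, or None if undeclared."""
    info = root.find("dumpinfo")
    if info is None:
        # Older dumps: the tool must fall back to external metadata.
        return None
    return {int(ns.get("key")) for ns in info.iter("namespace")}


with_info = covered_from_dumpinfo(ET.fromstring(SAMPLE_WITH_INFO))
without_info = covered_from_dumpinfo(ET.fromstring(SAMPLE_WITHOUT_INFO))
print(with_info)
print(without_info)
```

Returning None for dumps without the section would let tools keep working
with existing files while taking advantage of the declaration when present.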