https://bugzilla.wikimedia.org/show_bug.cgi?id=57739
Web browser: ---
Bug ID: 57739
Summary: all-titles file doesn't include namespace prefix
Product: Datasets
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General/Unknown
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Classification: Unclassified
Mobile Platform: ---
The *-all-titles.gz file does not seem to include the namespace prefix.
For example:
http://dumps.wikimedia.org/commonswiki/20131121/commonswiki-20131121-all-titles.gz
It seems that in the current process (currently:
http://git.wikimedia.org/blob/operations%2Fdumps.git/11e9b23b4bc76bf3d89e1fb32348c7a11079bd55/xmldumps-backup%2Fworker.py#L4043
)
it's a simple query:
query="select page_title from page;"
and the namespace is not part of page_title.
This makes the file nearly useless, since one cannot tell the difference
between a title in the main namespace and a title in another namespace, or
between titles in two different namespaces.
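
One possible direction, sketched here only as an illustration and not as the actual worker.py code: select page_namespace alongside page_title and prepend a namespace name to each title. The helper prefixed_title and the NAMESPACE_NAMES map are hypothetical; the map lists only a few standard MediaWiki namespace IDs with their canonical English names, whereas a real dump would need the wiki's own (possibly localized) names.

```python
# Hypothetical subset of standard MediaWiki namespace IDs (assumption: canonical
# English names; a real wiki may use localized or custom names).
NAMESPACE_NAMES = {0: "", 1: "Talk", 2: "User", 6: "File", 14: "Category"}


def prefixed_title(page_namespace, page_title, names=NAMESPACE_NAMES):
    """Return 'Prefix:Title', or the bare title for the main namespace (ID 0)."""
    prefix = names.get(page_namespace)
    if prefix is None:
        # Unknown namespace: fall back to the numeric ID so rows stay distinguishable.
        prefix = str(page_namespace)
    return f"{prefix}:{page_title}" if prefix else page_title


# Rows as they might come back from a query such as
#   select page_namespace, page_title from page;
rows = [(0, "Paris"), (6, "Paris.jpg"), (14, "Paris")]
lines = [prefixed_title(ns, title) for ns, title in rows]
```

A simpler alternative would be to write the numeric page_namespace and page_title as two tab-separated columns and leave the name lookup to consumers of the file.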