https://bugzilla.wikimedia.org/show_bug.cgi?id=57739

       Web browser: ---
            Bug ID: 57739
           Summary: all-titles file doesn't include namespace prefix
           Product: Datasets
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: General/Unknown
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

The file *-all-titles.gz does not seem to include the namespace prefix,
for example:

http://dumps.wikimedia.org/commonswiki/20131121/commonswiki-20131121-all-titles.gz

It seems that in the current process (currently:
http://git.wikimedia.org/blob/operations%2Fdumps.git/11e9b23b4bc76bf3d89e1fb32348c7a11079bd55/xmldumps-backup%2Fworker.py#L4043
)
it's a simple query
query="select page_title from page;"

and the namespace is not part of page_title (it is stored separately, in the
page_namespace column).
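
A minimal sketch of what a namespace-aware query could look like, assuming the
standard MediaWiki page table with its page_namespace and page_title columns.
An in-memory SQLite table stands in for the real database, the sample rows are
made up, and the NAMESPACE_NAMES map and prefixed_title helper are hypothetical
(only a few well-known namespace IDs with their English default names). This
is not the code in worker.py.

```python
import sqlite3

# Hypothetical, partial map of namespace IDs to prefixes (English defaults).
NAMESPACE_NAMES = {0: "", 1: "Talk", 2: "User", 6: "File", 14: "Category"}


def prefixed_title(namespace, title):
    """Return 'Prefix:Title', or the bare title for the main namespace (0)."""
    prefix = NAMESPACE_NAMES.get(namespace)
    if prefix is None:
        # Unknown namespace: fall back to the numeric ID so nothing is lost.
        prefix = str(namespace)
    return f"{prefix}:{title}" if prefix else title


# In-memory stand-in for the wiki database, with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table page (page_namespace integer, page_title text)")
conn.executemany(
    "insert into page values (?, ?)",
    [(0, "Paris"), (14, "Paris"), (6, "Paris.jpg")],
)

# Select the namespace alongside the title instead of the title alone.
query = (
    "select page_namespace, page_title from page "
    "order by page_namespace, page_title;"
)
titles = [prefixed_title(ns, title) for ns, title in conn.execute(query)]
print(titles)  # ['Paris', 'File:Paris.jpg', 'Category:Paris']
```

Real wikis localize and extend their namespace names, so an actual fix would
need to take the prefixes from the wiki's own configuration rather than from a
hard-coded map like this one.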

This makes the file nearly useless, as one cannot distinguish between a title
in the main namespace and a title in another namespace, or between titles in
two different namespaces.
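
To illustrate the ambiguity, here is a small sketch with made-up rows (not
taken from the actual dump) showing how dropping the namespace makes distinct
pages indistinguishable:

```python
from collections import Counter

# Made-up (namespace, title) rows; 0 = main, 6 = File, 14 = Category.
rows = [(0, "Paris"), (14, "Paris"), (6, "Paris.jpg"), (0, "London")]

# What a titles-only listing keeps: the namespace column is dropped.
titles_only = [title for _namespace, title in rows]

# Titles that occur more than once can no longer be told apart.
ambiguous = sorted(t for t, n in Counter(titles_only).items() if n > 1)
print(ambiguous)  # ['Paris']
```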

