https://bugzilla.wikimedia.org/show_bug.cgi?id=62109

            Bug ID: 62109
           Summary: Add canonical namespaces and aliases to XML dumps
           Product: Datasets
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: General/Unknown
          Assignee: ar...@wikimedia.org
          Reporter: aaron.halfa...@gmail.com
                CC: gsv...@gmail.com
       Web browser: ---
   Mobile Platform: ---

The XML dump contains a siteinfo header with a <namespaces> tag that is very
useful for processing the text in the dumps.  It looks something like this:

<mediawiki ...snip... >
  <siteinfo>
    <sitename>Վիքիպեդիա</sitename>
    <base>http://hy.wikipedia.org/wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D5%A7%D5%BB</base>
    <generator>MediaWiki 1.23wmf15</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Մեդիա</namespace>
      <namespace key="-1" case="first-letter">Սպասարկող</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Քննարկում</namespace>
      <namespace key="2" case="first-letter">Մասնակից</namespace>

  ...snip...

    </namespaces>
  </siteinfo>
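
For illustration, here is a minimal sketch of how a dump consumer might read
this header. It assumes Python's standard xml.etree.ElementTree; the
SITEINFO string and the local_namespace_names helper are hypothetical
stand-ins for this example, and the sample omits the xmlns declaration that a
real dump's root element carries (with it, tag names would need to be
namespace-qualified).

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical stand-in for the <siteinfo> header shown above.
SITEINFO = """
<siteinfo>
  <namespaces>
    <namespace key="-2" case="first-letter">Մեդիա</namespace>
    <namespace key="-1" case="first-letter">Սպասարկող</namespace>
    <namespace key="0" case="first-letter" />
    <namespace key="1" case="first-letter">Քննարկում</namespace>
    <namespace key="2" case="first-letter">Մասնակից</namespace>
  </namespaces>
</siteinfo>
"""


def local_namespace_names(siteinfo_xml):
    """Map each local namespace name to its numeric key."""
    root = ET.fromstring(siteinfo_xml)
    return {
        (ns.text or ""): int(ns.get("key"))  # main namespace has no text
        for ns in root.iter("namespace")
    }


names = local_namespace_names(SITEINFO)
```

Only the localized names are available this way, so a title written with a
prefix such as "Talk:" cannot be resolved from the header alone.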

Unfortunately, this header does not include canonical namespace names or
namespace aliases.  However, an API request for "meta=siteinfo" does include
this information.  For example, the call for
http://hy.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases
returns the following XML:

<api>
  <query>
    <namespaces>
      <ns id="-2" case="first-letter" canonical="Media" xml:space="preserve">Մեդիա</ns>
      <ns id="-1" case="first-letter" canonical="Special" xml:space="preserve">Սպասարկող</ns>
      <ns id="0" case="first-letter" content="" xml:space="preserve" />
      <ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Քննարկում</ns>
      <ns id="2" case="first-letter" subpages="" canonical="User" xml:space="preserve">Մասնակից</ns>

  ...snip...

    </namespaces>
    <namespacealiases>
      <ns id="6" xml:space="preserve">Image</ns>
      <ns id="7" xml:space="preserve">Image talk</ns>
    </namespacealiases>
  </query>
</api>

The XML dump should be updated to include this important metadata about
namespaces.
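
To show why this metadata matters, here is a rough sketch of the lookup table
that the API response makes possible and the dump header currently does not.
It assumes Python's standard xml.etree.ElementTree; API_RESPONSE is an
abbreviated, hypothetical copy of the response above, and namespace_lookup is
a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical copy of the API response shown above.
API_RESPONSE = """
<api>
  <query>
    <namespaces>
      <ns id="0" case="first-letter" content="" xml:space="preserve" />
      <ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Քննարկում</ns>
      <ns id="2" case="first-letter" subpages="" canonical="User" xml:space="preserve">Մասնակից</ns>
    </namespaces>
    <namespacealiases>
      <ns id="6" xml:space="preserve">Image</ns>
      <ns id="7" xml:space="preserve">Image talk</ns>
    </namespacealiases>
  </query>
</api>
"""


def namespace_lookup(api_xml):
    """Map local names, canonical names, and aliases to namespace ids."""
    root = ET.fromstring(api_xml)
    lookup = {}
    for ns in root.findall("./query/namespaces/ns"):
        ns_id = int(ns.get("id"))
        lookup[ns.text or ""] = ns_id      # localized name
        canonical = ns.get("canonical")    # absent for the main namespace
        if canonical is not None:
            lookup[canonical] = ns_id
    for alias in root.findall("./query/namespacealiases/ns"):
        lookup[alias.text] = int(alias.get("id"))
    return lookup


lookup = namespace_lookup(API_RESPONSE)
```

With the same attributes present in the dump header, a table like this could
be built offline, without a separate API request per wiki.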

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
