Sorry, now correctly cross-posted.

Emmanuel

-------- Original Message --------
Subject: WMF XML dump title case problem
Date: Sun, 26 Jun 2011 17:07:19 +0200
From: Emmanuel Engelhart <[email protected]>
To: Mailing list for Wikimedia CH <[email protected]>, [email protected]
Hi,

Titles should be stored in the "page" table with their first letter uppercased:
http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29#Lower_case_first_letter

Unfortunately, it seems that some XML dumps (and consequently the SQL that mwdumper generates from them) contain titles whose first letter is lowercase. For example:

$ wget http://download.wikimedia.org/mywiktionary/20110617/mywiktionary-20110617-pages-articles.xml.bz2
$ bzip2 -d -c mywiktionary-20110617-pages-articles.xml.bz2 | grep "<title>" | grep tationery | more
<title>stationery</title>
<title>stationery shop</title>

Is that a bug?

Regards
Emmanuel
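To list every affected title in a dump rather than grepping for a single word, something like the following Python could be used. It is a minimal sketch, not an existing tool: the function name lowercase_titles is hypothetical, it checks only the first character of the full title string (so a title such as "Talk:foo" is not flagged), and it uses Python's str.upper() as an approximation of MediaWiki's first-letter capitalization. It also assumes the wiki enforces that capitalization in the first place; MediaWiki makes this configurable through $wgCapitalLinks, so a hit only points to a problem on wikis where it is enabled.

```python
import bz2
import sys
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def lowercase_titles(xml_stream: BinaryIO) -> Iterator[str]:
    """Yield every <title> whose first character is not already uppercase.

    Hypothetical helper. The export schema namespace (e.g. .../xml/export-0.5/)
    differs between dump versions, so tags are matched on their local name only.
    """
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "title":
            title = elem.text or ""
            # Assumption: str.upper() stands in for MediaWiki's ucfirst.
            # Characters without case (digits, Burmese script) compare equal
            # to their uppercase form and are therefore not reported.
            if title and title[0] != title[0].upper():
                yield title
        # Drop each element's text and children once processed, to keep
        # memory use low while streaming through a large dump.
        elem.clear()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_titles.py mywiktionary-20110617-pages-articles.xml.bz2
    with bz2.open(sys.argv[1], "rb") as dump:
        for title in lowercase_titles(dump):
            print(title)
```

The script name check_titles.py is arbitrary. Streaming the decompressed file through iterparse avoids loading the whole dump into memory, which matters for the larger wikis.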
