https://bugzilla.wikimedia.org/show_bug.cgi?id=18694
Summary: Spanish wikipedia XML dump problems
Product: Wikimedia
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: Normal
Component: Downloads
AssignedTo: [email protected]
ReportedBy: [email protected]
I downloaded two consecutive Spanish Wikipedia XML dump files
(eswiki-20090504-pages-articles.xml.bz2 and, before that,
eswiki-20090421-pages-articles.xml.bz2). When I imported the file into WikiTaxi,
a large number of pages showed a strange error: the titles and the
content of the pages were mixed up. That is, the title would be one thing and
the text would obviously be from a different page (or a
combination of two pages). So I looked into the original XML file itself, and
this is what I found, for example:
<page>
<title>Gómez Plata</title>
<id>454035</id>
<revision>
<id>25156038</id>
<timestamp>2009-03-28T06:38:04Z</timestamp>
<contributor>
<username>SajoR</username>
<id>130444</id>
</contributor>
<minor />
<comment>leve mejora</comment>
<text xml:space="preserve">'''Montserrat Domínguez''' ([[Madrid]],
[[1963]]) es una [[periodismo|periodista]] [[España|española]].
Considera que la primera obligación de un periodista es ser crítico con el
poder y es optimista respecto a la situación actual del periodismo. Su trabajo
le ofrece, en su opinión, "un motor de vida".
Es aficionada a la [[lectura]] y a los viajes.
== Biografía ==
Estudió [[Ciencias de la Información]] por la [[Universidad Complutense de
Madrid]]. Posteriormente cursó un Master en Periodismo por la [[Universidad de
Columbia]].
So the title of the page is Gómez Plata (a municipality in Colombia), but the
page is about a Spanish journalist.
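To get a rough idea of how many pages are affected, something like the following sketch could be run against the dump. It is only a heuristic, not a definitive check: it assumes that an article's first bold term ('''...''') normally repeats the page title, which is not true for every page, so it will report false positives. The names suspicious_pages and scan_dump are made up for this illustration.

```python
import bz2
import re
import xml.etree.ElementTree as ET

# First bold wikitext term, e.g. '''Montserrat Domínguez'''
BOLD = re.compile(r"'''(.+?)'''")


def local(tag):
    """Strip the XML namespace: '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def suspicious_pages(stream):
    """Yield (title, first_bold_term) for pages whose first bold term
    does not overlap with the page title (heuristic, see assumptions)."""
    title = text = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            match = BOLD.search(text or "")
            if title and match:
                bold = match.group(1).strip().lower()
                lowered = title.lower()
                # Flag only when neither string contains the other.
                if bold not in lowered and lowered not in bold:
                    yield title, match.group(1)
            title = text = None
            elem.clear()  # drop the page's children to limit memory use


def scan_dump(path):
    """Stream a .xml.bz2 dump from disk without unpacking it first."""
    with bz2.open(path, "rb") as f:
        yield from suspicious_pages(f)
```

For the excerpt above this would report the pair (Gómez Plata, Montserrat Domínguez). Calling scan_dump("eswiki-20090504-pages-articles.xml.bz2") and counting the results, then doing the same for one of the other language dumps, would give a baseline for how many hits are just noise from the heuristic.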
This didn't happen with the other Wikipedia dumps I downloaded (en, de, nl, sv).
Could someone please look into this problem? Thank you.