Hello, all.

For reasons not yet known, the latest complete dump files
(pages-meta-history.xml) for some languages are flawed. Certain revision items
are missing their rev_user information. Even though there are only 3 or 4 such
items, that is enough to break either the parsing process or the later SQL load
into the DB.

So far, the last 3 dumps of DE Wikipedia and the 20090603 dump of FR Wikipedia
have shown this error.

I have updated both WikiXRay parsers:
http://meta.wikimedia.org/wiki/WikiXRay_parser
http://meta.wikimedia.org/wiki/WikiXRay_parser_research

They now check whether each parsed revision item is complete before creating
the SQL. If it is flawed, it is omitted and logged to an error file for later
inspection.
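
As a rough illustration of that kind of check (a minimal sketch, not the actual
WikiXRay code): the field list, the function names and the log format below are
all assumptions made for the example, and each revision is assumed to be
available as a plain dict once it has been parsed.

```python
# Hypothetical set of fields a revision must carry before SQL is generated.
# The real parsers may require a different set.
REQUIRED_FIELDS = ("rev_id", "rev_page", "rev_user", "rev_timestamp")


def missing_fields(rev):
    """Return the required fields that are absent or empty in `rev`.

    A value of 0 counts as present, so a rev_user of 0 is not reported
    as missing.
    """
    return [f for f in REQUIRED_FIELDS if rev.get(f) in (None, "")]


def filter_complete(revisions, error_log):
    """Return only the complete revisions.

    Flawed revisions are skipped and described in `error_log`, any
    file-like object opened for writing.
    """
    complete = []
    for rev in revisions:
        missing = missing_fields(rev)
        if missing:
            error_log.write(
                "rev_id=%s missing=%s\n" % (rev.get("rev_id"), ",".join(missing))
            )
        else:
            complete.append(rev)
    return complete
```

Only the list returned by filter_complete() would then be turned into SQL,
while the error file keeps one line per skipped revision.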

Regards,

Felipe.


      


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
