[email protected] wrote:
> Hi,
> 
> Thanks for responding. Let me try to be a little clearer.
> 
> I am primarily interested in extracting what image is linked from the
> infobox of an article (if there is an infobox on the article page).
> Initially I thought of parsing the XML for this info, but after
> looking around a bit, I felt it might be easier and faster to get the
> Wikipedia data loaded into a database, so that I can play around with
> the data a lot more.
> 
> I am working on my lab machine, where some web applications are already
> running. Since the MediaWiki installation instructions mentioned that I
> need to change some PHP settings, I was a little wary about it. Also I
> don't have root access to the lab machines, but I can ask my lab admin
> to do things for me when I need something.

You don't need to change PHP settings. Unless you have a really esoteric
PHP config, MediaWiki will work fine.


> My understanding is that I should import the data even if I install
> MediaWiki, and that MediaWiki is primarily for those who want to view
> the data in a wiki format. So I decided to go with only the database. I
> didn't use importDump.php, as http://meta.wikimedia.org/wiki/Data_dumps
> says it is very slow and not advisable for large dumps. I wouldn't mind
> installing MediaWiki if that would help me import the data easily.

If you just want to manually parse the wikitext of the articles, don't
import into a DB. Feed your program directly from the XML; it will be
way faster.
On the other hand, if you want MediaWiki to do something with it, you'll
need a MediaWiki install.
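For the infobox-image use case, streaming the dump is straightforward. The sketch below is an assumption about your setup, not tested against a full dump: it streams pages with ElementTree's iterparse and pulls the image= parameter out of each infobox with a deliberately crude regex (it will miss nested templates and unusual parameter layouts). The export namespace shown matches 2009-era dumps; check the xmlns declared at the top of your file.

```python
# Stream a pages-articles dump and yield the image= parameter of each
# article's infobox, without loading the whole XML into memory.
import io
import re
import xml.etree.ElementTree as ET

NS = '{http://www.mediawiki.org/xml/export-0.3/}'  # check your dump's xmlns
INFOBOX_IMAGE = re.compile(
    r'\{\{\s*Infobox[^{}]*?\|\s*image\s*=\s*([^|\n{}]+)',
    re.IGNORECASE | re.DOTALL)

def infobox_images(stream):
    """Yield (title, image) pairs for pages whose infobox sets image=."""
    for event, elem in ET.iterparse(stream):
        if elem.tag == NS + 'page':
            title = elem.findtext(NS + 'title')
            text = elem.findtext(NS + 'revision/' + NS + 'text') or ''
            m = INFOBOX_IMAGE.search(text)
            if m:
                yield title, m.group(1).strip()
            elem.clear()  # drop the parsed page to keep memory flat

# Real dump: infobox_images(bz2.open('enwiki-20090618-pages-articles.xml.bz2', 'rb'))
# Tiny inline sample for illustration:
sample = b'''<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
 <page><title>Example</title><revision>
  <text>{{Infobox person | name = X | image = Example.jpg | born = 1900}}</text>
 </revision></page>
</mediawiki>'''
print(list(infobox_images(io.BytesIO(sample))))  # [('Example', 'Example.jpg')]
```

Since iterparse clears each <page> after use, memory stays flat even on the multi-GB English dump.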


> I created the database using the database layout in
> http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=markup
> 
> This time I downloaded a different version of the pages-articles.xml.bz2
> dump from http://download.wikimedia.org/enwiki/20090618/  and tried
> importing using mwdumper.jar.
> 
> $ java -jar ../../lib/mwdumper.jar --format=sql:1.5
> enwiki-20090618-pages-articles.xml | mysql -f -u root
> --default-character-set=utf-8 wikipedia
> 
> 
> When I issued the above command the importing process crashes after a
> while with the following error message,
> 
> 1,427,000 pages (705.771/sec), 1,427,000 revs (705.771/sec)
> 1,428,000 pages (705.879/sec), 1,428,000 revs (705.879/sec)
> Exception in thread "main" java.lang.IllegalArgumentException: Invalid
> contributor


> I also tried the same with mwimport.pl; it crashed with a similar error
> saying "invalid contributor".

You're right. It's bug 18328: they don't support rev_deleted.
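If you only need the article text and not accurate attribution, one possible workaround (an assumption, not a confirmed fix for bug 18328 -- inspect a failing <revision> in your dump first) is to rewrite the empty deleted-contributor elements into a placeholder before the stream reaches mwdumper:

```python
# Hypothetical pre-filter: replace <contributor deleted="deleted" />
# elements, which have no <username>/<ip> child and which mwdumper
# rejects as "Invalid contributor", with a dummy <ip> contributor.
# Usage (hypothetical): bzcat dump.xml.bz2 | python fix_contributors.py | java -jar mwdumper.jar ...
import sys

def fix_line(line):
    """Rewrite a deleted-contributor line; pass everything else through."""
    if '<contributor deleted="deleted"' in line:
        indent = line[:len(line) - len(line.lstrip())]
        # 0.0.0.0 is a placeholder, so attribution is lost for these revs.
        return indent + '<contributor><ip>0.0.0.0</ip></contributor>\n'
    return line

if __name__ == '__main__':
    for line in sys.stdin:
        sys.stdout.write(fix_line(line))
```

This is only acceptable if you just want the wikitext; the placeholder destroys the (already suppressed) contributor information for those revisions.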


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l