Re: [Wikitech-l] Html dump for Wikipedia

Bjoern Hoehrmann Fri, 02 Dec 2011 14:16:11 -0800

* Khalida BEN SIDI AHMED wrote:
>I need an HTML dump of Wikipedia because I have written Java code which
>extracts text from HTML content and I would like to apply it to this
>dump. In fact I need to extract the first sentence of a list of articles
>(<200) and I don't know how to do it on other dumps. If you have any idea of
>other solutions, I will be pleased if you share them with me.


If you just need a few articles, you can simply use the online version.
There are any number of tools that would help you batch the requests
without hitting the server too much; `wget` and `curl` are popular ones.
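
For illustration only, here is a minimal sketch of the same idea in Python
rather than `wget` or `curl`: fetch each article from the online version, one
at a time, with a pause between requests. The base URL (English Wikipedia),
the one-second delay, the User-Agent string, and the function names
`article_url`, `fetch_html`, and `fetch_all` are all assumptions made for this
example, not anything prescribed in this thread.

```python
import time
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; change the host for another language edition.
BASE_URL = "https://en.wikipedia.org/wiki/"
# Placeholder User-Agent; replace with something that identifies you.
USER_AGENT = "first-sentence-fetcher/0.1 (contact: you@example.org)"


def article_url(title):
    """Build the online article URL for a page title."""
    return BASE_URL + urllib.parse.quote(title.replace(" ", "_"))


def fetch_html(url):
    """Download one page and return its HTML as text."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def fetch_all(titles, delay_seconds=1.0, fetch=fetch_html, sleep=time.sleep):
    """Fetch the titles sequentially, pausing between consecutive requests."""
    pages = {}
    for index, title in enumerate(titles):
        if index > 0:
            sleep(delay_seconds)  # space the requests out
        pages[title] = fetch(article_url(title))
    return pages
```

Calling `fetch_all(["Albert Einstein", "Ada Lovelace"])` would return a
dictionary mapping each title to its HTML, which could then be handed to an
existing text extractor. The `fetch` and `sleep` parameters exist only so the
loop can be tried out without touching the network.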
-- 
Björn Höhrmann · mailto:[email protected] · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
