Hi all,
Many thanks for all the pointers! In the end we wrote a small client to
grab documents from RESTBase (https://www.mediawiki.org/wiki/RESTBase)
as suggested by Neil. The HTML looks perfect, and with the generous 200
requests/second limit (which we could not even manage to reach with
On Tuesday 08 May 2018 05:53 PM, Kaartic Sivaraam wrote:
> On Friday 04 May 2018 03:49 AM, Bartosz Dziewoński wrote:
>> On 2018-05-03 20:54, Aidan Hogan wrote:
>>> I am wondering what is the fastest/best way to get a local dump of
>>> English Wikipedia in HTML? We are looking just for the current
On Friday 04 May 2018 03:49 AM, Bartosz Dziewoński wrote:
> On 2018-05-03 20:54, Aidan Hogan wrote:
>> I am wondering what is the fastest/best way to get a local dump of
>> English Wikipedia in HTML? We are looking just for the current
>> versions (no edit history) of articles for the purposes of
Also, for the curious, the request for dedicated HTML dumps is tracked in
this Phabricator task: https://phabricator.wikimedia.org/T182351
On Thu, 3 May 2018 at 15:19, Bartosz Dziewoński wrote:
> On 2018-05-03 20:54, Aidan Hogan wrote:
> > I am wondering what is the
On 2018-05-03 20:54, Aidan Hogan wrote:
> I am wondering what is the fastest/best way to get a local dump of
> English Wikipedia in HTML? We are looking just for the current versions
> (no edit history) of articles for the purposes of a research project.
The Kiwix project provides HTML dumps of
Hey Aidan!
I would suggest checking out RESTBase
(https://www.mediawiki.org/wiki/RESTBase), which offers an API for
retrieving HTML versions of Wikipedia pages. It's maintained by the
Wikimedia Foundation and used by a number of production Wikimedia services,
so you can rely on it.
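For anyone wanting to try this, here is a minimal sketch of fetching the current HTML of a single article. It assumes the public REST endpoint for English Wikipedia at https://en.wikipedia.org/api/rest_v1/page/html/{title}; the helper names (html_url, fetch_html) and the User-Agent string are made up for illustration, and it is not a full dump client (no retries, no rate limiting).

```python
import urllib.parse
import urllib.request

# Assumed endpoint: the REST API HTML route for English Wikipedia.
BASE_URL = "https://en.wikipedia.org/api/rest_v1/page/html/"


def html_url(title: str) -> str:
    """Build the URL for the current HTML rendering of one article."""
    # Titles use underscores instead of spaces; everything else that is
    # not URL-safe (including "/") is percent-encoded.
    return BASE_URL + urllib.parse.quote(title.replace(" ", "_"), safe="")


def fetch_html(title: str,
               user_agent: str = "html-dump-sketch/0.1 (you@example.org)") -> str:
    """Download the HTML for one article (hypothetical helper, no retries)."""
    # The User-Agent value is a placeholder; replace it with real contact info.
    request = urllib.request.Request(html_url(title),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network request when run as a script.
    print(fetch_html("Douglas Adams")[:200])
```

To cover many articles you would loop over a list of titles and throttle the requests yourself to stay within the published limits.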
I don't
Hi Fae,
On 03-05-2018 16:18, Fæ wrote:
> On 3 May 2018 at 19:54, Aidan Hogan wrote:
>> Hi all,
>> I am wondering what is the fastest/best way to get a local dump of English
>> Wikipedia in HTML? We are looking just for the current versions (no edit
>> history) of articles for the
On 3 May 2018 at 19:54, Aidan Hogan wrote:
> Hi all,
>
> I am wondering what is the fastest/best way to get a local dump of English
> Wikipedia in HTML? We are looking just for the current versions (no edit
> history) of articles for the purposes of a research project.
>
>
Hi all,
I am wondering what is the fastest/best way to get a local dump of
English Wikipedia in HTML? We are looking just for the current versions
(no edit history) of articles for the purposes of a research project.
We have been exploring using bliki [1] to do the conversion of the
source