Re: [Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-13 Thread Aidan Hogan
Hi all, Many thanks for all the pointers! In the end we wrote a small client to grab documents from RESTBase (https://www.mediawiki.org/wiki/RESTBase) as suggested by Neil. The HTML looks perfect, and with the generous 200 requests/second limit (which we could not even manage to reach with
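
For readers following along, here is a minimal sketch of how a client like the one described above might pace its requests. It is not the code used in the project. The `Throttle` class, the `fetch_all` helper, and the `fetch` callable are hypothetical names introduced for illustration. The 200 requests/second figure is the limit quoted in the message.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


class Throttle:
    """Space successive calls at least 1/rate_per_second seconds apart."""

    def __init__(
        self,
        rate_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate_per_second <= 0:
            raise ValueError("rate_per_second must be positive")
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._interval
        return delay


def fetch_all(
    titles: Iterable[str],
    fetch: Callable[[str], str],
    throttle: Throttle,
) -> Iterator[Tuple[str, str]]:
    """Yield (title, document) pairs, waiting on the throttle before each fetch.

    `fetch` is a hypothetical callable that returns the HTML for one title.
    """
    for title in titles:
        throttle.wait()
        yield title, fetch(title)
```

A single-threaded client would create `Throttle(200.0)` once and pass it to `fetch_all`. A client running several workers would need to share one throttle behind a lock, which this sketch does not do.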

Re: [Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-08 Thread Kaartic Sivaraam
On Tuesday 08 May 2018 05:53 PM, Kaartic Sivaraam wrote:
> On Friday 04 May 2018 03:49 AM, Bartosz Dziewoński wrote:
>> On 2018-05-03 20:54, Aidan Hogan wrote:
>>> I am wondering what is the fastest/best way to get a local dump of
>>> English Wikipedia in HTML? We are looking just for the current

Re: [Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-08 Thread Kaartic Sivaraam
On Friday 04 May 2018 03:49 AM, Bartosz Dziewoński wrote:
> On 2018-05-03 20:54, Aidan Hogan wrote:
>> I am wondering what is the fastest/best way to get a local dump of
>> English Wikipedia in HTML? We are looking just for the current
>> versions (no edit history) of articles for the purposes of

Re: [Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-03 Thread Neil Patel Quinn
Also, for the curious, the request for dedicated HTML dumps is tracked in this Phabricator task: https://phabricator.wikimedia.org/T182351

On Thu, 3 May 2018 at 15:19, Bartosz Dziewoński wrote:
> On 2018-05-03 20:54, Aidan Hogan wrote:
> > I am wondering what is the

Re: [Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-03 Thread Bartosz Dziewoński
On 2018-05-03 20:54, Aidan Hogan wrote: I am wondering what is the fastest/best way to get a local dump of English Wikipedia in HTML? We are looking just for the current versions (no edit history) of articles for the purposes of a research project. The Kiwix project provides HTML dumps of

Re: [Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-03 Thread Neil Patel Quinn
Hey Aidan! I would suggest checking out RESTBase (https://www.mediawiki.org/wiki/RESTBase), which offers an API for retrieving HTML versions of Wikipedia pages. It's maintained by the Wikimedia Foundation and used by a number of production Wikimedia services, so you can rely on it. I don't
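
As a rough illustration of the suggestion above, the following sketch builds a request URL for one page and fetches its HTML. It assumes the public REST endpoint for English Wikipedia at `/api/rest_v1/page/html/{title}`. The function names and the `user_agent` parameter are hypothetical and not taken from the thread.

```python
import urllib.parse
import urllib.request

# Assumption: the public REST API base for English Wikipedia page HTML.
BASE_URL = "https://en.wikipedia.org/api/rest_v1/page/html/"


def page_html_url(title: str) -> str:
    """Build the URL for the current HTML of a page title.

    Spaces become underscores, and every reserved character (including "/")
    is percent-encoded so the title stays a single path segment.
    """
    normalized = title.strip().replace(" ", "_")
    return BASE_URL + urllib.parse.quote(normalized, safe="")


def fetch_page_html(title: str, user_agent: str, timeout: float = 30.0) -> str:
    """Fetch the HTML for one page; raises urllib.error.URLError on failure."""
    request = urllib.request.Request(
        page_html_url(title),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call would look like `fetch_page_html("Albert Einstein", "my-research-crawler/0.1 (contact address here)")`, where the user-agent string is a placeholder that should identify the actual client.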

Re: [Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-03 Thread Aidan Hogan
Hi Fae, On 03-05-2018 16:18, Fæ wrote: On 3 May 2018 at 19:54, Aidan Hogan wrote: Hi all, I am wondering what is the fastest/best way to get a local dump of English Wikipedia in HTML? We are looking just for the current versions (no edit history) of articles for the

Re: [Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-03 Thread Fæ
On 3 May 2018 at 19:54, Aidan Hogan wrote:
> Hi all,
>
> I am wondering what is the fastest/best way to get a local dump of English
> Wikipedia in HTML? We are looking just for the current versions (no edit
> history) of articles for the purposes of a research project.
>
>

[Wikitech-l] Getting a local dump of Wikipedia in HTML

2018-05-03 Thread Aidan Hogan
Hi all, I am wondering what is the fastest/best way to get a local dump of English Wikipedia in HTML? We are looking just for the current versions (no edit history) of articles for the purposes of a research project. We have been exploring using bliki [1] to do the conversion of the source