On Friday 04 May 2018 03:49 AM, Bartosz Dziewoński wrote:
> On 2018-05-03 20:54, Aidan Hogan wrote:
>> I am wondering what is the fastest/best way to get a local dump of
>> English Wikipedia in HTML? We are looking just for the current
>> versions (no edit history) of articles for the purposes of a research
>> project.
> 
> The Kiwix project provides HTML dumps of Wikipedia for offline reading:
> http://www.kiwix.org/downloads/
> 

In case you need pure HTML and not the ZIM file format, you could check
out mwoffliner[1], the tool used to generate ZIM files. It dumps HTML
files locally before generating the ZIM file. Although the HTML is only
an intermediate product for the tool, it can be kept if you wish. See [2]
for more information about the options the tool accepts.

I'm not sure whether it's possible to instruct the tool to stop
immediately after dumping the pages, thus avoiding the creation of the
ZIM file altogether. But you could work around it by watching the
verbose output of the tool (turned on through the '--verbose' option) to
identify when dumping has completed, and then stopping it manually.
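
If you would rather not watch the output by hand, the workaround above
could be scripted. The following is only a rough sketch, not something
taken from mwoffliner itself: the helper name run_until_marker is made
up, and the marker text is an assumption, since I don't know the exact
line the tool prints when dumping ends. You would have to pick a
suitable line from a real '--verbose' run.

```python
import subprocess
import sys


def run_until_marker(cmd, marker):
    """Run cmd, echo its output, and stop it once a line contains marker.

    Returns True if the marker was seen, False if the process ended first.
    'marker' is a placeholder: choose it from the tool's real verbose log.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # log lines may be written to stderr
        text=True,
        bufsize=1,                 # line-buffered reads on our side
    )
    seen = False
    try:
        for line in proc.stdout:
            sys.stdout.write(line)  # keep the log visible to the user
            if marker in line:
                seen = True
                break
    finally:
        if proc.poll() is None:
            proc.terminate()        # ask the tool to stop
        proc.stdout.close()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()             # force it if it ignores the request
            proc.wait()
    return seen


# Hypothetical usage; both the argument list and the marker text are
# placeholders to be filled in from [2] and from a real verbose run:
#   run_until_marker(["mwoffliner", "--verbose"] + other_args,
#                    "text that signals the end of dumping")
```

Before relying on this, it would be worth checking on a small wiki that
the dumped HTML is still on disk after the tool has been stopped.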

In case of any doubts about using the tool, feel free to reach out.

References:
[1]: https://github.com/openzim/mwoffliner
[2]: https://github.com/openzim/mwoffliner/blob/master/lib/parameterList.js


-- 
Sivaraam

QUOTE:

“The most valuable person on any team is the person who makes everyone
else on the team more valuable, not the person who knows the most.”

      - Joel Spolsky


Sivaraam?

You might have noticed that my signature recently changed from
'Kaartic' to 'Sivaraam', both of which are parts of my name. I find the
new signature better for several reasons, one of which is that the
former signature is quite ambiguous where I live, as it is a common
name (NOTE: it's not a common spelling, just a common name). So, I
switched signatures before it was too late.

That said, I won't mind you calling me 'Kaartic' if you like it [of
course ;-)]. You can always call me using either of the names.


KIND NOTE TO THE NATIVE ENGLISH SPEAKER:

As I'm not a native English speaker myself, there might be mistakes in
my usage of English. I apologise for any mistakes that I make.

It would be "helpful" if you take the time to point out the mistakes.

It would be "super helpful" if you could provide suggestions about how
to correct those mistakes.

Thanks in advance!

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l