[MediaWiki-l] New "Between the Brackets" episode: Remco de Boer

2018-07-24 Thread Yaron Koren
Hi everyone, Episode #13 of the MediaWiki podcast "Between the Brackets" has been released; this one is an interview with Remco de Boer of the Dutch consulting company ArchiXL, which makes significant use of MediaWiki in its projects. You can find and listen to the episode here:

Re: [MediaWiki-l] How to convert WikiText to Plain Text

2018-07-24 Thread Erik Bernhardson
You can source that from the CirrusSearch dumps, which contain the text already cleaned up. The Python looks something like:

    import json
    from itertools import zip_longest
    from pprint import pprint
    import requests
    import zlib

    def get_gzip_stream(url):
        with requests.get(url, stream=True) as
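Erik's snippet is cut off in the archive, so what follows is not his code but a minimal sketch of the same idea under stated assumptions: the dump is a gzip-compressed file in Elasticsearch bulk format (an `{"index": ...}` action line followed by a page document with fields such as `title` and `text`), and its bytes arrive in arbitrary chunks, as they would from a streamed HTTP download. The helper names `grouper`, `iter_lines` and `iter_pages` are hypothetical; `grouper` is the standard `zip_longest` pairing recipe, which is one plausible use of the `zip_longest` import in the original.

```python
import gzip
import json
import zlib
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def grouper(iterable, n):
    # itertools recipe: collect items into fixed-length tuples, padding with None.
    args = [iter(iterable)] * n
    return zip_longest(*args)


def iter_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    # Incrementally gunzip arbitrary byte chunks and yield complete lines.
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    buf = b""
    for chunk in chunks:
        data = chunk
        while data:
            buf += decomp.decompress(data)
            if decomp.eof:
                # Concatenated gzip members: restart on any leftover bytes.
                data = decomp.unused_data
                decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
            else:
                data = b""
        # Keep the trailing partial line in the buffer for the next chunk.
        *lines, buf = buf.split(b"\n")
        yield from lines
    buf += decomp.flush()
    for line in buf.split(b"\n"):
        if line:
            yield line


def iter_pages(lines: Iterable[bytes]) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    # Assumption: Elasticsearch bulk format, i.e. an {"index": ...} action
    # line followed by the page document carrying "title" and "text".
    non_empty = (line for line in lines if line.strip())
    for _action, doc in grouper(non_empty, 2):
        if doc is None:
            break
        page = json.loads(doc)
        yield page.get("title"), page.get("text")


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dump (made-up sample data).
    sample = (
        b'{"index": {"_id": "1"}}\n'
        b'{"title": "Example", "text": "Plain text of the page."}\n'
    )
    payload = gzip.compress(sample)
    # Feed it in 7-byte pieces to mimic a streamed download.
    chunks = [payload[i:i + 7] for i in range(0, len(payload), 7)]
    for title, text in iter_pages(iter_lines(chunks)):
        print(title, "->", text)
```

To stream a real dump, the chunks could come from `requests.get(url, stream=True)` used as a context manager, passing `resp.iter_content(chunk_size=1 << 16)` to `iter_lines`. The dump URL is left out because the original message does not give one.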

[MediaWiki-l] How to convert WikiText to Plain Text

2018-07-24 Thread Nikhil Prakash
Hi there, I'm searching for an efficient way to convert the WikiText of the downloaded data dumps (in XML) to plain text. I basically need the plain text of each and every revision of Wikipedia articles. Therefore, it would be very helpful if you could tell me about some library or some piece of
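For the question itself, here is a minimal, standard-library-only sketch (not a tool named in the thread) of the first step: streaming the wikitext of every revision out of a MediaWiki XML export dump with `xml.etree.ElementTree.iterparse`, clearing elements as it goes to keep memory use modest on a full-history dump. The function names `local` and `iter_revisions` are hypothetical, and the sketch assumes the usual `<page>`/`<title>`/`<revision>`/`<id>`/`<text>` layout under whatever export-schema namespace the dump declares. It deliberately stops at raw wikitext; converting that to plain text is a separate step, for which the third-party mwparserfromhell library's `parse(wikitext).strip_code()` is one commonly used option.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple, Union


def local(tag: str) -> str:
    # Strip the "{namespace}" prefix; the export schema version differs between dumps.
    return tag.rsplit("}", 1)[-1]


def iter_revisions(
    source: Union[str, IO[bytes]],
) -> Iterator[Tuple[Optional[str], Optional[str], str]]:
    """Yield (page_title, revision_id, wikitext) for each <revision> in an export dump."""
    title = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local(elem.tag)
        if event == "start":
            if name == "page":
                title = None  # new page: forget the previous title
            continue
        if name == "title" and title is None:
            title = elem.text or ""
        elif name == "revision":
            rev_id, text = None, ""
            for child in elem:  # direct children only, so <contributor><id> is skipped
                child_name = local(child.tag)
                if child_name == "id":
                    rev_id = child.text
                elif child_name == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()  # free the revision's subtree once consumed
        elif name == "page":
            elem.clear()


if __name__ == "__main__":
    # Made-up two-revision sample in the export format.
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><id>100</id><text>'''Example''' is a [[test]].</text></revision>
    <revision><id>101</id><text>'''Example''' is a [[sample|test]].</text></revision>
  </page>
</mediawiki>"""
    for title, rev_id, wikitext in iter_revisions(io.BytesIO(sample)):
        print(title, rev_id, wikitext)
```

For a compressed dump as downloaded, the source can be a binary file object such as `bz2.open(path, "rb")`.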