On Tue, May 3, 2016 at 2:43 AM, Max Semenik <[email protected]> wrote:

> At this point, I would say that everybody who screen-scrapes saw it coming
> and breaking them is a good thing as sometimes, lessons just have to be
> learned.
>

There aren't many options other than content-scraping if you want to
transform Wikipedia articles into some semblance of structured data. We
even do it ourselves, for media metadata. That code uses an XML parser, as
PHP doesn't offer much in the way of parsing HTML5, so outputting
HTML5-style empty tags (e.g. <br> instead of <br />) might break it.
IIRC there is a hack to work around that, though, as file pages can
contain ill-formed HTML anyway.
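
To illustrate the failure mode, here is a minimal sketch in Python rather
than PHP (the sample markup and the helper names are made up for this
example, they are not taken from MediaWiki). It feeds the same fragment to
a strict XML parser and to a lenient HTML parser, once with XHTML-style
self-closed tags and once with HTML5-style empty tags:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample markup; only the style of the empty tags differs.
XHTML_STYLE = '<div><img src="a.png" /><br /></div>'
HTML5_STYLE = '<div><img src="a.png"><br></div>'


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Unclosed <img> and <br> are not well-formed XML.
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closed tags, via the default
        # handle_startendtag implementation.
        self.tags.append(tag)


def html_tags(markup: str) -> list:
    """Return the start tags found by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    for name, markup in (("XHTML", XHTML_STYLE), ("HTML5", HTML5_STYLE)):
        print(name, "XML ok:", parses_as_xml(markup), "tags:", html_tags(markup))
```

The XML parser only accepts the self-closed form, while the HTML parser
reports the same tags for both, which is why a scraper built on an XML
parser is the one at risk here.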
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
