On Tue, May 3, 2016 at 4:34 PM, Gergo Tisza <[email protected]> wrote:
>
> There aren't many options other than content-scraping if you want to
> transform Wikipedia articles into some semblance of structured data. We
> even do it ourselves, for media metadata (and use an XML parser for it
>

Actually, the XML parser was replaced with DOMDocument a while ago,
which can handle HTML5 fine. But the point stands: HTML scraping is hardly
an unusual requirement for reusers of our content.
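
For readers who have not done this kind of scraping, here is a rough sketch of
the idea in Python, using only the standard library. It is not the code
MediaWiki uses (that is PHP, built on DOMDocument); the sample markup, the
"fileinfo" class, and the names MetadataScraper and scrape_metadata are all
made up for illustration. It assumes the metadata sits in a simple table with
one th/td pair per row.

```python
from html.parser import HTMLParser

# Hypothetical markup: one metadata field per table row (assumption).
SAMPLE_HTML = """
<table class="fileinfo">
<tr><th>Author</th><td>Jane Doe</td></tr>
<tr><th>License</th><td>CC BY-SA 4.0</td></tr>
</table>
"""


class MetadataScraper(HTMLParser):
    """Collects th/td pairs from rendered HTML into a dict."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cell = None  # "th" or "td" while inside such a cell
        self._key = None   # text of the most recent <th>
        self._buf = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._buf = []

    def handle_data(self, data):
        if self._cell is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = "".join(self._buf).strip()
        if tag == "th":
            self._key = text
        elif self._key is not None:
            # A <td> following a <th>: store it as that field's value.
            self.fields[self._key] = text
            self._key = None
        self._cell = None


def scrape_metadata(html):
    parser = MetadataScraper()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    print(scrape_metadata(SAMPLE_HTML))
```

The parser here is tolerant of non-XML markup, which is the practical reason
to prefer an HTML-aware parser over a strict XML one for rendered pages.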
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
