On Tue, May 3, 2016 at 4:34 PM, Gergo Tisza <[email protected]> wrote:
>
> There aren't many options other than content-scraping if you want to
> transform Wikipedia articles into some semblance of structured data. We
> even do it ourselves, for media metadata (and use an XML parser for it
>

Actually, the XML parser was replaced a while ago with DOMDocument, which
can handle HTML5 fine. But the point stands: HTML scraping is hardly an
unusual requirement for reusers of our content.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
