Re: [Wikitech-l] thumb generation

Gergo Tisza Mon, 14 Sep 2015 17:21:07 -0700

On Mon, Sep 14, 2015 at 4:49 PM, Platonides <[email protected]> wrote:


> You know it will fail for all kinds of images included through templates
> (particularly infoboxes), right?


Indeed, it is not possible to find out which thumbnails a page uses
without actually parsing it. Your best bet is to wait until Parsoid dumps
become available (T17017 <https://phabricator.wikimedia.org/T17017>), then
go through those with an XML parser and extract the thumb URLs. That's
still slow, but not as slow as the MediaWiki parser. (Or you can try to find
a regexp that matches thumbnail URLs, but we all know what happens
<http://stackoverflow.com/a/1732454/323407> when you use a regexp to parse
HTML.) After that, just throw those URLs at the 404 handler.
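
To illustrate the extraction step, here is a rough sketch of what I have
in mind. It is only a sketch under assumptions: the dump format is not
available yet, so the input is assumed to be one page's rendered HTML as a
string; thumbnail URLs are assumed to be recognizable by a "/thumb/" path
segment; and Python's stdlib HTML parser stands in for a real XML parser.
The names ThumbCollector and extract_thumb_urls are made up for this
example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ThumbCollector(HTMLParser):
    """Collect <img> src values whose path contains a /thumb/ segment."""

    def __init__(self):
        super().__init__()
        self.thumb_urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Assumption: thumbnail URLs carry a "/thumb/" path segment,
        # while links to original files do not.
        if src and "/thumb/" in urlsplit(src).path:
            self.thumb_urls.append(src)


def extract_thumb_urls(html_text):
    """Return the thumbnail URLs found in one page's rendered HTML."""
    collector = ThumbCollector()
    collector.feed(html_text)
    collector.close()
    return collector.thumb_urls
```

Feeding each page's HTML through extract_thumb_urls gives the list of URLs
to request afterwards. Because a parser reads the src attributes, this
avoids the regexp problem mentioned above.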
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
