The tools exist. Ample documentation exists. Both programmatic 
interfaces and easy form-based interfaces exist.

Screen scraping still happens not only because of laziness but also 
because the correct way is not promoted. For example, if I access the 
English Wikipedia main page with libwww-perl (a banned user agent), the 
response body says (among other things):

    Our servers are currently experiencing a technical problem. This is
    probably temporary and should be fixed soon. Please _try again_ in a
    few minutes.

This is of course bullshit of the highest degree. It's certainly a 
permanent problem, and no amount of retrying will get me around the 
ban. The response body should say something like this:

    Screen scraping is forbidden because it places an undue burden on the
    infrastructure and servers. Use the export feature to parse single
    pages; use the dump feature to parse a whole wiki.

    http://mediawiki.org/wiki/Special:Export
    http://en.wikipedia.org/wiki/WP:Export

    http://download.wikimedia.org/
    http://en.wikipedia.org/wiki/WP:Download

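To make the "export" half of that concrete, here is a rough sketch of 
fetching one page's wikitext through Special:Export instead of scraping 
the rendered HTML. It uses only the Python 3.8+ standard library; the 
page title, User-Agent string, and contact address are placeholders of 
my own, not anything the wiki mandates.

    #!/usr/bin/env python3
    # Rough sketch: fetch the wikitext of a single page via Special:Export
    # instead of scraping the rendered HTML. Title and User-Agent below are
    # placeholders.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    title = "Sandbox"  # example page title
    url = ("https://en.wikipedia.org/wiki/Special:Export/"
           + urllib.parse.quote(title))
    req = urllib.request.Request(url, headers={
        # A descriptive UA with contact info, unlike a generic scraper UA.
        "User-Agent": "export-sketch/0.1 (mailto:someone@example.org)",
    })
    with urllib.request.urlopen(req) as resp:
        xml_data = resp.read()

    # The export is XML; the current wikitext sits in the revision's <text>
    # element. The {*} wildcard ignores the export schema's namespace URI.
    root = ET.fromstring(xml_data)
    text_el = root.find(".//{*}text")
    print(text_el.text if text_el is not None else "no such page")

The dump case is analogous: grab the relevant file from 
download.wikimedia.org once and iterate over the same page/revision/text 
structure locally, instead of hammering the live site.
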
Can anyone with access to the appropriate bugtracker file a bug on this, 
please?
