I'm not 100% convinced that the UA requirement is helpful, for two reasons:

1) Lots of requests will carry a default UA like "PHP" or "Python/urllib" or
whatever the tool they used to build their bot sends. These aren't helpful
either, as they contain no information on how to get in touch.

2) It's trivial to work around the requirement for a non-blank UA by
setting one of the above, or worse -- cut-n-pasting the UA string from a
browser. If someone hacks this up real quick while testing, they may never
bother putting in contact information when their bot moves from a handful
of requests to gazillions.
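
To make both points concrete, here's a rough Python sketch of what a check
would have to look at if we wanted the UA to be useful rather than merely
non-blank. The function name, the list of library defaults, and the regexes
are all assumptions for illustration, not anything we currently run.

```python
import re

# Hypothetical set of library-default product names; illustrative only.
DEFAULT_UA_NAMES = {"php", "python", "python-urllib", "curl", "java"}

# Deliberately loose patterns for "some way to get in touch".
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://\S+")


def classify_user_agent(ua):
    """Return 'blank', 'ok', 'default' or 'no-contact' for a UA string."""
    ua = (ua or "").strip()
    if ua in ("", "-"):
        return "blank"
    # Any UA that carries an email address or URL counts as contactable.
    if EMAIL_RE.search(ua) or URL_RE.search(ua):
        return "ok"
    # First product token, e.g. "Python-urllib" in "Python-urllib/2.7".
    match = re.match(r"[^/\s]+", ua)
    name = match.group(0).lower() if match else ""
    if name in DEFAULT_UA_NAMES:
        return "default"
    # Non-blank and not a known default, but still nobody to write to.
    return "no-contact"
```

Note that a UA pasted from a browser lands in "no-contact": it sails past a
non-blank check but tells us nothing about who to talk to.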

Auto-throttling super-high-rate API clients (by IP/IP group) and giving
them an explicit "You really should contact us and, better yet, make it
possible for us to contact you" message might be nice.
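
As a sketch of what that throttling could look like (a plain sliding-window
counter per IP, which is my assumption and not a description of anything in
our stack; the class name and limits are made up):

```python
import time
from collections import defaultdict, deque

CONTACT_MESSAGE = (
    "You really should contact us and, better yet, "
    "make it possible for us to contact you."
)


class IpThrottle:
    """Sliding-window request counter keyed by client IP (or IP group)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def check(self, ip):
        """Return (allowed, message); message is None when allowed."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Over the limit: refuse and tell the client how to fix it.
            return False, CONTACT_MESSAGE
        hits.append(now)
        return True, None
```

The point is only that a throttled client gets an explicit message back
instead of a silent failure; the actual limits would need real traffic data.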


We may want to seriously think about some sort of API key system... not
necessarily as mandatory for access (we love freedom and convenience!) but
perhaps as the way you get around being throttled for too many accesses.
This would give us a structured way of storing their contact information,
which might be better than unstructured names or addresses in the UA.
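
Roughly what I have in mind, as a Python sketch: keys are optional, and
having one just raises your limit while giving us a structured contact
record. Every name and number here is hypothetical.

```python
from dataclasses import dataclass

ANONYMOUS_LIMIT = 100      # requests per window without a key (made-up number)
REGISTERED_LIMIT = 10000   # requests per window with a known key (made-up)


@dataclass(frozen=True)
class ApiClient:
    """Structured contact record, instead of free text in the UA."""
    name: str
    contact_email: str


class KeyRegistry:
    def __init__(self):
        self._clients = {}  # key -> ApiClient

    def register(self, key, name, contact_email):
        self._clients[key] = ApiClient(name, contact_email)

    def lookup(self, key):
        """Return the ApiClient for a key, or None if missing or unknown."""
        return self._clients.get(key) if key else None

    def limit_for(self, key):
        # Access never requires a key; a known key only raises the ceiling.
        return REGISTERED_LIMIT if self.lookup(key) else ANONYMOUS_LIMIT
```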

Does it make sense to tell people "log in to your bot's account with OAuth"
or is that too much of a pain in the ass versus "add this one parameter to
your requests with your key"? :)

-- brion


On Tue, Sep 1, 2015 at 10:23 AM, Oliver Keyes <[email protected]> wrote:

> Awesome; thanks for the analysis, Krinkle.
>
> Do we want to change this behaviour? From my point of view the answer
> is 'yes, not setting any kind of user agent is a violation of our API
> etiquette and we should be taking steps to alert people that it is'
> but if other people have different perspectives on this I'd love to
> hear them.
>
> On 1 September 2015 at 13:18, Krinkle <[email protected]> wrote:
> > I've confirmed just now that whatever requirement there was, it doesn't
> > seem to be in effect.
> >
> > Omitting the header entirely, sending it with an empty string, and
> > sending it with "-" all result in a response from the MediaWiki API.
> >
> > $ curl -A '' --include -v 'https://en.wikipedia.org/w/api.php?action=query&format=json'
> >> GET /w/api.php?action=query&format=json HTTP/1.1
> >> Host: en.wikipedia.org
> >> Accept: */*
> > < HTTP/1.1 200 OK
> > ..
> > {"batchcomplete":""}
> >
> >
> > $ curl -A '-' --include -v 'https://en.wikipedia.org/w/api.php?action=query&format=json'
> >> GET /w/api.php?action=query&format=json HTTP/1.1
> >> User-Agent: -
> >> Host: en.wikipedia.org
> >> Accept: */*
> > < HTTP/1.1 200 OK
> > ..
> > {"batchcomplete":""}
> >
> > In the past (2012?) these were definitely being blocked. (Ran into it
> > from time to time on Toolserver.)
> > It seems PHP's file_get_contents('http://...api..') is also working fine
> > now, without having to ini_set a user_agent value first.
> >
> > -- Krinkle
> > _______________________________________________
> > Wikitech-l mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>