Re: [Wikitech-l] User agent policy for bots

Brad Jorsch (Anomie) Mon, 14 Jul 2014 10:05:36 -0700

Note this reply represents my own views, but does not represent an official
WMF position.

On Sun, Jul 13, 2014 at 4:25 PM, John Mark Vandenberg <[email protected]>
wrote:

> It would be good to know the answer to whether the username is logged
> against API requests.  It seems like a very important piece of
> information which should be visible in server ops logging of API
> usage.
>

The API request log does record usernames. It does not, however, contain
user agents.

But my guess is that at least some of the types of problems Ops would be
concerned with are in different log files that probably do not contain
usernames but do contain user agents.

> username is easy, if it is needed.

I would include username. The only harm is a few extra bytes per request.

> pywiki requiring bot operators provide an email address is technically
> easy, but I suspect it isn't going to be very successful or
> appreciated, esp for non-SSL wikis, or understood as pywiki hasn't put
> this info in the user-agent since the new user-agent policy was
> introduced, so why now?
>

I don't see any particular need for email addresses if the on-wiki username
is provided. The key is "some method of contact".

> If the main source of problems is the 'large' bots, they usually run
> many tasks, and it is likely to only be a single task causing
> problems.  With these large tasks, ideally they are paused rather than
> blocked, in which case we need to introduce a standardised way to
> pause a bot.  In these cases, the user agent could mention the task
> identifier, and that identifier could be used to pause it until an
> operator has checked their email.  The 'pause' command interface could
> be IRC or user_talk, or something new based on Flow, or an API response
> warning like replag which pywikibot honours.  I appreciate Bináris'
> point that some (most?) wikis, especially smaller wikis, do not have
> 'task approval' processes with a task identifier, so this would need
> to be optional.  Large bot operators would use this feature if it
> meant that only a single task is paused rather than the bot account
> blocked.
>
> For the normal usage of pywikibot, i.e. invoking an existing script
> which is maintained by pywikibot, we could include in the user-agent
> which script is running (e.g. move.py).
>

Including the "task name", which for pywikibot could be the script name,
seems sensible to me. Besides the stated benefit of distinguishing which
script in a multi-task bot is problematic, it would also help in determining
whether multiple accounts/IPs are running the same problematic script.

I wouldn't go as far as requiring the task name to correspond to any
particular on-wiki approval, although bots on wikis with such approval
processes could well use the title of the approval page as their task name.
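
To illustrate how a task name in the user agent could help spot several
accounts/IPs running the same script, here is a rough sketch. It assumes a
hypothetical user agent layout of "<product>/<version> (<task>; <contact>)"
and a hypothetical log made of (account-or-IP, user-agent) pairs; neither is
what pywikibot or the WMF logs actually use.

```python
import re
from collections import defaultdict

# Hypothetical layout: "<product>/<version> (<task>; <contact>)".
UA_PATTERN = re.compile(
    r"^(?P<product>[^/\s]+)/(?P<version>\S+) \((?P<task>[^;)]+);"
)

def accounts_by_task(entries):
    """Map (product, task) to the set of accounts/IPs seen using it.

    `entries` is an iterable of (account_or_ip, user_agent) pairs; this
    shape is an assumption made for the example.
    """
    groups = defaultdict(set)
    for source, user_agent in entries:
        match = UA_PATTERN.match(user_agent)
        if match:  # user agents in any other layout are skipped
            key = (match["product"], match["task"].strip())
            groups[key].add(source)
    return dict(groups)
```

Any (product, task) key whose set has more than one member would point at
the same script being run from several places.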

> What user agents do the other large editing frameworks use?
>

I can tell you AnomieBOT uses "AnomieBOT 1.0 ($TASKNAME; see
[[User:$USERNAME]])". Not sure if you consider it a large editing framework.

The task names the bot uses are generally listed on the bot's userpage;
various one-off scripts I use locally will use some ad-hoc identifier, or
"no task" if I forgot to have the script set a task name.

(I should change that to start with AnomieBOT/1.0 to comply with RFC 2616,
now that I think of it)
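
For what it's worth, a minimal sketch of building a string in that corrected
shape might look like the following. The function name and parameters are
made up for illustration and this is not AnomieBOT's actual code; the only
parts taken from above are the "name/version (task; see [[User:name]])"
layout and the "no task" fallback.

```python
import re

# Characters RFC 2616 allows in a "token" (used for product name and version).
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def build_user_agent(product, version, task, username):
    """Return 'product/version (task; see [[User:username]])'.

    Hypothetical helper: names and signature are assumptions.
    """
    for part in (product, version):
        if not _TOKEN.fullmatch(part):
            raise ValueError(f"not a valid product token: {part!r}")
    comment = f"{task or 'no task'}; see [[User:{username}]]"
    # Unescaped parentheses would end or nest the comment; keep the
    # sketch simple by rejecting them instead of escaping.
    if "(" in comment or ")" in comment:
        raise ValueError("parentheses are not supported in this sketch")
    return f"{product}/{version} ({comment})"
```

Calling build_user_agent("AnomieBOT", "1.0", "ExampleTask", "AnomieBOT")
gives "AnomieBOT/1.0 (ExampleTask; see [[User:AnomieBOT]])", where
"ExampleTask" is a placeholder task name.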

-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l