On Wed, Nov 26, 2008 at 6:08 AM, Platonides <[EMAIL PROTECTED]> wrote:
> Gregory Maxwell wrote:
>> On Tue, Nov 25, 2008 at 5:31 PM, Platonides wrote:
>> [snip]
>>> Getting hits in detail will allow us to check that the filters are
>>> right. And how many different UA headers might we get? 50, 80, 100?
>>> That's perfectly acceptable.
>>
>> On Tue, Nov 25, 2008 at 5:52 PM, Marco Schuster wrote:
>>> I'd expect roughly 300 different UAs, but that shouldn't be a
>>> major problem to handle.
>>
>> Counting only the 1:100 sample of JS-executing browsers hitting enwp,
>> there were 78,033 unique user agent strings yesterday.
>>
>> This is due to all the weird crap that gets thrown into the strings,
>> which takes me back to my original post.
>
> Sometimes to the point of making them almost unique to individual machines:
> http://meta.wikimedia.org/w/index.php?title=Vandalism_reports&diff=prev&oldid=1037392#New_post-.22.26.22_partial_article_erasing_bot
>
>> Really. Making a manual mapping will not work.
>
> Not necessarily manual, but I thought it was a small enough number to
> abstract and review.
>
> Could you share the list of headers?

If you'd like to try writing a scrubber, I'd be glad to run it and give
you feedback. If you need some examples of weird agents, I can make
some for you.
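For illustration, here is a minimal sketch of the kind of scrubber being discussed. It assumes that collapsing each User-Agent to browser family plus major version is acceptable granularity. The family list, the regexes, and the names scrub and bucket_counts are all hypothetical and not exhaustive. Everything else in the string (toolbars, .NET CLR lists, locale tags, per-machine junk) is simply discarded, so near-unique strings land in a handful of buckets.

```python
import re
from collections import Counter

# Hypothetical, non-exhaustive patterns. Order matters: Opera can claim
# "MSIE", and Chrome UAs also contain "Safari", so specific ones go first.
_FAMILIES = [
    ("Opera", re.compile(r"Opera[/ ](\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+)[^ ]* .*Safari/")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("MSIE", re.compile(r"MSIE (\d+)")),
]


def scrub(ua: str) -> str:
    """Collapse a raw User-Agent string to 'Family/MajorVersion'.

    Anything not matching a known family maps to 'Other'.
    """
    for family, pattern in _FAMILIES:
        m = pattern.search(ua)
        if m:
            return f"{family}/{m.group(1)}"
    return "Other"


def bucket_counts(agents):
    """Count scrubbed buckets over an iterable of raw UA strings."""
    return Counter(scrub(ua) for ua in agents)
```

Running bucket_counts over a day's sampled agents shows how far the 78,033 distinct strings shrink. The "Other" bucket is the part that would still need reviewing by hand.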

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
