On Wed, Nov 26, 2008 at 6:08 AM, Platonides <[EMAIL PROTECTED]> wrote:
> Gregory Maxwell wrote:
>> On Tue, Nov 25, 2008 at 5:31 PM, Platonides wrote:
>> [snip]
>>> Getting hits to the detail will allow to check that the filters are
>>> right. And how many different UA headers we may get? 50, 80, 100? It's
>>> perfectly acceptable.
>>
>> On Tue, Nov 25, 2008 at 5:52 PM, Marco Schuster wrote:
>>> I'd basically think of 300 different UAs, but that shouldn't be a
>>> major problem to handle, I think.
>>
>> Only counting 1:100 JS executing browsers hitting enwp there were
>> 78,033 unique user agent strings yesterday.
>>
>> This is due to all the weird crap that gets thrown into the strings
>> which takes me back to my original post.
>
> Sometimes to the point of making almost unique to some machines
> http://meta.wikimedia.org/w/index.php?title=Vandalism_reports&diff=prev&oldid=1037392#New_post-.22.26.22_partial_article_erasing_bot
>
>> Really. Making a manual mapping will not work.
>
> Not necessarily manual but I thought it was a number easier to abstract
> and review.
>
> Could you share the list of headers?

If you'd like to try writing a scrubber I'd be glad to run it and give you feedback. If you need some examples of weird agents, I can make some for you.
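
To give an idea of the shape, a scrubber could start out something like the sketch below. It is only an illustration in Python: the `scrub` and `tally` names, the list of families, and the choice to keep just the major version number are all assumptions, and none of it has been run against the enwp sample. The idea is to collapse each raw string to a coarse family plus major version and lump everything unrecognized into "other", so that the leftover bucket can be reviewed separately.

```python
import re
from collections import Counter

# Hypothetical family list, checked in order. More specific tokens come
# first because many agents also claim "Mozilla" or "Safari" in the string.
_FAMILIES = [
    ("Opera", re.compile(r"Opera[/ ](\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("MSIE", re.compile(r"MSIE (\d+)")),
    ("Safari", re.compile(r"Version/(\d+)[\d.]* Safari/")),
]


def scrub(ua):
    """Reduce a raw User-Agent string to 'Family major', or 'other'."""
    for family, pattern in _FAMILIES:
        match = pattern.search(ua)
        if match:
            return f"{family} {match.group(1)}"
    return "other"


def tally(user_agents):
    """Count hits per scrubbed bucket for an iterable of raw UA strings."""
    return Counter(scrub(ua) for ua in user_agents)


if __name__ == "__main__":
    sample = [
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.4) "
        "Gecko/2008102920 Firefox/3.0.4",
        "SomeOddBot/0.1",
    ]
    for bucket, hits in tally(sample).most_common():
        print(bucket, hits)
```

The ordering of the patterns matters: a Chrome string also contains "Safari/", so the Chrome check has to come before the Safari one. Anything that falls into "other" is exactly the weird stuff, so that bucket is where the feedback would come from.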
