Re: [Wiki-research-l] Kill the bots

2014-05-21 Thread Oliver Keyes
Okay. Methodology:
* Take the last 5 days of request logs;
* Filter them down to text/html requests as a heuristic for non-API requests;
* Run them through the UA parser we use;
* Exclude spiders and things which reported valid browsers;
* Aggregate the user agents left;
* ???
* Profit
It looks like
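
The filtering steps listed in this message can be sketched roughly as below. This is only an illustration: the record fields (`content_type`, `user_agent`) and the `classify_user_agent` function are hypothetical stand-ins, not the actual request-log schema or the UA parser the message refers to, and the records are assumed to be already limited to the 5-day window.

```python
from collections import Counter


def classify_user_agent(ua):
    """Hypothetical stand-in for a real UA parser.

    Returns 'spider', 'browser', or 'other'. The rules here are toy
    assumptions, not how any production parser works.
    """
    lowered = ua.lower()
    if "spider" in lowered or "crawler" in lowered:
        return "spider"
    if lowered.startswith("mozilla/"):
        return "browser"
    return "other"


def leftover_user_agents(requests):
    """Count user agents that are neither spiders nor recognised browsers.

    `requests` is an iterable of dicts with the assumed keys
    'content_type' and 'user_agent'.
    """
    counts = Counter()
    for req in requests:
        # Keep only text/html requests, used as a heuristic for non-API traffic.
        if req["content_type"] != "text/html":
            continue
        # Drop spiders and anything reporting a valid browser.
        if classify_user_agent(req["user_agent"]) in ("spider", "browser"):
            continue
        # Aggregate whatever user agents are left.
        counts[req["user_agent"]] += 1
    return counts
```

Calling `leftover_user_agents(records).most_common(20)` would then list the most frequent unexplained user agents for manual inspection.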

Re: [Wiki-research-l] Kill the bots

2014-05-21 Thread Scott Hale
Thank you, Oliver. This is really interesting and gives some credibility to the idea that the ability to track API/non-API edits could address the bot problem in part, but definitely could miss some bots. Thank you very much for your time to check this and share the results. Anyone think it would

Re: [Wiki-research-l] Kill the bots

2014-05-20 Thread Oliver Keyes
I think a *lot* of them use the API, but I don't know off the top of my head if it's *all* of them. If only we knew somebody who has spent the last 3 months staring into the Cthulhian nightmare of our request logs and could look this up... More seriously: drop me a note off-list so that I can try

Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread WereSpielChequers
If your bot is only running automated reports in its own userspace then it doesn't need a bot flag. But it probably won't be a very active bot, so it may not be a problem for your stats. On the English-language Wikipedia you are going to be fairly close if you exclude all accounts which currently have

Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Oliver Keyes
That would cover most of them, but runs into the problem that you're only including the unauthorised bots written poorly enough that we've caught the operator ;). It seems like this would be a useful topic for some piece of method-comparing research, if anyone is looking for paper ideas. On 19 May

Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Federico Leva (Nemo)
Brian Keegan, 18/05/2014 18:10: Is there a way to retrieve a canonical list of bots on enwiki or elsewhere? A Bots.csv list exists. https://meta.wikimedia.org/wiki/Wikistat_csv In general: please edit https://meta.wikimedia.org/wiki/Research:Identifying_bot_accounts Nemo

Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Brian Keegan
Thanks for all the references and excellent advice so far! I've looked into the Hale Anti-Bot Method™, but because I've sampled my corpus on articles (based on category co-membership), the resulting groupby users gives these semi-automated users more normal distributions since their other

Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Ann Samoilenko
the Hale Anti-Bot Method™ That's a good one. =) I'm a big fan of Scott's method I second that. Again, great paper, Scott! On Mon, May 19, 2014 at 5:31 PM, Aaron Halfaker aaron.halfa...@gmail.com wrote: Another thought I had was that because many semi-automated tools such as Twinkle and

Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Scott Hale
Thanks all for the comments on my paper, and even more thanks to everyone sharing these super helpful ideas on filtering bots: this is why I love the Wikipedia research committee. I think Oliver is definitely right that this would be a useful topic for some piece of method-comparing research,

[Wiki-research-l] Kill the bots

2014-05-18 Thread Brian Keegan
Is there a way to retrieve a canonical list of bots on enwiki or elsewhere? I'm interested in omitting automated revisions (sorry Stuart!) for the purposes of building co-authorship networks. Grabbing everything under 'Category:All Wikipedia bots' excludes some major ones like SmackBot, Cydebot,

Re: [Wiki-research-l] Kill the bots

2014-05-18 Thread Amir E. Aharoni
People whose last name is Abbot will be discriminated against. And a true story: A prominent human Catalan Wikipedia editor whose name is PauCabot skewed the results of an actual study. So don't trust just the user names. On 18 May 2014 at 19:34, Andrew G. West west.andre...@gmail.com wrote: User name
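
The pitfall described in this message is easy to reproduce. The sketch below uses a naive, hypothetical name check (not a method anyone on the list is known to use) and shows that it flags human names containing "bot" alongside real bot accounts.

```python
import re


def name_looks_like_bot(username):
    """Naive heuristic: flag any username containing 'bot', ignoring case.

    This is a deliberately simplistic, hypothetical check used only to
    illustrate the false-positive problem.
    """
    return re.search(r"bot", username, re.IGNORECASE) is not None


# SmackBot and Cydebot are bot accounts mentioned in the thread;
# PauCabot is the human editor from the story, Abbot a human surname.
for name in ["SmackBot", "Cydebot", "PauCabot", "Abbot"]:
    print(name, name_looks_like_bot(name))
```

All four names are flagged, so a user-name filter on its own cannot separate the two humans from the two bots; it would need to be combined with other signals such as bot flags or editing behaviour.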

Re: [Wiki-research-l] Kill the bots

2014-05-18 Thread Scott Hale
Very helpful, Lukas. I didn't know about the logging table. In some recent work [1] I found many users that appeared to be bots but whose edits did not have the bot flag set. My approach was to exclude users who didn't have a break of more than 6 hours between edits over the entire month I was
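
A minimal sketch of the break-based heuristic described in this message, not the paper's actual code. The function names and the input shape (a dict mapping user names to lists of edit timestamps) are hypothetical. It also assumes that the start and end of the observation window count as boundaries, so that a user with only a handful of edits is not flagged; the message does not say how that case was handled.

```python
from datetime import datetime, timedelta


def has_long_break(timestamps, window_start, window_end,
                   min_gap=timedelta(hours=6)):
    """Return True if the user was inactive for more than min_gap at least once.

    Assumption: the window boundaries are treated as activity boundaries.
    """
    points = [window_start] + sorted(timestamps) + [window_end]
    return any(later - earlier > min_gap
               for earlier, later in zip(points, points[1:]))


def likely_bots(edits_by_user, window_start, window_end,
                min_gap=timedelta(hours=6)):
    """Return the set of users who never paused for more than min_gap."""
    return {
        user
        for user, timestamps in edits_by_user.items()
        if not has_long_break(timestamps, window_start, window_end, min_gap)
    }
```

For a month-long sample, `window_start` and `window_end` would be the first and last instants of that month, and the returned set is the list of accounts to exclude.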

Re: [Wiki-research-l] Kill the bots

2014-05-18 Thread Brian Keegan
How does one cite emails in ACM proceedings format? :) On Sunday, May 18, 2014, R.Stuart Geiger sgei...@gmail.com wrote: Tsk tsk tsk, Brian. When the revolution comes, bot discriminators will get no mercy. :-) But seriously, my tl;dr: instead of asking if an account is or isn't a bot, ask