[Wiki-research-l] Wikimedia-l discussion on leadership in Wikimedia
Researchers and EE specialists, your thoughts would be appreciated on this. I started the thread only on Wikimedia-l to keep the discussion consolidated in one place. http://lists.wikimedia.org/pipermail/wikimedia-l/2014-May/071811.html Thanks, Pine ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [Wikimedia-l] Leadership, Wikimedia-style
Hi Pine, this is an excellent point, and I believe there are definitely too few systematic studies on the topic, as well as targeted programs.

<blatant promotion mode on> In my book, Common Knowledge? An Ethnography of Wikipedia, which came off the press last week, I have a whole chapter (Between Anarchy and Bureaucracy: Wikimedia Governance) dedicated to issues of governance and internal leadership. http://www.sup.org/book.cgi?id=24010 Unfortunately, the Google Books preview has most of the pages limited (and Stanford University Press is not too keen on open access to their publications, sadly). http://books.google.pl/books?id=hBpuAwAAQBAJ&lpg=PA178&dq=jemielniak%20common%20knowledge%20leadership&hl=pl&pg=PP1#v=snippet&q=between%20anarchy&f=false </blatant promotion mode off>

best, dariusz pundit

On Mon, May 19, 2014 at 7:12 AM, ENWP Pine deyntest...@hotmail.com wrote:

Hi all, I've heard the word "leadership" used a lot in WMF, synonymously with "management" in my experience. That makes sense in a somewhat hierarchical organization like WMF, although this model has received some criticism from the community for allegedly excessive top-down thinking. I'm not familiar enough with the culture in the WMF Office to comment about its strengths and weaknesses, but I would like to ask questions about leadership in the community. In the community, which is diffuse and where roles are highly flexible, there have been some studies done about leadership, but the ones I know about usually focus on hierarchies within the community, especially how people get chosen for administrator roles on-wiki. As we are thinking about our online culture, we can be thinking about movement leadership. Who are the leaders, how are they trained, how are they selected, what do they do, what makes them effective, and how can they be given ongoing support and training?
I think many of us would agree that adminship and leadership are not always synonymous, and there are many ways that people exercise leadership in non-hierarchical ways. I hear frequently about stress from members of English Wikipedia's Arbcom, and I hope WMF is thinking about how to train and support people who get chosen for such visible, important, and often stressful volunteer roles. I would also like to point out that Wikimedia is developing training materials for leaders of chapters and programs. Is there anyone at WMF who is taking a holistic view of community leadership and how to understand, train and support it in ways that support the strategic plan goals? Training that might be relevant could include how to create friendly spaces online, resolve online conflicts, engage in cross-cultural communication, encourage strategic thinking, influence change, and maintain morale. I think a series of five-minute training modules could be helpful for online and offline volunteers, along with dedicating some Program Evaluation or Research time to understanding leadership in the non-hierarchical community. These initiatives could help with encouraging teamwork and collaboration online by influencing and training leaders. I would also be interested in hearing about how WMF thinks about leadership internally, since there seems to be some community feeling that WMF's thinking about leadership is incompatible with the community's. I don't have an opinion but I would like to be more informed, and hopefully encourage WMF to think about how the organization as a whole interacts with the community. Thanks, Pine ___ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines wikimedi...@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe -- __ dr hab. 
Dariusz Jemielniak, professor of management, head of the Department of International Management and of the CROW research center, Akademia Leona Koźmińskiego http://www.crow.alk.edu.pl, member of the Akademia Młodych Uczonych, Polska Akademia Nauk
Re: [Wiki-research-l] Kill the bots
If your bot is only running automated reports in its own userspace, then it doesn't need a bot flag. But it probably won't be a very active bot, so it may not be a problem for your stats.

On the English-language Wikipedia you are going to be fairly close if you exclude all accounts which currently have a bot flag, this list of former bots https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots (I occasionally maintain this in order for the list of editors by edit count to work; as of a couple of weeks ago when I last checked, I believe it to be a comprehensive list of retired bots with 6,000 or more edits), and perhaps the individual with a very high edit count who has in the past been blocked for running unauthorised bots on his user account. (I won't name that account on list, but since it also contains a large number of manual edits, the true answer is that you can't get an exact divide between bots and non-bots by classifying every account as either a bot or a human.)

If you are minded to treat all accounts containing the syllable "bot" as bots, then you might want to tweak that to count anyone on these https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits two lists https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/5001%E2%80%931 as human even if their name includes "bot". I check those lists occasionally and make sure that the only "bot"-named accounts included are human editors.

On 18 May 2014 20:33, R.Stuart Geiger sgei...@gmail.com wrote:

Tsk tsk tsk, Brian. When the revolution comes, bot discriminators will get no mercy. :-) But seriously, my tl;dr: instead of asking if an account is or isn't a bot, ask if a set of edits are or are not automated.

Great responses so far: searching usernames for *bot will exclude non-bot users who were registered before the username policy change (although *Bot is a bit better), and the logging table is a great way to collect bot flags.
However, Scott is right -- the bot flag (or *Bot username) doesn't signify a bot, it signifies a bureaucrat recognizing that a user account successfully went through the Bot Approval Group process. If I see an account with a bot flag, I can generally assume the edits that account makes are initiated by an automated software agent. This is especially the case in the main namespace. The inverse assumption is not nearly as easy: I can't assume that every edit made from an account *without* a bot flag was *not* an automated edit. About unauthorized bots: yes, there are a relatively small number of Wikipedians who, on occasion, run fully-automated, continuously-operating bots without approval. Complicating this, if someone is going to take the time to build and run a bot, but isn't going to create a separate account for it, then it is likely that they are also using that account to do non-automated edits. Sometimes new bot developers will run an unauthorized bot under their own account during the initial stages of development, and only later in the process will they create a separate bot account and seek formal approval and flagging. It can get tricky when you exclude all the edits from an account for being automated based on a single suspicious set of edits. More commonly, there are many more people who use automated batch tools like AutoWikiBrowser to support one-off tasks, like mass find-and-replace or category cleanup. Accounts powered by AWB are technically not bots, only because a human has to sit there and click save for every batch edit that is made. Some people will create a separate bot account for AWB work and get it approved and flagged, but many more will not bother. Then there are people using semi-automated, human-in-the-loop tools like Huggle to do vandal fighting. I find that the really hard question is whether you include or exclude these different kinds of 'cyborgs', because it really makes you think hard about what exactly you're measuring. 
Is someone who does a mass find-and-replace on all articles in a category a co-author of each article they edit? Is a vandal fighter patrolling the recent changes feed with Huggle a co-author of all the articles they edit when they revert vandalism and then move on to the next diff? What about somebody using rollback in the web browser? If so, what is it that makes these entities authors and ClueBot NG not an author? When you think about it, user accounts are actually pretty remarkable in that they allow such a diverse set of uses and agents to be attributed to a single entity. So when it comes to identifying automation, I personally think it is better to shift the unit of analysis from the user account to the individual edit. A bot flag lets you assume all edits from an account are automated, but you can use a range of approaches to identifying sets of automated edits from non-flagged accounts. Then I have a set of regex SQL queries in the Query Library
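As a rough illustration of shifting the unit of analysis from the account to the edit, the sketch below labels individual revisions by looking at the bot flag first and the revision comment second. It is not the set of queries referred to above. The three patterns are illustrative assumptions about the annotations AWB, Twinkle and Huggle leave in edit summaries, and they would need checking against real comments, tool versions and wikis before use. The function name `classify_edit` is hypothetical.

```python
import re

# Illustrative patterns only (assumptions): the real annotations vary by
# tool version, user configuration and wiki.
TOOL_PATTERNS = {
    "AWB": re.compile(r"\[\[(?:WP|Project|Wikipedia):AWB\|AWB\]\]"),
    "Twinkle": re.compile(r"\[\[WP:TW\|TW\]\]"),
    "Huggle": re.compile(r"\[\[WP:HG\|HG\]\]"),
}


def classify_edit(comment, account_has_bot_flag=False):
    """Label one revision as 'bot', a tool name, or 'unclassified'."""
    # A bot flag lets us assume the edit was automated.
    if account_has_bot_flag:
        return "bot"
    # Otherwise look for a tool annotation in the revision comment.
    for tool, pattern in TOOL_PATTERNS.items():
        if pattern.search(comment or ""):
            return tool
    # No marker found: this does NOT prove the edit was manual.
    return "unclassified"
```

The fallback label is deliberately "unclassified" rather than "manual", since the absence of a flag or annotation does not show that an edit was made by hand.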
Re: [Wiki-research-l] Kill the bots
That would cover most of them, but runs into the problem of you're only including the unauthorised bots written poorly enough that we've caught the operator ;). It seems like this would be a useful topic for some piece of method-comparing research, if anyone is looking for paper ideas.
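The account-level heuristics discussed in this thread (current bot flag, a list of retired bots, a username ending in "bot", and an allowlist of humans whose names happen to end that way) could be combined roughly as follows. This is only a sketch: the function `classify_account` and all account names are made up for illustration, and the three sets would have to be filled from the bot-flag data and the on-wiki lists mentioned in the thread.

```python
def classify_account(name, flagged_bots, retired_bots, known_humans):
    """Return 'bot' or 'human' for an account, using account-level heuristics only."""
    # Accounts with a current bot flag, or on a retired-bot list, count as bots.
    if name in flagged_bots or name in retired_bots:
        return "bot"
    # Humans whose usernames happen to end in "bot" are allowlisted explicitly.
    if name in known_humans:
        return "human"
    # Username heuristic (*bot / *Bot), applied case-insensitively.
    if name.lower().endswith("bot"):
        return "bot"
    return "human"


# Hypothetical example data; none of these are real accounts.
flagged = {"ExampleFlaggedBot"}
retired = {"OldReportRunner"}
humans = {"Talbot"}

labels = {
    name: classify_account(name, flagged, retired, humans)
    for name in ["ExampleFlaggedBot", "OldReportRunner", "Talbot", "SomeNewBot", "Alice"]
}
```

As the thread points out, any such binary split mislabels accounts that mix manual and automated edits, and it only catches unauthorised bots that someone has already noticed.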
Re: [Wiki-research-l] Kill the bots
Brian Keegan, 18/05/2014 18:10: Is there a way to retrieve a canonical list of bots on enwiki or elsewhere? A Bots.csv list exists. https://meta.wikimedia.org/wiki/Wikistat_csv In general: please edit https://meta.wikimedia.org/wiki/Research:Identifying_bot_accounts Nemo
Re: [Wiki-research-l] Kill the bots
Thanks for all the references and excellent advice so far!

I've looked into the Hale Anti-Bot Method™, but because I've sampled my corpus on articles (based on category co-membership), the resulting groupby on users gives these semi-automated users more normal distributions, since their other contributions are censored. In other words, I see only a fraction of these users' contributions, and thus the time intervals I observe are spaced farther apart (more typical) than they actually are. It's not feasible for me to get 100k+ users' histories just for the purposes of cleaning up ~6k articles' histories.

Another thought I had was that because many semi-automated tools such as Twinkle and AWB leave parenthetical annotations in their revision comments, would this be a relatively inexpensive way to filter out revisions rather than users? Some caveats I'd like to get domain experts' feedback on. I'm not expecting settled research, just input from others' experiences munging the data.

1. Is the inclusion of this markup in revision comments optional? This is a concern that some users may enable or disable it, so I may end up biasing inclusion based on users' preferences.
2. How have these flags or markup changed over time? This is a concern that Twinkle/AWB/etc. may have started/stopped including flags or changed what they included over time.
3. Are there other API queries or data elsewhere I could use to identify (semi-)automated revisions?

-- Brian C. Keegan, Ph.D. Post-Doctoral Research Fellow, Lazer Lab College of Social Sciences and Humanities, Northeastern University Fellow, Institute for Quantitative Social Sciences, Harvard University Affiliate, Berkman Center for Internet & Society, Harvard Law School b.kee...@neu.edu www.brianckeegan.com M: 617.803.6971 O: 617.373.7200 Skype: bckeegan
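The censoring effect described in this message can be reproduced with a toy example. The sketch below is not the interval method from the paper; it only assumes, for illustration, that the statistic of interest is the median gap between a user's consecutive edits. The user, timestamps and article titles are invented.

```python
from statistics import median


def median_gap(timestamps):
    """Median number of seconds between consecutive edits, or None if fewer than two."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return median(gaps) if gaps else None


# Hypothetical semi-automated user: one edit every 10 seconds,
# cycling through 50 different articles.
full_history = [(t * 10, f"Article_{t % 50}") for t in range(500)]

# An article-based sample that happens to include only one of those articles.
sampled_articles = {"Article_0"}
observed = [ts for ts, title in full_history if title in sampled_articles]

full_gap = median_gap(ts for ts, _ in full_history)   # 10 seconds
observed_gap = median_gap(observed)                   # 500 seconds
```

With the full history the user looks machine-like (a 10-second median gap), while the article-based sample sees only every fiftieth edit and reports a 500-second gap, which is why interval-based filters lose power on a corpus sampled by article.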
Re: [Wiki-research-l] Kill the bots
"the Hale Anti-Bot Method™"
That's a good one. =)

"I'm a big fan of Scott's method"
I second that. Again, great paper, Scott!

On Mon, May 19, 2014 at 5:31 PM, Aaron Halfaker aaron.halfa...@gmail.com wrote:

"Another thought I had was that because many semi-automated tools such as Twinkle and AWB leave parenthetical annotations in their revision comments"

See Stuart's comments above, and also the queries he linked to. https://wiki.toolserver.org/view/MySQL_queries#Automated_tool_and_bot_edits

It would be nice if we could get these queries in version control and share them. Maybe there is potential for building a hand-curated list of bot user_ids in version control as well.

-Aaron

-- Kind regards, Ann Samoilenko, MSc Oxford Internet Institute University of Oxford Adventures can change your life e-mail: ann.samoile...@gmail.com Skype: ann.samoilenko
Re: [Wiki-research-l] Kill the bots
Thanks all for the comments on my paper, and even more thanks to everyone sharing these super helpful ideas on filtering bots: this is why I love the Wikipedia research committee. I think Oliver is definitely right that this would be a useful topic for some piece of method-comparing research, if anyone is looking for paper ideas. "Citation goldmine," as one friend called it, I think.

This won't address edit logs to date, but do we know if most bots and automated tools use the API to make edits? If so, would it be feasible to add a flag to each edit recording whether it came through the API or not? This won't stop determined users, but it might be a nice way to distinguish cyborg edits from those made manually by the same user, for many of the standard tools going forward. The closest thing I found in the bug tracker is [1], but it doesn't address the issue of "what is a bot", which this thread has clearly shown is quite complex. An API-edit vs. non-API-edit distinction might be a way forward, unless there are automated tools/bots that don't use the API.

1. https://bugzilla.wikimedia.org/show_bug.cgi?id=11181

Cheers, Scott