[Wiki-research-l] Wikimedia-l discussion on leadership in Wikimedia

2014-05-19 Thread ENWP Pine
Researchers and EE specialists, your thoughts would be appreciated on this. I 
started the thread only on Wikimedia-l to keep the discussion consolidated in 
one place.

http://lists.wikimedia.org/pipermail/wikimedia-l/2014-May/071811.html

Thanks,

Pine
  ___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] [Wikimedia-l] Leadership, Wikimedia-style

2014-05-19 Thread Dariusz Jemielniak
hi Pine,

this is an excellent point, and I believe there are definitely too few
systematic studies on the topic, as well as targeted programs.

<blatant promotion mode on>
 In my book, Common Knowledge? An Ethnography of Wikipedia, which left
the press last week, I have a whole chapter (Between Anarchy and
Bureaucracy: Wikimedia Governance) dedicated to issues of governance and
internal leadership.
http://www.sup.org/book.cgi?id=24010

Unfortunately, Google Books preview has most of the pages limited (and
Stanford University Press is not too keen on open access of their
publications, sadly).
http://books.google.pl/books?id=hBpuAwAAQBAJ&lpg=PA178&dq=jemielniak%20common%20knowledge%20leadership&hl=pl&pg=PP1#v=snippet&q=between%20anarchy&f=false
</blatant promotion mode off>

 best,

dariusz pundit


On Mon, May 19, 2014 at 7:12 AM, ENWP Pine deyntest...@hotmail.com wrote:







 Hi all,

 I've heard the word leadership used a lot in WMF, synonymously with
 management in my experience. That makes sense in a somewhat hierarchical
 organization like WMF, although this model has received some criticism from
 the community for allegedly excessive top-down thinking. I'm not familiar
 enough with the culture in the WMF Office to comment about its strengths
 and weaknesses, but I would like to ask questions about leadership in the
 community.

 In the community, which is diffuse and where roles are highly flexible,
 there have been some studies done about leadership but the ones I know
 about usually focus on hierarchies within the community, especially how
 people get chosen for administrator roles on-wiki. As we are thinking about
 our online culture, we can be thinking about movement leadership. Who are
 the leaders, how are they trained, how are they selected, what do they do,
 what makes them effective, and how can they be given ongoing support and
 training? I think many of us would agree that adminship and leadership are
 not always synonymous, and there are many ways that people exercise
 leadership in non-hierarchical ways.

 I hear frequently about stress from members of English Wikipedia's Arbcom,
 and I hope WMF is thinking about how to train and support people who get
 chosen for such visible, important, and often stressful volunteer roles.

 I would also like to point out that Wikimedia is developing training
 materials for leaders of chapters and programs.

 Is there anyone at WMF who is taking a holistic view of community
 leadership and how to understand, train and support it in ways that support
 the strategic plan goals?

 Training that might be relevant could include how to create friendly
 spaces online,
 resolve online conflicts, engage in cross-cultural communication,
 encourage strategic thinking, influence change, and maintain morale. I
 think a series of five-minute training modules could be helpful for online
 and offline volunteers, along with dedicating some Program  Evaluation or
 Research time to understanding leadership in the non-hierarchical
 community. These initiatives could help with encouraging teamwork and
 collaboration online by influencing and training leaders.

 I would also be interested in hearing about how WMF thinks about
 leadership internally, since there seems to be some community feeling
 that WMF's thinking about leadership is incompatible with the community's.
 I don't have an opinion but I would like to be more informed, and hopefully
 encourage WMF to think about how the organization as a whole interacts with
 the community.

 Thanks,

 Pine



 ___
 Wikimedia-l mailing list, guidelines at:
 https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
 wikimedi...@lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe




-- 

__
dr hab. Dariusz Jemielniak
Professor of Management
Head of the Department of International Management
and of the CROW research center
Kozminski University (Akademia Leona Koźmińskiego)
http://www.crow.alk.edu.pl

Member of the Young Academy of the Polish Academy of Sciences


Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread WereSpielChequers
If your bot is only running automated reports in its own userspace then it
doesn't need a bot flag. But it probably won't be a very active bot, so it
may not be a problem for your stats.

On the English-language Wikipedia you are going to be fairly close if you
exclude three groups: all accounts which currently have a bot flag; this
list of former bots
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots
(I occasionally maintain this in order for the list of editors by edit count
to work; as of a couple of weeks ago, when I last checked, I believe it to be
a comprehensive list of retired bots with 6,000 or more edits); and perhaps
the individual with a very high edit count who has in the past been blocked
for running unauthorised bots on his user account. (I won't name that
account on list, but since it also contains a large number of manual edits,
the true answer is that you can't get an exact divide between bots and
non-bots by classifying every account as either a bot or a human.)

If you are minded to treat all accounts containing the syllable "bot" as
bots, then you might want to tweak that to count anyone on these two lists
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/5001%E2%80%931
as human even if their name includes "bot". I check those lists occasionally
and make sure that the only "bots" included are human editors.
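
The account-level rule described above could be sketched roughly as follows
in Python. This is only an illustration: the account names and the three
input sets are made up, and in practice they would have to be filled from
the current bot-flag list, the former-bots page, and the two edit-count
lists linked above.

```python
def classify_account(username, flagged, former_bots, known_humans):
    """Rough account-level guess: 'bot' or 'human'.

    flagged      -- accounts that currently hold a bot flag
    former_bots  -- accounts from the list of retired/unflagged bots
    known_humans -- accounts from the edit-count lists, checked by hand
    All three sets are assumed inputs; nothing here queries the wiki.
    """
    if username in flagged or username in former_bots:
        return "bot"
    # Name heuristic: any name containing "bot", unless hand-checked as human.
    if "bot" in username.lower() and username not in known_humans:
        return "bot"
    return "human"


# Made-up example data, for illustration only.
flagged = {"ExampleBot"}
former_bots = {"RetiredHelper"}
known_humans = {"Abbott42"}

for name in ["ExampleBot", "RetiredHelper", "Abbott42", "Talbot99", "Alice"]:
    print(name, classify_account(name, flagged, former_bots, known_humans))
```

Note that "Talbot99" is classified as a bot only because it is missing from
the hand-checked list, which is exactly the kind of false positive the name
heuristic produces. An account that mixes manual and automated edits cannot
be handled correctly by any account-level rule like this one.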


On 18 May 2014 20:33, R.Stuart Geiger sgei...@gmail.com wrote:

 Tsk tsk tsk, Brian. When the revolution comes, bot discriminators will get
 no mercy. :-)

 But seriously, my tl;dr: instead of asking if an account is or isn't a
 bot, ask if a set of edits is or is not automated.

 Great responses so far: searching usernames for *bot will exclude non-bot
 users who were registered before the username policy change (although *Bot
 is a bit better), and the logging table is a great way to collect bot
 flags. However, Scott is right -- the bot flag (or *Bot username) doesn't
 signify a bot, it signifies a bureaucrat recognizing that a user account
 successfully went through the Bot Approval Group process. If I see an
 account with a bot flag, I can generally assume the edits that account
 makes are initiated by an automated software agent. This is especially the
 case in the main namespace. The inverse assumption is not nearly as easy: I
 can't assume that every edit made from an account *without* a bot flag was
 *not* an automated edit.

 About unauthorized bots: yes, there are a relatively small number of
 Wikipedians who, on occasion, run fully-automated, continuously-operating
 bots without approval. Complicating this, if someone is going to take the
 time to build and run a bot, but isn't going to create a separate account
 for it, then it is likely that they are also using that account to do
 non-automated edits. Sometimes new bot developers will run an unauthorized
 bot under their own account during the initial stages of development, and
 only later in the process will they create a separate bot account and seek
 formal approval and flagging. It can get tricky when you exclude all the
 edits from an account for being automated based on a single suspicious set
 of edits.

 More commonly, there are many more people who use automated batch tools
 like AutoWikiBrowser to support one-off tasks, like mass find-and-replace
 or category cleanup. Accounts powered by AWB are technically not bots,
 only because a human has to sit there and click save for every batch edit
 that is made. Some people will create a separate bot account for AWB work
 and get it approved and flagged, but many more will not bother. Then
 there are people using semi-automated, human-in-the-loop tools like Huggle
 to do vandal fighting. I find that the really hard question is whether
 you include or exclude these different kinds of 'cyborgs', because it
 really makes you think hard about what exactly you're measuring. Is
 someone who does a mass find-and-replace on all articles in a category a
 co-author of each article they edit? Is a vandal fighter patrolling the
 recent changes feed with Huggle a co-author of all the articles they edit
 when they revert vandalism and then move on to the next diff? What about
 somebody using rollback in the web browser? If so, what is it that makes
 these entities authors and ClueBot NG not an author?

 When you think about it, user accounts are actually pretty remarkable in
 that they allow such a diverse set of uses and agents to be attributed to a
 single entity. So when it comes to identifying automation, I personally
 think it is better to shift the unit of analysis from the user account to
 the individual edit. A bot flag lets you assume all edits from an account
 are automated, but you can use a range of approaches to identifying sets of
 automated edits from non-flagged accounts. Then I have a set of regex SQL
 queries in the Query Library 

Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Oliver Keyes
That would cover most of them, but runs into the problem that you're only
including the unauthorised bots written poorly enough that we've caught the
operator ;). It seems like this would be a useful topic for some piece of
method-comparing research, if anyone is looking for paper ideas.


Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Federico Leva (Nemo)

Brian Keegan, 18/05/2014 18:10:

Is there a way to retrieve a canonical list of bots on enwiki or elsewhere?


A Bots.csv list exists. https://meta.wikimedia.org/wiki/Wikistat_csv
In general: please edit 
https://meta.wikimedia.org/wiki/Research:Identifying_bot_accounts
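
For accounts that currently hold a bot flag, one further option is the
MediaWiki API's list=allusers query with augroup=bot. The sketch below is a
minimal Python example under stated assumptions: the User-Agent string is a
placeholder you would replace with your own contact details, and the result
covers only currently flagged accounts, not retired or unauthorised bots.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def bot_query_params(cont=None):
    """Query parameters for one page of accounts in the 'bot' group."""
    params = {
        "action": "query",
        "list": "allusers",
        "augroup": "bot",
        "aulimit": "max",
        "format": "json",
    }
    params.update(cont or {})  # continuation values from the previous page
    return params


def parse_page(payload):
    """Return (usernames, continuation dict or None) from one API response."""
    names = [u["name"] for u in payload.get("query", {}).get("allusers", [])]
    return names, payload.get("continue")


def fetch_flagged_bots(user_agent):
    """Page through the API and collect all currently flagged bot names."""
    names, cont = [], None
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(bot_query_params(cont))
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as response:
            page, cont = parse_page(json.load(response))
        names.extend(page)
        if not cont:
            return names


# Usage (performs network requests; the User-Agent value is a placeholder):
# bots = fetch_flagged_bots("bot-list-sketch/0.1 (your-contact@example.org)")
```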


Nemo



Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Brian Keegan
Thanks for all the references and excellent advice so far!

I've looked into the Hale Anti-Bot Method™, but because I've sampled my
corpus on articles (based on category co-membership), the resulting groupby
users gives these semi-automated users more normal distributions since
their other contributions are censored. In other words, I see only a
fraction of these users' contributions and thus the resulting time
intervals I observe are spaced farther apart (more typical) than they
actually are. It's not feasible for me to get 100k+ users' histories just
for the purposes of cleaning up ~6k articles' histories.
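
To make the censoring problem concrete, here is a small Python sketch. It
assumes, as the paragraph above suggests, that the method relies on the gaps
between consecutive edits by the same user; the thread does not spell out
the method itself. The data are invented: one account saving an edit every
10 seconds across 12 articles, of which only 3 fall in the sampled corpus.

```python
from collections import defaultdict
from statistics import median


def median_gaps(edits):
    """Median seconds between consecutive edits, per user.

    edits -- iterable of (user, unix_timestamp, article_title) tuples.
    Users with fewer than two edits are left out.
    """
    stamps_by_user = defaultdict(list)
    for user, timestamp, _title in edits:
        stamps_by_user[user].append(timestamp)

    result = {}
    for user, stamps in stamps_by_user.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if gaps:
            result[user] = median(gaps)
    return result


# Invented full history: 12 edits, 10 seconds apart, one article each.
full_history = [("BatchUser", 10 * i, f"Article{i}") for i in range(12)]

# Article-based sample: only these three articles are in the corpus.
corpus = {"Article0", "Article4", "Article8"}
sampled = [edit for edit in full_history if edit[2] in corpus]

print(median_gaps(full_history))  # gaps of 10 seconds
print(median_gaps(sampled))       # gaps of 40 seconds
```

The same account looks four times slower in the sampled view, which is why
interval-based thresholds tuned on full user histories can miss batch
editors in an article-sampled corpus.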

Another thought: because many semi-automated tools such as Twinkle and AWB
leave parenthetical annotations in their revision comments, would matching
on those be a relatively inexpensive way to filter out revisions rather
than users? There are some caveats I'd like domain experts' feedback on. I'm
not expecting settled research, just input from others' experiences munging
the data.

1. Is the inclusion of this markup in revision comments optional? This is a
concern that some users may enable or disable it, so I may end up biasing
inclusion based on users' preferences.
2. How have these flags or markup changed over time? This is a concern that
Twinkle/AWB/etc. may have started/stopped including flags or changed what
they included over time.
3. Are there other API queries or data elsewhere I could use to identify
(semi-)automated revisions?
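
A revision-level filter of the kind asked about above might look like the
following Python sketch. The patterns are assumptions for illustration only:
they match the tool name or a wikilink-style tag such as [[WP:TW|TW]], and,
per the caveats in the list, the real annotations may be optional and may
have changed over time, so any pattern set needs checking against the data.

```python
import re

# Illustrative patterns only; not a verified or complete list of tool tags.
TOOL_PATTERNS = {
    "AWB": re.compile(r"\bAWB\b|AutoWikiBrowser", re.IGNORECASE),
    "Twinkle": re.compile(r"\[\[WP:TW\|TW\]\]|\bTwinkle\b"),
    "Huggle": re.compile(r"\[\[WP:HG\|HG\]\]|\bHuggle\b"),
}


def tool_for_comment(comment):
    """Return the first tool whose pattern matches the comment, else None.

    A manual edit whose summary merely mentions a tool name will also
    match, so treat the result as a hint, not as ground truth.
    """
    text = comment or ""  # revision comments can be missing or empty
    for tool, pattern in TOOL_PATTERNS.items():
        if pattern.search(text):
            return tool
    return None


# Made-up revision comments.
comments = [
    "Reverted edits by Example ([[WP:TW|TW]])",
    "typo fixing using [[Project:AWB|AWB]]",
    "expanded the history section",
]
for comment in comments:
    print(tool_for_comment(comment), "|", comment)
```

Filtering on the returned value drops individual revisions rather than whole
accounts, so a user's manual edits stay in the corpus.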




-- 
Brian C. Keegan, Ph.D.
Post-Doctoral Research Fellow, Lazer Lab
College of Social Sciences and Humanities, Northeastern University
Fellow, Institute for Quantitative Social Sciences, Harvard University
Affiliate, Berkman Center for Internet & Society, Harvard Law School

b.kee...@neu.edu
www.brianckeegan.com
M: 617.803.6971
O: 617.373.7200
Skype: bckeegan


Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Ann Samoilenko

 the Hale Anti-Bot Method™

That's a good one.  =)

I'm a big fan of Scott's method

I second that. Again, great paper, Scott!


On Mon, May 19, 2014 at 5:31 PM, Aaron Halfaker aaron.halfa...@gmail.com wrote:

 Another thought I had was that because many semi-automated tools such as
 Twinkle and AWB leave parenthetical annotations in their revision comments


 See Stuart's comments above, and also the queries he linked to.
 https://wiki.toolserver.org/view/MySQL_queries#Automated_tool_and_bot_edits 
 It would be nice if we could get these queries in version control and
 share them.

 Maybe there is potential for building a hand-curated list of bot user_ids
 in version control as well.

 -Aaron




-- 
-
Kind regards,
Ann Samoilenko, MSc

Oxford Internet Institute
University of Oxford

Adventures can change your life

e-mail: ann.samoile...@gmail.com
Skype: ann.samoilenko


Re: [Wiki-research-l] Kill the bots

2014-05-19 Thread Scott Hale
Thanks all for the comments on my paper, and even more thanks to everyone
sharing these super helpful ideas on filtering bots: this is why I love the
Wikipedia research committee.

I think Oliver is definitely right that

  this would be a useful topic for some piece of method-comparing research,
 if anyone is looking for paper ideas.

"Citation goldmine," as one friend called it, I think.

This won't address edit logs to date, but do we know if most bots and
automated tools use the API to make edits? If so, would it be feasible
to add a flag to each edit indicating whether it came through the API?
This won't stop determined users, but might be a nice way to distinguish
cyborg edits from those made manually by the same user for many of the
standard tools going forward.

The closest thing I found in the bug tracker is [1], but it doesn't address
the issue of 'what is a bot' which this thread has clearly shown is quite
complex. An API-edit vs. non-API edit might be a way forward unless there
are automated tools/bots that don't use the API.


1. https://bugzilla.wikimedia.org/show_bug.cgi?id=11181


Cheers,
Scott