Re: [Wiki-research-l] Upcoming research newsletter: new papers open for review

2014-02-25 Thread Klein,Max
Great idea Heather,

I will add my name to my review. Do you know of any other review sites that 
aggregate reviews in a wiki-like way that we could emulate?

Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


From: wiki-research-l-boun...@lists.wikimedia.org 
wiki-research-l-boun...@lists.wikimedia.org on behalf of Tilman Bayer 
tba...@wikimedia.org
Sent: Tuesday, February 25, 2014 9:16 AM
To: Research into Wikimedia content and communities
Cc: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.
Subject: Re: [Wiki-research-l] Upcoming research newsletter: new papers open 
for review

Hi Heather,

that's a cool idea, and we have actually been considering something
like this already. While the names of the reviewers are prominently
displayed in the byline on top (and many readers of the Signpost
and the newsletter are of course experienced in reading version
histories), showing them next to each review might make attribution
easier. We just haven't found the time to implement it yet, as with
many other things for the newsletter. You are welcome to figure out a
suitable format and add these attributions in the upcoming issue;
let's follow up off-list if more information is needed.

On Tue, Feb 25, 2014 at 1:11 AM, Heather Ford hfor...@gmail.com wrote:
 Thanks, Dario, Tilman!

 I was wondering whether it would be helpful to add reviewer names/usernames
 to individual Signpost reviews. While reading a review of a paper in the
 Signpost recently, I was struck by the feeling that the reviewer was inserting
 some very opinionated statements about the article rather than the usual
 summaries. While I don't think that this is necessarily a problem (although
 I wish they were a bit more informed about the topic and social science
 research in general), I do think it can be problematic to have these
 comments unattributed. Would be interested to hear what others think...

 Best,
 Heather.

 Heather Ford
 Oxford Internet Institute Doctoral Programme
 EthnographyMatters | Oxford Digital Ethnography Group
 http://hblog.org | @hfordsa




 On 25 February 2014 05:26, Tilman Bayer tba...@wikimedia.org wrote:

 Hi Max,

 yes, we're co-publishing with the Signpost, so the ultimate deadline
 is the Signpost's actual publication time. Its formal publication date
 is this Wednesday (the 26th) UTC, although actual publication might
 take place several hours or even a few days later. Thanks for signing
 up to review the "Editor's Biases" paper; I'm looking forward to
 reading your summary!

 On Mon, Feb 24, 2014 at 3:39 PM, Klein,Max kle...@oclc.org wrote:
  Dario, what's the timeframe for writing reviews so they can get into the
  Signpost in time? The 25th?
 
  Maximilian Klein
  Wikipedian in Residence, OCLC
  +17074787023
 
  
  From: wiki-research-l-boun...@lists.wikimedia.org
  wiki-research-l-boun...@lists.wikimedia.org on behalf of Dario 
  Taraborelli
  dtarabore...@wikimedia.org
  Sent: Monday, February 24, 2014 8:11 AM
  To: A mailing list for the Analytics Team at WMF and everybody who has
  an interest in Wikipedia and analytics.; Research into Wikimedia
  content and communities
  Subject: [Wiki-research-l] Upcoming research newsletter: new papers open
  for review
 
  Hi everybody,
 
  With CSCW just concluded and conferences like CHI and WWW coming up, we
  have a good set of papers to review for the February issue of the Research
  Newsletter. [1]
 
  Please take a look at: https://etherpad.wikimedia.org/p/WRN201402 and
  add your name next to any paper you are interested in reviewing. As usual,
  short notes and one-paragraph reviews are most welcome.
 
  Instead of contacting past contributors only, this month we're
  experimenting with a public call for reviews cross-posted to analytics-l
  and wiki-research-l. If you have any questions about the format or process,
  feel free to get in touch off-list.
 
  Dario Taraborelli and Tilman Bayer
 
  [1] http://meta.wikimedia.org/wiki/Research:Newsletter



 --
 Tilman Bayer
 Senior Operations Analyst (Movement Communications)
 Wikimedia Foundation
 IRC (Freenode): HaeB





--
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB

Re: [Wiki-research-l] Upcoming research newsletter: new papers open for review

2014-02-24 Thread Klein,Max
Dario, what's the timeframe for writing reviews so they can get into the 
Signpost in time? The 25th?

Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


From: wiki-research-l-boun...@lists.wikimedia.org 
wiki-research-l-boun...@lists.wikimedia.org on behalf of Dario Taraborelli 
dtarabore...@wikimedia.org
Sent: Monday, February 24, 2014 8:11 AM
To: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.; Research into Wikimedia content and 
communities
Subject: [Wiki-research-l] Upcoming research newsletter: new papers open for
review

Hi everybody,

With CSCW just concluded and conferences like CHI and WWW coming up, we have a 
good set of papers to review for the February issue of the Research Newsletter. 
[1]

Please take a look at: https://etherpad.wikimedia.org/p/WRN201402 and add your 
name next to any paper you are interested in reviewing. As usual, short notes 
and one-paragraph reviews are most welcome.

Instead of contacting past contributors only, this month we're experimenting 
with a public call for reviews cross-posted to analytics-l and wiki-research-l. 
If you have any questions about the format or process, feel free to get in 
touch off-list.

Dario Taraborelli and Tilman Bayer

[1] http://meta.wikimedia.org/wiki/Research:Newsletter


Re: [Wiki-research-l] Preexisting Researchers on Metrics for Users?

2014-02-07 Thread Klein,Max
Thanks Nemo, I'll re-read that discussion. I think that conversation is where I 
became hesitant about using bytes or edit counts.

Aaron, in my own search I also noticed a paper you wrote with Geiger about 
counting edit hours and edit sessions. [1] Calculating content persistence is a 
bit too heavyweight for me right now, since I am trying to submit to ACM Web 
Science in 2 weeks (whose CFP was just on this list). The technique looks great 
though, and I would like to help support making a WMFlabs tool that can return 
this measure.

It seems like I could calculate approximate edit-hours from just looking at 
Special:Contributions timestamps. Is that correct? Would you suggest this route?
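
For concreteness, a minimal sketch of that approximation, assuming a 
hypothetical one-hour inactivity cutoff between sessions (the paper derives 
its own threshold):

    from datetime import datetime, timedelta

    SESSION_CUTOFF = timedelta(hours=1)  # assumed cutoff, for illustration

    def edit_hours(timestamps):
        """Estimate active editing time (in hours) from edit timestamps."""
        ts = sorted(timestamps)
        if not ts:
            return 0.0
        total = timedelta(0)
        session_start = prev = ts[0]
        for t in ts[1:]:
            if t - prev > SESSION_CUTOFF:  # long gap: close the session
                total += prev - session_start
                session_start = t
            prev = t
        total += prev - session_start      # close the final session
        return total.total_seconds() / 3600.0

    # e.g. edit_hours([datetime(2014, 2, 7, 10, 0),
    #                  datetime(2014, 2, 7, 10, 20)])  # -> ~0.33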


[1] 
http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf



Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023



From: wiki-research-l-boun...@lists.wikimedia.org 
wiki-research-l-boun...@lists.wikimedia.org on behalf of Aaron Halfaker 
aaron.halfa...@gmail.com
Sent: Friday, February 07, 2014 7:12 AM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Preexisting Researchers on Metrics for Users?

Hey Max,

There's a class of metrics that might be relevant to your purposes.  I refer to 
them as "content persistence" metrics and wrote up some docs about how they 
work, including an example.  See 
https://meta.wikimedia.org/wiki/Research:Content_persistence.

I gathered a list of papers below to provide a starting point.  I've included 
links to open access versions where I could.  These metrics are a little bit 
painful to compute due to the computational complexity of diffs, but I have 
some hardware to throw at the problem and another project that's bringing me in 
this direction, so I'd be interested in collaborating.

Priedhorsky, Reid, et al. Creating, destroying, and restoring value in 
Wikipedia. Proceedings of the 2007 international ACM conference on Supporting 
group work. ACM, 2007. http://reidster.net/pubs/group282-priedhorsky.pdf:

  *   Describes "persistent word views", which is a measure of value added per 
editor. (IMO, value actualized.)

B. Thomas Adler, Krishnendu Chatterjee, Luca de Alfaro, Marco Faella, Ian Pye, 
and Vishwanath Raman. 2008. Assigning trust to Wikipedia content. In 
Proceedings of the 4th International Symposium on Wikis (WikiSym '08). ACM, New 
York, NY, USA, Article 26, 12 pages. 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.2047&rep=rep1&type=pdf

  *   Describes a complex strategy for assigning trustworthiness to content 
based on implicit review.  See http://wikitrust.soe.ucsc.edu/

Halfaker, A., Kittur, A., Kraut, R., & Riedl, J. (2009, October). A jury of 
your peers: quality, experience and ownership in Wikipedia. In Proceedings of 
the 5th International Symposium on Wikis and Open Collaboration (p. 15). ACM. 
http://www-users.cs.umn.edu/~halfak/publications/A_Jury_of_Your_Peers/halfaker09jury-personal.pdf

  *   Describes the use of "persistent word revisions per word" as a measure of 
article contribution quality.

Halfaker, A., Kittur, A., & Riedl, J. (2011, October). Don't bite the newbies: 
how reverts affect the quantity and quality of Wikipedia work. In Proceedings 
of the 7th International Symposium on Wikis and Open Collaboration (pp. 
163-172). ACM. 
http://www-users.cs.umn.edu/~halfak/publications/Don't_Bite_the_Newbies/halfaker11bite-personal.pdf

  *   Describes the use of raw "persistent word revisions" as a measure of 
editor productivity.
  *   Looking back on the study, I think I'd rather use log(# of revisions a 
word persists) * words; one reading of that formula is sketched below.
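
A minimal sketch of that reading, assuming per-word persistence counts have 
already been computed (the input data structure is hypothetical; computing 
persistence itself is the hard part, per the docs linked above):

    import math

    def productivity(persistence_counts):
        """Sum of log(revisions persisted) over the words an editor added."""
        return sum(math.log(n) for n in persistence_counts if n > 0)

    # e.g. productivity([12, 12, 3, 1])  # a four-word contribution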

-Aaron


On Fri, Feb 7, 2014 at 1:48 AM, Federico Leva (Nemo) 
nemow...@gmail.com wrote:
Sort of related: an ongoing education@ discussion about student evaluation 
criteria. http://thread.gmane.org/gmane.org.wikimedia.education/854

Nemo



[Wiki-research-l] Preexisting Researchers on Metrics for Users?

2014-02-06 Thread Klein,Max
Hello All,


Can you point me to research, or do you have ideas, about metrics of user 
performance? I know edit count and total bytes have their limitations. Right 
now I am counting the occurrences of "thank", "appreciate", and "barnstar" for 
a user in the User talk namespace (and recursive subpages). What else is there?
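
A simplified sketch of what I am doing, for concreteness; it fetches only the 
top-level User talk page (not the recursive subpages) and counts raw regex hits:

    import re
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    PRAISE = re.compile(r"thank|appreciat|barnstar", re.IGNORECASE)

    def praise_count(username):
        """Count praise terms in the latest revision of a user's talk page."""
        params = {"action": "query", "titles": "User talk:" + username,
                  "prop": "revisions", "rvprop": "content", "format": "json"}
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        text = page.get("revisions", [{}])[0].get("*", "")
        return len(PRAISE.findall(text))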



Let me explain more about my current project:

I am trying to develop some new techniques to measure user and article 
performance. I am repurposing the bipartite economics trade model of 
countries-products, but instead using editors-articles. This means that I 
arrive at a new metric for users and articles. Now I am calibrating some of the 
variables in this model by comparing my results to exogenous variables. For 
pages, I use the metrics that this list pointed me to last time, like the 
actionable metrics from GroupLens and cleanup tags from Stein. (Thanks, list!) 
When I rank articles in a category using my economics method versus the 
article-text methods, I achieve a 0.7 Spearman correlation. Using my 
count-thanks-on-user-talk method for users in the user domain, I achieve a 0.50 
Spearman rank correlation, which is still quite good, but I want to make sure 
there aren't better baselines to compare against.
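
For reference, the comparison itself is an ordinary Spearman rank correlation; 
a sketch with made-up scores:

    from scipy.stats import spearmanr

    # Hypothetical scores for the same five users under the two methods.
    economics_scores = [0.9, 0.4, 0.7, 0.2, 0.8]
    praise_scores = [0.8, 0.3, 0.9, 0.1, 0.6]
    rho, p = spearmanr(economics_scores, praise_scores)
    print(rho, p)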



Thanks,

Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


[Wiki-research-l] Polling the watchers of a page. Possible?

2013-12-30 Thread Klein,Max
Hello Research,

Is it possible to query for the watchers of a page? It does not seem to be in 
the API, nor is the "watchers" or "wl_user" table in the database replicas 
(where I thought MediaWiki stores it). I imagine this is for privacy reasons, 
correct? If so, how would one gain access?
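
The closest thing I can find is an aggregate count rather than the list of 
watchers; a sketch, assuming the wiki's MediaWiki version exposes 
inprop=watchers (low counts may be hidden for privacy):

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    params = {"action": "query", "titles": "Main Page",
              "prop": "info", "inprop": "watchers", "format": "json"}
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    print(page.get("watchers", "hidden"))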

I have been talking with an econophysicist who thinks that we could apply a 
contagion algorithm to see which edits are contagious. (I met this 
econophysicist at the Berkeley Data Science Faire at which Wikimedia Analytics 
presented, so it was worth it in the end.)


Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


Re: [Wiki-research-l] How to collect all the admin-specific edits for a subset of Wp admins

2013-10-10 Thread Klein,Max
Hello Jerome,

I'm not sure this is the best way, but pywikipediabot [1] has a library called 
pagegenerators.py, and there is a function def 
UserContributionsGenerator(username) (around line 706). That would allow you to 
iterate through these usernames, and I bet there will be a special marking for 
deletions/undeletions. If not, worst comes to worst, you can use a regular 
expression for those words.

[1] https://meta.wikimedia.org/wiki/pywikipediabot
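
A minimal sketch of that route; the import paths and signatures differ between 
the compat (pywikipediabot) and core versions of the library, and note that 
deletions and blocks are log actions rather than edits, so the logevents API 
may also be worth a look:

    # Sketch against pywikibot core; exact signatures vary by version.
    import pywikibot
    from pywikibot import pagegenerators

    site = pywikibot.Site("en", "wikipedia")
    for page in pagegenerators.UserContributionsGenerator(
            "ExampleAdmin", site=site):   # hypothetical username
        print(page.title())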

When you have a pywikibot-hammer, everything looks like a pywikibot-nail!

Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


From: wiki-research-l-boun...@lists.wikimedia.org 
wiki-research-l-boun...@lists.wikimedia.org on behalf of Jérôme Hergueux 
jerome.hergu...@gmail.com
Sent: Thursday, October 10, 2013 3:11 AM
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] How to collect all the admin-specific edits for a 
subset of Wp admins

Dear all,

I am starting this thread in the hope that some of the great Wiki researchers 
on this list could advise me on a data collection problem.

Here is the question: for each of 120 Wikipedia admins (for whom I have the 
usernames and unique numeric ids), I would like to reliably count the number of 
times they (i) deleted a page, (ii) undeleted (i.e. restored) a page, (iii) 
protected a page, (iv) blocked a user, and (v) unblocked a user.
Those types of actions each correspond to a specific action in the Wikipedia 
API documentation page (http://en.wikipedia.org/w/api.php): action=delete, 
action=undelete, action=protect, action=block and action=unblock.
I don't know, however, what would be the best strategy to go about collecting 
those actions. Does anyone have an idea about which data collection strategy I 
should adopt in this case? Is there a way to query the Wikipedia API directly, 
or should I look for some specific markers in the edit summaries?
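
For concreteness, here is the kind of direct query I have in mind, assuming 
these admin actions are recorded as log events and filterable via 
list=logevents (the username is a placeholder, and the continuation handling 
assumes the API's "continue" format):

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def count_log_actions(username, letype):
        """Count log entries of a given type (e.g. 'delete', 'block')."""
        count = 0
        params = {"action": "query", "list": "logevents", "letype": letype,
                  "leuser": username, "lelimit": "max", "format": "json"}
        while True:
            data = requests.get(API, params=params).json()
            count += len(data["query"]["logevents"])
            if "continue" not in data:
                return count
            params.update(data["continue"])

    # e.g. count_log_actions("ExampleAdmin", "delete")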

I would be very grateful for any advice or feedback!
Thanks much for your attention and time. :)

Best,

Jérôme.


Re: [Wiki-research-l] diffdb formatted Wikipedia dump

2013-10-08 Thread Klein,Max
Susan,

Hmm, it seems like that is a funny middle ground, where it's too long to get 
live, although it's probably less than 158 days. I once read and edited 400,000 
pages with pywikibot (3 network I/O calls per page: read, external API, write) 
in about 20 days. You would have to make two I/O calls (read, getHistory) per 
user page. I don't know how many user pages there are, but that might be enough 
variables to satisfy the system of inequalities that you need.
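
Back-of-envelope, from my numbers above:

    # 400,000 pages * 3 calls in ~20 days is roughly 0.7 calls/second;
    # at two calls per user page, that is about 30,000 user pages per day.
    calls_per_second = 400000 * 3 / (20 * 24 * 3600.0)     # ~0.69
    user_pages_per_day = calls_per_second / 2 * 24 * 3600  # ~30,000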

If you are dead set on using hadoop, maybe you could use the Wikimedia Labs 
grid: https://wikitech.wikimedia.org/wiki/Main_Page.
They have some monster power, and it's free for bot operators and other tool 
runners. Maybe it's also worth asking there if someone already has 
wikihadoop set up.


Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


From: wiki-research-l-boun...@lists.wikimedia.org 
wiki-research-l-boun...@lists.wikimedia.org on behalf of Susan Biancani 
inacn...@gmail.com
Sent: Tuesday, October 08, 2013 3:28 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] diffdb formatted Wikipedia dump

Right now, I want all the edits to user pages and user talk pages, 2010-2013. 
But as I keep going with this project, I may want to expand a bit, so I figured 
if I was going to run the wikihadoop software, I might as well only do it once.

I'm hesitant to do this via web scraping, because I think it'll take much 
longer than working with the dump files. However, if you have suggestions on 
how to get the diffs (or a similar format) efficiently from the dump files, I 
would definitely love to hear them.

I appreciate the help and advice!


On Mon, Oct 7, 2013 at 10:44 AM, Pierre-Carl Langlais 
pierrecarl.langl...@gmail.com wrote:
I agree with Klein. If you do not need to exploit the entire Wikipedia 
database, requests through a Python scraping library (like Beautiful Soup) are 
certainly sufficient and easy to set up. With a random algorithm to select the 
ids you can create a fine sample.
PCL

On 07/10/13 19:31, Klein,Max wrote:
Hi Susan,

Do you need the entire database diff'd (i.e. all edits ever), or are you 
interested in a particular subset of the diffs? It would help to know your 
purpose.

For instance, I am interested in diffs around specific articles for specific 
dates to study news events. So I calculate the diffs myself using Python on 
page histories rather than the entire database.
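
Roughly like this; the revision ids below are placeholders:

    import difflib
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def revision_text(revid):
        """Fetch the wikitext of a single revision by id."""
        params = {"action": "query", "revids": revid, "prop": "revisions",
                  "rvprop": "content", "format": "json"}
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        return page["revisions"][0]["*"]

    old, new = revision_text(111111111), revision_text(111111112)
    print("\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(), lineterm="")))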

Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023


From: wiki-research-l-boun...@lists.wikimedia.org 
wiki-research-l-boun...@lists.wikimedia.org on behalf of Susan Biancani 
inacn...@gmail.com
Sent: Thursday, October 03, 2013 10:06 PM
To: wiki-research-l@lists.wikimedia.org
Subject: [Wiki-research-l] diffdb formatted Wikipedia dump

I'm looking for a dump from English Wikipedia in diff format (i.e. each entry 
is the text that was added/deleted since the last edit, rather than the current 
state of the page).

The Summer of Research folks provided a handy guide to how to create such a 
dataset from the standard complete dumps here: 
http://meta.wikimedia.org/wiki/WSoR_datasets/revision_diff
But the time estimate they give is prohibitive for me (20-24 hours for each 
dump file, running on 24 cores, and there are currently 158 files). I'm a grad 
student in a social science department, and don't have access to extensive 
computing power. I've been paying out of pocket for AWS, but this would get 
expensive.
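
Back-of-envelope, from the figures above:

    # 158 dump files at 20-24 hours each, run one after another:
    low_days = 158 * 20 / 24.0    # ~131.7 days of wall time
    high_days = 158 * 24 / 24.0   # 158.0 days of wall time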

There is a diff-format dataset available, but only through April 2011 (here: 
http://dumps.wikimedia.org/other/diffdb/). I'd like to get a diff-format 
dataset for January 2010 through March 2013 (or for everything up to March 2013).

Does anyone know if such a dataset exists somewhere? Any leads or suggestions 
would be much appreciated!

Susan


