[Wiki-research-l] Leaving the Wikimedia Foundation, staying on the wikis

2019-02-13 Thread Dario Taraborelli
Hey all,

I've got some personal news to share.

After 8 years with Wikimedia, I have decided to leave the Foundation to
take up a new role focused on open science. This has been a difficult
decision but an opportunity arose and I am excited to be moving on to an
area that’s been so close to my heart for years.

Serving the movement as part of the Research team at WMF has been, and will
definitely remain, the most important gig of my life. I leave a team of
ridiculously talented and fun people that I can’t possibly imagine not
spending all of my days with, as well as many collaborators and friends in
the community whom I have worked alongside. I am proud and thankful to have been
part of this journey with you all. With my departure, Leila Zia is taking
the lead of Research at WMF, and you all couldn't be in better hands.

In March, I’ll be joining CZI Science—a philanthropy based in the Bay
Area—to help build their portfolio of open science programs and technology.
I'll continue to be an ally in the same fights in my new role.

Other than that, I look forward to returning to full volunteer mode. I
started editing English Wikipedia in 2004, working on bloody chapters in
the history of London
<https://en.wikipedia.org/wiki/Smithfield,_London>; hypothetical
astronomy <https://en.wikipedia.org/wiki/Planet_Nine>; unsung heroes among
women in science <https://en.wikipedia.org/wiki/Susan_Potter>; and of
course natural <https://en.wikipedia.org/wiki/2014_South_Napa_earthquake>,
technical <https://en.wikipedia.org/wiki/October_2016_Dyn_cyberattack>
and political
disasters
<https://en.wikipedia.org/wiki/Russian_interference_in_the_2016_United_States_elections>.
I’ve also developed an embarrassing addiction to Wikidata, and you’ll
continue seeing me around hacking those instances of Q16521
<https://www.wikidata.org/wiki/Q16521> for a little while.

I hope our paths cross once again in the future.

Best,

Dario


-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
research.wikimedia.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] Farewell, Erik!

2019-02-06 Thread Dario Taraborelli
“[R]ecent revisions of an article can be peeled off to reveal older layers,
which are still meaningful for historians. Even graffiti applied by vandals
can by its sheer informality convey meaningful information, just like
historians learned a lot from graffiti on walls of classic Pompei. Likewise
view patterns can tell future historians a lot about what was hot and what
wasn’t in our times. Reason why these raw view data are meant to be
preserved for a long time.”

Erik Zachte wrote these lines in a blog post
<https://web.archive.org/web/20171018194720/http://infodisiac.com/blog/2009/07/michael-jackson/>
almost
ten years ago, and I cannot find better words to describe the gift he gave
us. Erik retired <http://infodisiac.com/back_to_volunteer_mode.htm> this
past Friday, leaving behind an immense legacy. I had the honor to work with
him for several years, and this morning I hosted an intimate, tearful
celebration of what Erik has represented for the Wikimedia movement.

His Wikistats project <https://stats.wikimedia.org/>—with his signature
pale yellow background we've known and loved since the mid 2000s
<https://web.archive.org/web/20060412043240/https://stats.wikimedia.org/>—has
been much more than an "analytics platform". It's been an individual
effort he initiated, and grew over time, to comprehend and make
sense of the largest open collaboration project in human history, driven by
curiosity and by an insatiable desire to serve data to the communities that
most needed it.

Through this project, Erik has created a live record of data describing the
growth and reach of all Wikimedia communities, across languages and
projects, putting multi-lingualism and smaller communities at the very
center of his attention. He coined metrics such as "active editors" that
defined the benchmark for volunteers, the Wikimedia Foundation, and the
academic community to understand some of the growing pains and editor
retention issues
<https://web.archive.org/web/20110608214507/http://infodisiac.com/blog/2009/12/new-editors-are-joining-english-wikipedia-in-droves/>
the movement has faced. He created countless reports—that predate by nearly
a decade modern visualizations of online attention—to understand what
Wikipedia traffic means in the context of current events like elections
<https://web.archive.org/web/20160405055621/http://infodisiac.com/blog/2008/09/sarah-palin/>
or public health crises
<https://web.archive.org/web/20090708011216/http://infodisiac.com/blog/2009/05/h1n1-flu-or-new-flu-or/>.
He has created countless
<https://twitter.com/Infodisiac/status/1039244151953543169> visualizations
<https://blog.wikimedia.org/2017/10/27/new-interactive-visualization-wikipedia/>
that show the enormous gaps in local language content and representation
that, as a movement, we face in our efforts to build an encyclopedia for
and about everyone. He has also made extensive use of pie charts
<https://web.archive.org/web/20141222073751/http://infodisiac.com/blog/wp-content/uploads/2008/10/piechartscorrected.png>,
which—as friends—we are ready to turn a blind eye towards.
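
Incidentally, the "active editors" metric lends itself to a compact
definition. As a minimal sketch, using the conventional threshold of at
least 5 edits in a month (the real Wikistats computation also applies bot
and registration filters that are omitted here):

```python
from collections import Counter, defaultdict

def active_editors(edits, threshold=5):
    """edits: iterable of (username, "YYYY-MM") pairs, one per edit.
    Returns {month: count of editors with >= threshold edits that month}."""
    per_month = defaultdict(Counter)
    for user, month in edits:
        per_month[month][user] += 1
    return {m: sum(1 for n in c.values() if n >= threshold)
            for m, c in per_month.items()}
```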

Most importantly, the data Erik has brought to life has been cited over
1,000 times
<https://scholar.google.com/scholar?hl=en_sdt=0%2C5=stats.wikimedia.org>
in the scholarly literature. If we gave credit to open data creators in the
same way as we credit authors of scholarly papers, Erik would be one of the
most influential authors in the field, and I don't think it is much of a
stretch to say that the massive trove of data and metrics Erik has made
available had a direct causal role in the birth and growth of the academic
field of Wikimedia research, and more broadly, scholarship of online
collaboration.

Like I said this morning, Erik -- you have been not only an invaluable
colleague and a steward for the movement, but also a very decent human
being, and I am grateful we shared some of this journey together.

Please join me in celebrating Erik on his well-deserved retirement, read
his statement <http://infodisiac.com/back_to_volunteer_mode.htm> to learn
what he's planning to do next, or check this lovely portrait
<https://www.wired.com/2013/12/erik-zachte-wikistats/> Wired published a
while back about "the Stats Master Making Sense of Wikipedia's Massive Data
Trove".

Dario


-- 
*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
research.wikimedia.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] Modeling interactions on talk pages and detecting early signs of conversational failure: Research Showcase - June 18, 2018 (11:30 AM PDT| 18:30 UTC)

2018-06-18 Thread Dario Taraborelli
Hey all,

a reminder that the livestream of our monthly research showcase will start
in about 2 hours (11:30 PT / 18:30 UTC) with our collaborators from Jigsaw
and Cornell as guest speakers. You can follow the stream on YouTube:
https://www.youtube.com/watch?v=m4vzI0k4OSg and join the live Q&A on IRC in
the #wikimedia-research channel.

Looking forward to seeing you there!

Dario


On Thu, May 31, 2018 at 5:07 PM Dario Taraborelli <
dtarabore...@wikimedia.org> wrote:

> Hey everyone,
>
> we're hosting a dedicated session in June on our joint work with Cornell
> and Jigsaw on predicting conversational failure
> <https://arxiv.org/abs/1805.05345> on Wikipedia talk pages. This is part
> of our contribution to WMF's Anti-Harassment program.
>
> The showcase
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#June_2018> will be
> live-streamed <https://www.youtube.com/watch?v=m4vzI0k4OSg> on *Monday,
> June 18, 2018* at 11:30 AM (PDT), 18:30 (UTC).  (Please note this falls
> on a Monday this month).
>
> *Conversations Gone Awry: Detecting Early Signs of Conversational Failure*
> By *Justine Zhang and Jonathan Chang, Cornell University*
> One of the main
> challenges online social systems face is the prevalence of antisocial
> behavior, such as harassment and personal attacks. In this work, we
> introduce the task of predicting from the very start of a conversation
> whether it will get out of hand. As opposed to detecting undesirable
> behavior after the fact, this task aims to enable early, actionable
> prediction at a time when the conversation might still be salvaged. To this
> end, we develop a framework for capturing pragmatic devices—such as
> politeness strategies and rhetorical prompts—used to start a conversation,
> and analyze their relation to its future trajectory. Applying this
> framework in a controlled setting, we demonstrate the feasibility of
> detecting early warning signs of antisocial behavior in online discussions.
>
>
> *Building a rich conversation corpus from Wikipedia Talk pages*
> We present a
> corpus of conversations that encompasses the complete history of
> interactions between contributors to English Wikipedia's Talk Pages. This
> captures a new view of these interactions by containing not only the final
> form of each conversation but also detailed information on all the actions
> that led to it: new comments, as well as modifications, deletions and
> restorations. This level of detail supports new research questions
> pertaining to the process (and challenges) of large-scale online
> collaboration. As an example, we present a small study of removed comments
> highlighting that contributors successfully take action on more toxic
> behavior than was previously estimated.
>
> YouTube stream:  https://www.youtube.com/watch?v=m4vzI0k4OSg
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here
> <https://www.youtube.com/playlist?list=PLhV3K_DS5YfLQLgwU3oDFiGaU3K7pUVoW>
> .
>
> Hope to see you there on June 18!
> Dario
>


-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
research.wikimedia.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Fwd: Modeling interactions on talk pages and detecting early signs of conversational failure: Research Showcase - June 18, 2018 (11:30 AM PDT| 18:30 UTC)

2018-05-31 Thread Dario Taraborelli
Hey everyone,

we're hosting a dedicated session in June on our joint work with Cornell
and Jigsaw on predicting conversational failure
<https://arxiv.org/abs/1805.05345> on Wikipedia talk pages. This is part of
our contribution to WMF's Anti-Harassment program.

The showcase
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#June_2018> will be
live-streamed <https://www.youtube.com/watch?v=m4vzI0k4OSg> on *Monday,
June 18, 2018* at 11:30 AM (PDT), 18:30 (UTC). (Please note this falls on
a Monday this month.)

*Conversations Gone Awry: Detecting Early Signs of Conversational Failure*
By *Justine Zhang and Jonathan Chang, Cornell University*
One of the main challenges
online social systems face is the prevalence of antisocial behavior, such
as harassment and personal attacks. In this work, we introduce the task of
predicting from the very start of a conversation whether it will get out of
hand. As opposed to detecting undesirable behavior after the fact, this
task aims to enable early, actionable prediction at a time when the
conversation might still be salvaged. To this end, we develop a framework
for capturing pragmatic devices—such as politeness strategies and
rhetorical prompts—used to start a conversation, and analyze their relation
to its future trajectory. Applying this framework in a controlled setting,
we demonstrate the feasibility of detecting early warning signs of
antisocial behavior in online discussions.
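
To make the task concrete, here is a toy sketch of the kind of signal
involved. The marker lists and weights below are invented for illustration;
the actual framework in the paper uses a much richer set of politeness
strategies and rhetorical prompts:

```python
# Illustrative only: tiny lexicons loosely inspired by politeness-strategy
# features; NOT the paper's actual feature set.
POLITENESS_MARKERS = ["please", "thanks", "thank you", "would you", "could you"]
ATTACK_MARKERS = ["you always", "you never", "obviously", "nonsense"]

def opening_features(comment: str) -> dict:
    """Count polite vs. confrontational markers in an opening comment."""
    text = comment.lower()
    return {
        "polite": sum(text.count(m) for m in POLITENESS_MARKERS),
        "hostile": sum(text.count(m) for m in ATTACK_MARKERS),
    }

def awry_risk(comment: str) -> float:
    """Toy linear score in [0, 1]: higher means the opening looks riskier."""
    f = opening_features(comment)
    score = 0.5 + 0.2 * f["hostile"] - 0.2 * f["polite"]
    return max(0.0, min(1.0, score))
```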


*Building a rich conversation corpus from Wikipedia Talk pages*
We present a
corpus of conversations that encompasses the complete history of
interactions between contributors to English Wikipedia's Talk Pages. This
captures a new view of these interactions by containing not only the final
form of each conversation but also detailed information on all the actions
that led to it: new comments, as well as modifications, deletions and
restorations. This level of detail supports new research questions
pertaining to the process (and challenges) of large-scale online
collaboration. As an example, we present a small study of removed comments
highlighting that contributors successfully take action on more toxic
behavior than was previously estimated.
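
As a minimal sketch of what such an action log might look like, and how a
conversation's final visible state can be replayed from it (the schema below
is hypothetical, simplified from the corpus description):

```python
from dataclasses import dataclass

# Hypothetical schema: each action adds, modifies, deletes, or restores
# a comment; the real corpus records considerably more detail.
@dataclass
class Action:
    kind: str        # "new" | "modification" | "deletion" | "restoration"
    comment_id: str
    text: str = ""   # payload for new/modification; ignored otherwise

def replay(actions):
    """Reconstruct the final visible comments from an action log."""
    live, deleted = {}, {}
    for a in actions:
        if a.kind in ("new", "modification"):
            live[a.comment_id] = a.text
        elif a.kind == "deletion":
            deleted[a.comment_id] = live.pop(a.comment_id, "")
        elif a.kind == "restoration":
            live[a.comment_id] = deleted.pop(a.comment_id, "")
    return live
```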

YouTube stream:  https://www.youtube.com/watch?v=m4vzI0k4OSg

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
.

Hope to see you there on June 18!
Dario


[Wiki-research-l] Fwd: [Analytics] Research Showcase May 8, 2018 (11:30 AM PDT| 18:30 UTC)

2018-05-08 Thread Dario Taraborelli
A reminder that this is starting in about 20 minutes. Tune in or
join us on IRC (#wikimedia-research) for a live Q&A.

-- Forwarded message -
From: Sarah R <srodl...@wikimedia.org>
Date: Mon, May 7, 2018 at 6:14 AM
Subject: [Analytics] Research Showcase May 8, 2018 (11:30 AM PDT| 18:30 UTC)
To: <wikimedi...@lists.wikimedia.org>, <wiki-research-l@lists.wikimedia.org>,
<analyt...@lists.wikimedia.org>


Hi Everyone,

The next Research Showcase will be live-streamed this Tuesday, May 8,
2018 at 11:30 AM (PDT), 18:30 (UTC). (Please note this meeting is on
Tuesday this month).

YouTube stream: https://www.youtube.com/watch?v=t7cHxlGgEt4

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here.
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>

*Case studies in the appropriation of ORES*
By *Aaron Halfaker*
ORES is an
open, transparent, and auditable machine prediction platform for
Wikipedians to help them do their work. It's currently used in 33 different
Wikimedia projects to measure the quality of content, detect vandalism,
recommend changes to articles, and to identify good faith newcomers. The
primary way that Wikipedians use ORES' predictions is through the tools
developed by volunteers. These JavaScript gadgets, MediaWiki extensions,
and web-based tools make up a complex ecosystem of Wikipedian processes --
encoded into software. In this presentation, Aaron will walk through
three key tools that Wikipedians have developed that make use of ORES, and
he'll discuss how these novel process support technologies and the
discussions around them have prompted Wikipedians to reflect on their work
processes.
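
For those curious, ORES predictions are also available through a public
HTTP API. As a rough sketch (the v3 URL scheme and response shape below are
my best understanding; treat the details as assumptions and check the
official documentation), no network call is made here:

```python
import json

ORES = "https://ores.wikimedia.org/v3/scores"

def score_url(context: str, revid: int, model: str) -> str:
    """Build an ORES v3 scoring URL for one revision and one model."""
    return f"{ORES}/{context}?models={model}&revids={revid}"

def extract_probability(payload: dict, context: str, revid: int, model: str) -> float:
    """Pull the 'true'-class probability for one model from a v3 response."""
    score = payload[context]["scores"][str(revid)][model]["score"]
    return score["probability"]["true"]

# A canned response in (what I believe is) the v3 shape; values invented.
sample = json.loads("""
{"enwiki": {"scores": {"123456": {"damaging": {"score":
  {"prediction": false, "probability": {"false": 0.93, "true": 0.07}}}}}}}
""")
```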


*Exploring Wikimedia Donation Patterns*
By *Gary Hsieh*
Every year, the Wikimedia
Foundation relies on fundraising campaigns to help maintain the services it
provides to millions of people worldwide. However, despite a large number
of individuals who donate through these campaigns, these donors represent
only a small percentage of Wikimedia users. In this work, we seek to
advance our understanding of donors and their donation behaviors. Our
findings offer insights to improve fundraising campaigns and to limit the
burden of these campaigns on Wikipedia visitors.

Kindly,

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
___
Analytics mailing list
analyt...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics


-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
research.wikimedia.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Knowledge Integrity: A proposed Wikimedia Foundation cross-departmental program for 2018-2019

2018-04-16 Thread Dario Taraborelli
Hey all,

(apologies for cross-posting)

We’re sharing a proposed program
<https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019/CDP3:_Knowledge_Integrity>
 for the Wikimedia Foundation’s upcoming fiscal year
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2018-2019/Draft>
 (2018-19) and *would love to hear from you*. This plan builds extensively
on projects and initiatives driven by volunteer contributors and
organizations in the Wikimedia movement, so your input is critical.

Why a “knowledge integrity” program?

Increased global attention is directed at the problem of misinformation and
how media consumers are struggling to distinguish fact from fiction.
Meanwhile, thanks to the sources they cite, Wikimedia projects are uniquely
positioned as a reliable gateway to accessing quality information in the
broader knowledge ecosystem. How can we mobilize these citations as a
resource and turn them into a broader, linked infrastructure of trust to
serve the entire internet?  Free knowledge grounds itself in verifiability
and transparent attribution policies. Let’s look at 4 data points as
motivating stories:

   - Wikipedia sends tens of millions of people to external sources each
   year. We want to conduct research to understand why and how readers leave
   our site.
   - The Internet Archive has fixed over 4 million dead links on Wikipedia.
   We want to enable instantaneous archiving of every link on all Wikipedias
   to ensure the long-term preservation of the sources Wikipedians cite.
   - #1Lib1Ref reaches 6 million people on social media. We want to bring
   #1Lib1Ref to Wikidata and more languages, spreading the message that
   references improve quality.
   - 33% of Wikidata items represent sources (journals, books, works). We
   want to strengthen community efforts to build a high-quality, collaborative
   database of all cited and citable sources.

A 5-year vision

Our 5-year vision for the Knowledge Integrity program is to establish
Wikimedia as the hub of a federated, trusted knowledge ecosystem. We plan
to get there by creating:

   - A roadmap to a mature, technically and socially scalable, central
   repository of sources.
   - A developed network of partners and technical collaborators who
   contribute to and reuse data about citations.
   - Increased public awareness of Wikimedia’s vital role in information
   literacy and fact-checking.


5 directions for 2018-2019

We have identified 5 levers of Knowledge Integrity: research,
infrastructure and tooling, access and preservation, outreach, and
awareness. Here’s what we want to do with each:


   1. Continue to conduct research to understand how readers access sources
   and how to help contributors improve citation quality.
   2. Improve tools for linking information to external sources, catalogs,
   and repositories.
   3. Ensure resources cited across Wikimedia projects are accessible in
   perpetuity.
   4. Grow outreach and partnerships to scale community and technical
   efforts to improve the structure and quality of citations.
   5. Increase public awareness of the processes Wikimedians follow to
   verify information and articulate a collective vision for a trustable web.


Who is involved?

The core teams involved in this proposal are:

   - Wikimedia Foundation Technology’s Research Team
   - Wikimedia Foundation Community Engagement’s Programs team (Wikipedia
   Library)
   - Wikimedia Deutschland Engineering’s Wikidata team


The initiative also spans across an ecosystem of possible partners
including the Internet Archive, ContentMine, Crossref, OCLC, OpenCitations,
and Zotero. It is further made possible by funders including the Sloan,
Gordon and Betty Moore, and Simons Foundations who have been supporting the
WikiCite initiative to date.

How you can participate

You can read the fine details of our proposed year-1 plan, and provide your
feedback, on mediawiki.org:
https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019/CDP3:_Knowledge_Integrity

We’ve also created a brief introductory slidedeck about our motivation and
goals:
https://commons.wikimedia.org/wiki/File:Knowledge_Integrity_CDP_proposal_%E2%80%93_FY2018-19.pdf

WikiCite has laid the groundwork for many of these efforts. Read last
year’s report:
https://commons.wikimedia.org/wiki/File:WikiCite_2017_report.pdf

Recent initiatives like the just released citation dataset foreshadow the
work we want to do:
https://medium.com/freely-sharing-the-sum-of-all-knowledge/what-are-the-ten-most-cited-sources-on-wikipedia-lets-ask-the-data-34071478785a

Lastly, this April we’re celebrating Open Citations Month; it’s right in
the spirit of Knowledge Integrity:
https://blog.wikimedia.org/2018/04/02/initiative-for-open-citations-birthday/


-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.

Re: [Wiki-research-l] [Analytics] A new landing page for the Wikimedia Research team

2018-02-11 Thread Dario Taraborelli
Hey all,

thanks for the great feedback. A couple of notes to expand on Jonathan's
response.

On Thu, Feb 8, 2018 at 8:46 AM, Jonathan Morgan <jmor...@wikimedia.org>
wrote:

> Aaron: I'll ask Baha about the issue tracking... *issue* today. The code
> is hosted on Gerrit now, with a one-way mirror on this GitHub repo[1],
> which is not ideal from an openness/collaboration POV. For me, enabling
> easy issue tracking and pull requests is the most pressing issue. In the
> meantime, you can submit tasks through Phab. Add them to the Research
> board[2] and/or as subtasks of our Landing Page creation epic[3]. Not
> ideal, but at least you can capture things this way.
>

this is far from optimal. Due to production requirements, all code needs to
be on Gerrit, but asking people who want to suggest typo fixes to go
through the developer access instructions is a usability nightmare.
Jonathan's suggestion is a temporary solution; I'd like to work with Baha
to figure out a workflow that lets us receive PRs and issues on GitHub and
have them synced with Gerrit, where they are reviewed and, if +2'ed,
merged. This may take a while, so we appreciate your patience.


> Federico: Translation via translatewiki would be very cool. We haven't
> prioritized this because, well, none of our on-wiki research team pages
> were ever translated, and this microsite is intended to supplement our
> on-wiki content, not replace it. But it sounds like a potential 'roadmap'
> kinda deal and I'll make sure to track it.
>



Our assumption was that the place for volunteer communities to find
translated content is (and should be) on wiki, where we can tap all the
existing translation workflows as needed. The main audiences for this
landing page are (primarily English-speaking) funding and research
organizations who don't know how to navigate content across 4+ wikis and a
number of external data / publication repositories. I support the idea of
translations if we can make it work and there's appetite for it, but the
minimum viable content was intentionally conceived to be in English.

> Iolanda: this is the landing page for the Wikimedia Foundation Research
> team[4], not for the international community of researchers who study
> Wiki[*]edia. It's also not the landing page for all researchers and
> research activities within the Wikimedia Foundation--just those of team
> members (and Aaron, whose Scoring Platform team is a kind of spin
> off/sibling of the research team).
>

As an additional clarification: the Research Index on Meta remains the
central hub of all research projects created by the volunteer community,
academic researchers, and Wikimedia Foundation staff. This landing page
acts as a filter, and a thin layer of discoverability, to the contributions
made by the Wikimedia Research team to the Research Index (as well as
additional documentation that may exist across other wikis). Hope that
makes sense.


> Thanks everyone for the feedback so far. Keep it coming,
>
> Jonathan
>
> 1. https://github.com/wikimedia/research-landing-page
> 2. https://phabricator.wikimedia.org/tag/research/
> 3. https://phabricator.wikimedia.org/T107389
> 4. https://www.mediawiki.org/wiki/Wikimedia_Research
>
> On Thu, Feb 8, 2018 at 8:09 AM, Aaron Halfaker <aaron.halfa...@gmail.com>
> wrote:
>
>> Hey folks, I see you're using github[1], but you've disabled the issue
>> tracker there.  Where should I submit bug reports and feature requests?
>> Maybe you could add a link next to "source code" at the bottom of the
>> page.
>>
>> 1. https://github.com/wikimedia/research-landing-page
>>
>> On Thu, Feb 8, 2018 at 10:02 AM, Aaron Halfaker <aaron.halfa...@gmail.com
>> >
>> wrote:
>>
>> > Depends on which standard.  This is not a wiki page so it won't be
>> > translatable using the on-wiki translate tools.  However, it's quite
>> > possible that we could use something like translatewiki.net.  I'm not
>> > sure if that is on the road map.  Dario, what do you think?
>> >
>> > On Thu, Feb 8, 2018 at 1:31 AM, Federico Leva (Nemo) <
>> nemow...@gmail.com>
>> > wrote:
>> >
>> >> Will it be translatable with standard tools?
>> >>
>> >> Federico
>> >>
>> >>
>> >> ___
>> >> Wiki-research-l mailing list
>> >> Wiki-research-l@lists.wikimedia.org
>> >> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>> >>
>> >
>> >
>> ___
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimed

[Wiki-research-l] A new landing page for the Wikimedia Research team

2018-02-06 Thread Dario Taraborelli
Hey all,

We’re thrilled to announce the Wikimedia Research team now has a simple,
navigable, and accessible landing page, making our output, projects, and
resources easy to discover and learn about: https://research.wikimedia.org
<https://research.wikimedia.org/>

The Research team decided to create a single go-to page (T107389
<https://phabricator.wikimedia.org/T107389>) to provide an additional way
to discover information we have on wiki, for the many audiences we would
like to engage with – particularly those who are not already familiar with
how to navigate our projects. On this page, potential academic
collaborators, journalists, funding organizations, and others will find
links to relevant resources, contact information, collaboration and
partnership opportunities, and ways to follow the team's work.

There are many more research resources produced by different teams and
departments at WMF – from Analytics, to Audiences, to Grantmaking, and
Programs. If you see anything that’s missing within the scope of the
Research team, please let us know
<https://phabricator.wikimedia.org/T107389>!

Dario


-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] [Wikimedia-l] Research Showcase Wednesday, January 17, 2018

2018-01-17 Thread Dario Taraborelli
Hey all,

a reminder that the livestream of our monthly research showcase starts in
45 minutes (11.30 PT)

   - Video: https://www.youtube.com/watch?v=L-1uzYYneUo
   - IRC: #wikimedia-research
   - Abstracts:
   https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#January_2018

Dario


On Tue, Jan 16, 2018 at 9:45 AM, Lani Goto <lg...@wikimedia.org> wrote:

> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday, January
> 17, 2018 at 11:30 AM (PST) 19:30 UTC.
>
> YouTube stream: https://www.youtube.com/watch?v=L-1uzYYneUo
>
> As usual, you can join the conversation on IRC at #wikimedia-research. And,
> you can watch our past research showcases here.
>
> This month's presentation:
>
> *What motivates experts to contribute to public information goods? A field
> experiment at Wikipedia*
> By Yan Chen, University of Michigan
> Wikipedia is among the most important information sources for the general
> public. Motivating domain experts to contribute to Wikipedia can improve
> the accuracy and completeness of its content. In a field experiment, we
> examine the incentives which might motivate scholars to contribute their
> expertise to Wikipedia. We vary the mentioning of likely citation, public
> acknowledgement and the number of views an article receives. We find that
> experts are significantly more interested in contributing when citation
> benefit is mentioned. Furthermore, cosine similarity between a Wikipedia
> article and the expert's paper abstract is the most significant factor
> leading to more and higher-quality contributions, indicating that better
> matching is a crucial factor in motivating contributions to public
> information goods. Other factors correlated with contribution include
> social distance and researcher reputation.
>
> *Wikihounding on Wikipedia*
> By Caroline Sinders, WMF
> Wikihounding (a form of digital stalking on Wikipedia) is both
> qualitative and quantitative. What makes wikihounding different from
> mentoring? The context of the action, or the intention. However, every
> interaction inside a digital space has a quantitative aspect to it: every
> comment, revert, etc. is a data point. By analyzing data points
> comparatively across wikihounding cases and reading some of the cases,
> we can establish a baseline of the overlapping similarities
> between cases and study what makes up wikihounding. Wikihounding
> currently has a fairly loose definition. Wikihounding, as defined by the
> Harassment policy on en:wp, is: “the singling out of one or more editors,
> joining discussions on multiple pages or topics they may edit or multiple
> debates where they contribute, to repeatedly confront or inhibit their
> work. This is with an apparent aim of creating irritation, annoyance or
> distress to the other editor. Wikihounding usually involves following the
> target from place to place on Wikipedia.” This definition doesn't outline
> parameters around cases such as frequency of interaction, duration, or
> minimum reverts, nor is there a lot known about what a standard or
> canonical case of wikihounding looks like. What is the average wikihounding
> case? This talk will cover the approaches that I and members of the
> research team (Diego Saez-Trumper, Aaron Halfaker, and Jonathan Morgan) are
> taking in starting this research project.
>
> --
> Lani Goto
> Project Assistant, Engineering Admin
> ___
> Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/
> wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/
> wiki/Wikimedia-l
> New messages to: wikimedi...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>




-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Fwd: [Analytics] Wikistats gets a facelift - Alpha Launch of Wikistats 2

2017-12-13 Thread Dario Taraborelli
Cross-posting from analytics – very excited about this announcement.
Congrats on the launch!

-- Forwarded message --
From: Nuria Ruiz <nu...@wikimedia.org>
Date: Wed, Dec 13, 2017 at 8:25 PM
Subject: [Analytics] Wikistats gets a facelift - Alpha Launch of Wikistats 2
To: "A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics." <analyt...@lists.wikimedia.org>


Hello from Analytics Team!

We are happy to announce the Alpha release of Wikistats 2. Wikistats has
been redesigned for architectural simplicity, faster data processing, and a
more dynamic and interactive user experience. Our first goal is to match
the numbers of the current system and to provide the most important
reports, as decided by the Wikistats community (see survey) [1]. Over time, we will
continue to migrate reports and add new ones that you find useful. We can
also analyze the data in new and interesting ways, and look forward to
hearing your feedback and suggestions. [2]

You can go directly to Spanish Wikipedia
https://stats.wikimedia.org/v2/#/es.wikipedia.org

or browse all projects
https://stats.wikimedia.org/v2/#/all-projects

The new site comes with a whole new set of APIs, similar to our existing
Pageview API but with edit data. You can start using them today; they are
documented here:

https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats


FAQ:

Why is this an alpha?
There are features that we feel a full-fledged product should have that are
still missing, such as localization. The data-processing pipeline for the
new Wikistats has been rebuilt from scratch (it uses distributed-computing
tools such as Hadoop) and we want to see how it is used before calling it
final. Also, while we aim to update data monthly, updates will land a few
days after the month rolls over because of the amount of data to move and compute.

How about comparing data between two wikis?
You can do it with two tabs but we are aware this UI might not solve all
use cases for the most advanced Wikistats users. We aim to tackle those in
the future.

How do I file bugs?
Use the handy link in the footer:
https://phabricator.wikimedia.org/maniphest/task/edit/?title=Wikistats%20Bug&projects=Analytics-Wikistats,Analytics

How do I comment on design?
The consultation on design already happened but we are still watching the
talk page:
https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedback/Round2


[1] https://www.mediawiki.org/wiki/Analytics/Wikistats/DumpReports/Future_per_report
[2] https://wikitech.wikimedia.org/wiki/Talk:Analytics/Systems/Wikistats

___
Analytics mailing list
analyt...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] Wikipedia, Twitch, sexism, and algorithms

2017-11-17 Thread Dario Taraborelli
Nice piece, thanks for sharing, Giovanni!

On top of Toby's link, check out this board listing the current research
programs in the space of anti-harassment and community health that the
Wikimedia Research team is working on:
https://phabricator.wikimedia.org/tag/research-programs/

Dario

On Fri, Nov 17, 2017 at 4:06 AM, Giovanni Luca Ciampaglia <
glciamp...@gmail.com> wrote:

> Hello list!
>
> I have written a short piece on how online communities are using
> algorithmic tools to address issues of gender inequality and harassment,
> and more generally how to create more inclusive environments online. Since
> I know this topic has been discussed here before, I thought some of
> you may be interested in reading it:
>
> https://theconversation.com/can-online-gaming-ditch-its-sexist-ways-74493
>
> The main topic is the online gaming platform Twitch, which is sadly in the
> news for yet another episode of harassment, but I mention Wikipedia and
> some of the initiatives to create more inclusive spaces (e.g. Teahouse).
>
> Any feedback is really appreciated. Cheers!
>
> Giovanni
> --
> Giovanni Luca Ciampaglia <glciamp...@gmail.com> ∙ Assistant Research
> Scientist
> IU Network Science Institute <http://iuni.iu.edu/> ∙ glciampaglia.com
> News: *WWW 2018* ∙ Alternate track on Journalism, Misinformation,
> and Fact Checking:
> https://www2018.thewebconf.org/call-for-papers/misinformation-cfp/
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>



-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Spam issues

2017-11-07 Thread Dario Taraborelli
Thanks to all of you who reached out reporting spam. Please use
wiki-research-l-ow...@lists.wikimedia.org in the future to report any
abusive behavior that needs the list owners'/moderators' attention.

Dario


[Wiki-research-l] New research collaboration (Wikimedia/Stanford): Sockpuppet detection in Wikimedia projects

2017-10-02 Thread Dario Taraborelli
Hey all,

I am thrilled to announce a new formal collaboration [1] between the
Wikimedia Foundation and a team of researchers at Stanford University,
aiming to design and improve strategies to detect potential sockpuppets in
Wikimedia projects
<https://meta.wikimedia.org/wiki/Research:Sockpuppet_detection_in_Wikimedia_projects>,
as part of the Wikimedia Foundation's community health program [2].

Sockpuppetry is the use of more than one account by the same person on a
social platform. It is a major problem on Wikipedia, as it is frequently
used for vandalism, paid editing, pushing one's point of view into
articles, and bypassing community guidelines. On English Wikipedia
specifically, benign and malicious uses are well defined [3]. Recent
research has characterized deceptive and non-deceptive behavior in online
discussions and found that deceptive sockpuppets primarily aim to create
an illusion of consensus [4].
Our aim in this project is to design machine learning algorithms to
identify potential sockpuppet accounts on English Wikipedia. We will
leverage data from previously identified sockpuppets [5] to train our
models for detection and create high precision models to identify new
sockpuppets. The expected outcome of this project is a set of open
algorithmic methods, and a report on their performance and limitations,
that could be integrated later on into tools to support community efforts
to identify and flag sockpuppet accounts.
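
This is not the project's actual method, but the idea of a "high precision" classifier can be illustrated with a stdlib-only sketch: given scores from some model and ground-truth labels (e.g. derived from the sockpuppet category in [5]), pick the most permissive flagging threshold that still meets a target precision. All names and numbers below are illustrative.

```python
def precision_at_threshold(scores, labels, threshold):
    """Precision of flagging accounts whose score >= threshold."""
    flagged = [(s, y) for s, y in zip(scores, labels) if s >= threshold]
    if not flagged:
        return 0.0
    return sum(y for _, y in flagged) / len(flagged)

def pick_threshold(scores, labels, target_precision=0.9):
    """Smallest observed score that meets the target precision,
    i.e. the most permissive cutoff that is still high precision."""
    for t in sorted(set(scores)):
        if precision_at_threshold(scores, labels, t) >= target_precision:
            return t
    return None

# Toy labeled data: 1 = confirmed sockpuppet, 0 = legitimate account.
scores = [0.1, 0.2, 0.4, 0.55, 0.6, 0.8, 0.9, 0.95]
labels = [0,   0,   0,   1,    0,   1,   1,   1]
t = pick_threshold(scores, labels, target_precision=0.75)
```

Trading recall for precision this way is what lets such a tool flag candidates for human review without drowning reviewers in false positives.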

I am excited to launch this collaboration with Srijan Kumar, Tilen Marc,
Jure Leskovec and their group at Stanford, who have a solid record of
research on this topic (and most recently also studied hoaxes in Wikipedia
[6]).

You can follow the progress on this project on its page on Meta
<https://meta.wikimedia.org/wiki/Research:Sockpuppet_detection_in_Wikimedia_projects>,
where we'll be reporting the results, or chime in on the talk page
<https://meta.wikimedia.org/wiki/Research_talk:Sockpuppet_detection_in_Wikimedia_projects>
.

Dario


[1] https://www.mediawiki.org/wiki/Wikimedia_Research#Collaborations

[2]
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Final/Community_Health#Segment_3:_Research_on_harassment

[3] Wikipedia:Sock puppetry. https://en.wikipedia.org/wiki/Wikipedia:Sock_puppetry

[4] An Army of Me: Sockpuppets in Online Discussion Communities. S. Kumar,
J. Cheng, J. Leskovec, V.S. Subrahmanian. *Proceedings of World Wide Web
conference*, 2017.

[5] Category:Wikipedia sockpuppets. https://en.wikipedia.org/wiki/Category:Wikipedia_sockpuppets

[6]
https://meta.wikimedia.org/wiki/Research:Understanding_hoax_articles_on_English_Wikipedia



-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Kaggle competition to forecast Wikipedia article traffic

2017-07-18 Thread Dario Taraborelli
Wanted to make sure everyone saw this challenge announced by Kaggle:

https://www.kaggle.com/c/web-traffic-time-series-forecasting
https://twitter.com/kaggle/status/887093338117201923

The timeline:


   - September 1st, 2017 - Deadline to accept competition rules.
   - September 1st, 2017 - Team Merger deadline. This is the last day
   participants may join or merge teams.
   - September 1st, 2017 - Final dataset is released.
   - September 10th, 2017 - Final submission deadline.

Competition winners will be revealed after November 10, 2017.

Dario

-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] [Research-wmf] Research Showcase, December 21, 2016

2016-12-21 Thread Dario Taraborelli
A reminder that the livestream will start in an hour (11:30am PT / 7:30pm
UTC): https://www.youtube.com/watch?v=nmrlu5qTgyA

If you want to learn more about perceptions of privacy and safety among Tor
users and Wikimedia contributors or are eager to know how much high-quality
content gender-focused initiatives have contributed to Wikipedia, come and
join us today (the discussion will be hosted on IRC).

Dario

On Mon, Dec 19, 2016 at 8:45 AM, Sarah R <srodl...@wikimedia.org> wrote:

> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday,
> December 21, 2016 at 11:30 AM (PST) 18:30 (UTC).
>
> YouTube stream: https://www.youtube.com/watch?v=nmrlu5qTgyA
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#December_2016>
> .
>
> The December 2016 Research Showcase includes:
>
> English Wikipedia Quality Dynamics and the Case of WikiProject Women
> Scientists
> By *Aaron Halfaker <https://meta.wikimedia.org/wiki/User:Halfak_(WMF)>*
> With every productive
> edit, Wikipedia is steadily progressing towards higher and higher quality.
> In order to track quality improvements, Wikipedians have developed an
> article quality assessment rating scale that ranges from "Stub" at the
> bottom to "Featured Articles" at the top. While this quality scale has the
> promise of giving us insights into the dynamics of quality improvements in
> Wikipedia, it is hard to use due to the sporadic nature of manual
> re-assessments. By developing a highly accurate prediction model (based on
> work by Warncke-Wang et al.), we've developed a method to assess an
> articles quality at any point in history. Using this model, we explore
> general trends in quality in Wikipedia and compare these trends to those of
> an interesting cross-section: Articles tagged by WikiProject Women
> Scientists. Results suggest that articles about women scientists were lower
> quality than the rest of the wiki until mid-2013, after which a dramatic
> shift occurred towards higher quality. This shift may correlate with (and
> even be caused by) this WikiProject's initiatives.
>
>
> Privacy, Anonymity, and Perceived Risk in Open Collaboration. A Study of
> Tor Users and Wikipedians
> By *Andrea Forte*
> In a recent qualitative study
> to be published at CSCW 2017, collaborators Rachel Greenstadt, Naz
> Andalibi, and I examined privacy practices and concerns among contributors
> to open collaboration projects. We collected interview data from people who
> use the anonymity network Tor who also contribute to online projects and
> from Wikipedia editors who are concerned about their privacy to better
> understand how privacy concerns impact participation in open collaboration
> projects. We found that risks perceived by contributors to open
> collaboration projects include threats of surveillance, violence,
> harassment, opportunity loss, reputation loss, and fear for loved ones. We
> explain participants’ operational and technical strategies for mitigating
> these risks and how these strategies affect their contributions. Finally,
> we discuss chilling effects associated with privacy loss, the need for open
> collaboration projects to go beyond attracting and educating participants
> to consider their privacy, and some of the social and technical approaches
> that could be explored to mitigate risk at a project or community level.
>
> --
> Sarah R. Rodlund
> Senior Project Coordinator-Engineering, Wikimedia Foundation
> srodl...@wikimedia.org
>
> ___
> Research-wmf mailing list
> research-...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/research-wmf
>
>


-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] another pageview db to download

2016-12-15 Thread Dario Taraborelli
Thanks for the release, Alex. I am sorry to see this resource go but agree
the data will be of great interest to researchers / app developers.

In terms of how to best store the data and metadata for long-term
preservation and discoverability, my recommendation is to use an open data
registry where you can describe the dataset, make it citable and
discoverable, add metadata and assign the entry a unique and persistent
identifier.

Services like Zenodo <https://zenodo.org/> or figshare
<https://figshare.com/> (the one we've used for our data releases at WMF,
see for example the clickstream dataset
<https://dx.doi.org/10.6084/m9.figshare.1305770.v21>) are good options to
do this.

Dario

On Sun, Dec 11, 2016 at 11:53 PM, Federico Leva (Nemo) <nemow...@gmail.com>
wrote:

> Alex Druk, 12/12/2016 08:32:
>
>> For a few years I have maintained a web site wikipediatrends.com
>> <http://wikipediatrends.com>. For a variety of reasons I cannot do it any
>> more and the site will be closed in January.
>> However, our DB of English wikipedia pageviews from 2007 can be used for
>> other projects. Anyone who wishes to get it, please see the info below.
>>
>
> Thanks. Can you please upload those files to the Internet Archive? You can
> use the https://internetarchive.readthedocs.io/en/latest/cli.html#upload
> CLI with mediatype "data", collection "opensource" and subject "Wikipedia;
> enwiki".
>
> Nemo
>
> A few words about the DB. We keep data in separate files for each page.
>> Each file is a CSV with lines starting with the year, followed by the
>> pageviews for each day. The page name is md5-encoded and used as the
>> file name. Page names are in a separate Berkeley DB file. The total
>> size of the DB is about 30 GB, in 3 archived files of ~10 GB each.
>> You can download the DB as of 12/03/2016 from:
>> https://s3-us-west-2.amazonaws.com/adrouk/november2016/rdd112016_1.tar.gz
>> https://s3-us-west-2.amazonaws.com/adrouk/november2016/rdd112016_2.tar.gz
>> https://s3-us-west-2.amazonaws.com/adrouk/november2016/articles112016.db
>> As of June 2015:
>> https://s3-us-west-2.amazonaws.com/adrouk/june2015/rdd62015_1.tar.gz
>> <https://s3-us-west-2.amazonaws.com/adrouk/june2015/rdd62015_1.tar.gz>
>> https://s3-us-west-2.amazonaws.com/adrouk/june2015/rdd62015_2.tar.gz
>> <https://s3-us-west-2.amazonaws.com/adrouk/june2015/rdd62015_2.tar.gz>
>> https://s3-us-west-2.amazonaws.com/adrouk/june2015/articles62015.db
>> <https://s3-us-west-2.amazonaws.com/adrouk/june2015/articles62015.db>
>> Please do not hesitate to ask any questions about the DB. If by any
>> chance you are also interested in the site, please contact me off the list.
>> Enjoy!
>>
>> ---
>> Thank you.
>>
>> Alex Druk, PhD
>> wikipediatrends.com
>> <http://wikipediatrends.com/>alex.d...@gmail.com
>> <mailto:alex.d...@gmail.com>
>> (775) 237-8550 <tel:(775)%20237-8550> Google voice
>>
>>
>>
>> _______
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
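
The layout described above (one CSV file per page, named by the MD5 of the title, with rows of a year followed by daily counts) can be sketched as follows. The exact title normalization and text encoding used before hashing are assumptions; verify against the actual files.

```python
import hashlib

def page_filename(page_name):
    """File name for a page: the MD5 hex digest of the page title.

    Encoding the title as UTF-8 before hashing is an assumption.
    """
    return hashlib.md5(page_name.encode("utf-8")).hexdigest()

def parse_year_row(csv_line):
    """Parse one CSV line: a year followed by daily pageview counts."""
    fields = csv_line.strip().split(",")
    year = int(fields[0])
    daily = [int(v) for v in fields[1:]]
    return year, daily

fname = page_filename("Main Page")          # 32-char hex file name
year, daily = parse_year_row("2016,120,95,130")
```

A lookup would then go through the separate Berkeley DB file to map real page titles to these hashed file names.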



-- 

*Dario Taraborelli  *Director, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] supporting Wikipedia citations as formal scholarly reputation in tenure committees

2016-11-17 Thread Dario Taraborelli
James – I'm interested in reading [1], but the PDF is behind a login
screen. Can I read this somewhere else (or do you have the full reference
so I can search for it)?

Thanks,
Dario

On Mon, Oct 31, 2016 at 9:43 PM, James Salsman <jsals...@gmail.com> wrote:

> Can anyone familiar with European Commission procedure please explain
> how to support the Wikipedia-associated proposals in [1] based on the
> statistics in [2] please? Very recent publications such as [3] in
> Nature along with what appears to be a relatively sudden groundswell
> of frankness and support e.g. [4] suggests to me that the time is
> right to get out in front of these proposals.
>
> [1] https://www.researchgate.net/profile/David_Nicholas5/publication/275349828_Emerging_reputation_mechanisms_for_scholars/links/553a22a60cf2c415bb06e6b7.pdf
>
> [2] http://www.cs.indiana.edu/~xshuai/papers/jcdl240-shuai.pdf
>
> [3] http://www.nature.com/news/fewer-numbers-better-science-1.20858
>
> [4] http://blog.scielo.org/en/2016/10/14/is-it-possible-to-normalize-citation-metrics/
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>



-- 

*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] Google open source research on automatic image captioning

2016-09-27 Thread Dario Taraborelli
I forwarded this internally at WMF a few days ago. Clearly – before
thinking of building workflows for human contributors to generate captions
or rich descriptors of media files on Commons – we should look at what's
available in terms of off-the-shelf machine learning services and
libraries.

The #1 rule of sane citizen science/crowdsourcing projects: don't ask
humans to perform tedious tasks machines are pretty good at; get humans to
curate the inputs and outputs of machines instead.

D

On Mon, Sep 26, 2016 at 5:55 PM, Pine W <wiki.p...@gmail.com> wrote:

> Perhaps of interest: "...We’re making the latest version of our image
> captioning system available as an open source model in TensorFlow."
> https://research.googleblog.com/2016/09/show-and-tell-
> image-captioning-open.html
>
> Pine
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>


-- 

*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] SPARQL workshop and WDQS tutorials

2016-09-15 Thread Dario Taraborelli
The Wikimedia Foundation's Discovery and Research teams recently hosted an
introductory workshop on the SPARQL query language and the Wikidata Query
Service.

We made the video stream <https://www.youtube.com/watch?v=NaMdh4fXy18> and
materials
<https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/2016_SPARQL_Workshop>
(demo queries, slide decks) from this workshop publicly available.

Guest speakers:

   - Ruben Verborgh, *Ghent University* and *Linked Data Fragments*
   - Benjamin Good, *Scripps Research Institute* and *Gene Wiki*
   - Tim Putman, *Scripps Research Institute* and *Gene Wiki*
   - Lucas, *@WikidataFacts*
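
As a small taste of what the workshop covered, here is a sketch in Python of composing a Wikidata Query Service request. The SPARQL query text is purely illustrative; the endpoint is the public WDQS endpoint, and no request is actually sent here.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: a few items that are instances of
# "scholarly article" (Q13442814), with English labels.
query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q13442814 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

# WDQS accepts the query as a GET parameter; format=json requests
# SPARQL JSON results instead of XML.
params = urlencode({"query": query, "format": "json"})
request_url = WDQS_ENDPOINT + "?" + params
```

Fetching `request_url` with any HTTP client returns the results as SPARQL JSON, which is how the demo queries from the workshop materials can be run programmatically.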


Dario and Stas


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] Thinking big: scaling up Wikimedia's contributor population by two orders of magnitude

2016-08-30 Thread Dario Taraborelli
Music Division, The New York Public Library for the Performing Arts
>> blog:  http://www.nypl.org/blog/author/44   Twitter: @kos2
>>  Listowner: OPERA-L ; SMT-ANNOUNCE ; SoundForge-users
>> - My opinions do not necessarily represent those of my institutions -
>>
>> *Inspiring Lifelong Learning* | *Advancing Knowledge* | *Strengthening
>> Our Communities *
>>
>> ___
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>>
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>


-- 

*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] Thinking big: scaling up Wikimedia's contributor population by two orders of magnitude

2016-08-27 Thread Dario Taraborelli
Nice, thought-provoking post, Pine.

Here's my take on two ways to attract a population of good-faith
contributors 1 or 2 orders of magnitude larger than the current one, based
on what I've seen over the last couple of years:

*Gamified interfaces for microcontributions à la Wikidata game*.
(per GerardM) there's absolutely no doubt this model is effective at
creating a large volume of high-quality edits, and value to the project and
communities. So far these tools have been primarily targeted at an existing
(and relatively small) population of core contributors and the only attempt
at expanding this to a much broader contributor base (WikiGrok) were too
premature. I do expect we will see more and more of lightweight distributed
curation in the next 5-10 years. In my opinion Wikidata is ready to
experiment with a much larger number of single-purpose contributory
interfaces (around missing images, translations, label evaluation,
referencing etc)

*Ubiquitous outreach, supported by dedicated technology*.
I called out in my Wikimania 2014 talk
<http://www.slideshare.net/dartar/wikimania-2014-the-missing-wikipedia-ads>
the fact that the single most effective initiative ever run to attract new
contributors has been WLM (I am intentionally not including initiatives
like WP in the classroom as they target a pre-defined population such as
students, but they are probably the most advanced example in this
category). Creating tools such as recommender systems and todo lists *tailored
to the interests of particular, intrinsically motivated contributors* as
well as the analytics dashboards <http://tools.wmflabs.org/hashtags/> to
measure the relative impact and best design of these programs, is the most
promising avenue for expanding the Wikimedia contributor population.

My 2 cents. Why making the edit button 10x larger is not a solution to this
problem is a topic I'll reserve for a separate thread.

Thanks for starting this thread.

Dario

On Sat, Aug 27, 2016 at 5:32 AM, rupert THURNER <rupert.thur...@gmail.com>
wrote:

> On Sat, Aug 27, 2016 at 11:08 AM, Amir E. Aharoni <
> amir.ahar...@mail.huji.ac.il> wrote:
>
>> The English Wikipedia alone has hundreds of thousands of items to fix -
>> missing references, misspellings, etc. The problems are nicely sorted at
>> https://en.m.wikipedia.org/wiki/Category:Wikipedia_backlog . There are
>> millions of other things to fix in other projects. So quality is getting
>> higher in many ways, but the amount of stuff to fix is still enormous.
>>
>> What we don't have is an easy way for new people to start eliminating
>> items from the backlogs. The Wikidata games are a nice step in the right
>> direction, but their appeal to new participants is non-existent.
>>
>
> there is a backlog? after 15 years contributing you tell that on the
> research mailing list :) i used wikidata games for a couple of minutes and
> great pleasure when i see the link flying by in an email. but i am never
> able to find that link again in my life. maybe that is the problem? rename
> the "donate" link to "contribute" and then have "money" and "time" which
> links to code and content. just my 2c ...
>
> rupert
>
>
> ___________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>


-- 

*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] The Wikimedia Research Showcase (July 2016): Detecting personal attacks on Wikipedia; Wikipedia.org Portal Research

2016-07-20 Thread Dario Taraborelli
reminder: this will be livestreamed in 5 minutes. You can watch it later on
YouTube if you can't make it.

On Tue, Jul 19, 2016 at 9:10 AM, Dario Taraborelli <
dtarabore...@wikimedia.org> wrote:

> We put the research showcase on hold for the past quarter due to other
> outreach initiatives (Wiki Workshop '16, WikiCite, Wikimania).
>
> We're back this month with two presentations by the Wikimedia Research
> team and our collaborators at Jigsaw.
>
> The showcase will be streamed
> <https://www.youtube.com/watch?v=eZgqzVuRDRs> on YouTube *tomorrow
> Wednesday July 20*, starting at *11.30 Pacific Time*. As usual, we'll be
> hosting a Q&A via our IRC channel (#wikimedia-research on irc.freenode.net
> ).
>
> Look forward to seeing you there!
>
> Dario
>
>
>
> *Detecting Personal Attacks on Wikipedia*
> By *Ellery Wulczyn
> <https://meta.wikimedia.org/wiki/User:Ewulczyn_(WMF)>, Nithum Thain
> <https://meta.wikimedia.org/wiki/User:nthain>*
> Ellery Wulczyn (WMF) and Nithum Thain (Jigsaw) will be speaking about
> their recent work on Project Detox, a research project to develop tools to
> detect and understand online personal attacks and harassment on Wikipedia.
> Their talk will cover the whole research pipeline to date, including data
> acquisition, machine learning model building, and some analytical insights
> as to the nature of personal attacks on Wikipedia talk pages.
>
>
>
> *Wikipedia.org Portal Research*
> Search behaviors and the new language-by-article-count dropdown
> By *Daisy Chen <https://meta.wikimedia.org/wiki/User:Dchen_(WMF)>*
> What part do the Wikipedia.org portal and on-wiki search mechanisms play
> in users' experiences finding information online? These findings reflect
> research participants' responses to a combination of generative and
> evaluative questions about their general online search behaviors, on-wiki
> search behaviors, interactions with the Wikipedia.org portal, and their
> thoughts about a partial re-design of the portal page, the new language by
> article count dropdown.
>
>
>
> *Dario Taraborelli  *Head of Research, Wikimedia Foundation
> wikimediafoundation.org • nitens.org • @readermeter
> <http://twitter.com/readermeter>
>



-- 


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] The Wikimedia Research Showcase (July 2016): Detecting personal attacks on Wikipedia; Wikipedia.org Portal Research

2016-07-19 Thread Dario Taraborelli
We put the research showcase on hold for the past quarter due to other
outreach initiatives (Wiki Workshop '16, WikiCite, Wikimania).

We're back this month with two presentations by the Wikimedia Research team
and our collaborators at Jigsaw.

The showcase will be streamed <https://www.youtube.com/watch?v=eZgqzVuRDRs>
on YouTube *tomorrow Wednesday July 20*, starting at *11.30 Pacific Time*.
As usual, we'll be hosting a Q&A via our IRC channel (#wikimedia-research
on irc.freenode.net).

Look forward to seeing you there!

Dario



*Detecting Personal Attacks on Wikipedia*
By *Ellery Wulczyn
<https://meta.wikimedia.org/wiki/User:Ewulczyn_(WMF)>, Nithum Thain
<https://meta.wikimedia.org/wiki/User:nthain>*
Ellery Wulczyn (WMF) and Nithum Thain (Jigsaw) will be speaking about their
recent work on Project Detox, a research project to develop tools to detect
and understand online personal attacks and harassment on Wikipedia. Their
talk will cover the whole research pipeline to date, including data
acquisition, machine learning model building, and some analytical insights
as to the nature of personal attacks on Wikipedia talk pages.



*Wikipedia.org Portal Research*
Search behaviors and the new language-by-article-count dropdown
By *Daisy Chen <https://meta.wikimedia.org/wiki/User:Dchen_(WMF)>*
What part do the Wikipedia.org portal and on-wiki search mechanisms play in
users' experiences finding information online? These findings reflect
research participants' responses to a combination of generative and
evaluative questions about their general online search behaviors, on-wiki
search behaviors, interactions with the Wikipedia.org portal, and their
thoughts about a partial re-design of the portal page, the new language by
article count dropdown.



*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] stream.wikimedia.org

2016-07-08 Thread Dario Taraborelli
Hey Bruno :) Records from the recentchanges table
<https://www.mediawiki.org/wiki/Manual:Recentchanges_table> are purged
after 90 days. You can get similar data for any arbitrary time range via
the revision table <https://www.mediawiki.org/wiki/Manual:Revision_table>.

HTH,
Dario
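
To illustrate the kind of time-range query the revision table supports, here is a self-contained toy using sqlite3. The real MediaWiki revision table has many more columns and lives in MySQL replicas (or behind Quarry), so treat this purely as a sketch of the query shape.

```python
import sqlite3

# Minimal stand-in for MediaWiki's revision table; the column names
# follow the manual's naming, but this is a simplified toy schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER,
        rev_timestamp TEXT   -- MediaWiki uses YYYYMMDDHHMMSS strings
    )
""")
rows = [
    (1, 100, "20160101120000"),
    (2, 100, "20160305090000"),
    (3, 200, "20160710230000"),
]
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", rows)

# Count revisions in an arbitrary window, e.g. the first half of 2016.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM revision "
    "WHERE rev_timestamp BETWEEN '20160101000000' AND '20160630235959'"
).fetchone()
```

Because MediaWiki timestamps are fixed-width strings, lexicographic `BETWEEN` comparisons are equivalent to chronological ones, which is what makes arbitrary-range queries like this straightforward.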

On Wed, Jul 6, 2016 at 1:40 PM, Bruno Goncalves <bgoncal...@gmail.com>
wrote:

> A few months should be more than enough to what I need.
> Hopefully I won't hit agains too many limitations on Quarry to download it.
> Thanks!
>
> Bruno
>
> ***
> Bruno Miguel Tavares Gonçalves, PhD
> Homepage: www.bgoncalves.com
> Email: bgoncal...@gmail.com
> ***
>
> On Wed, Jul 6, 2016 at 3:25 PM, Aaron Halfaker <aaron.halfa...@gmail.com>
> wrote:
>
>> How much past data do you need?  RCstream essentially reproduces the
>> recentchanges tables of each wiki.  You can get the last few months of
>> changes through quarry.  https://quarry.wmflabs.org/query/10807
>>
>> On Wed, Jul 6, 2016 at 2:21 PM, Bruno Goncalves <bgoncal...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've been playing with the RC stream and I was wondering if there is any
>>> place where I can download past data? I've tried looking in the dump
>>> directory (https://dumps.wikimedia.org/) and google but without much
>>> luck. Any help would be greatly appreciated.
>>>
>>> Best,
>>>
>>> Bruno
>>>
>>> ***
>>> Bruno Miguel Tavares Gonçalves, PhD
>>> Homepage: www.bgoncalves.com
>>> Email: bgoncal...@gmail.com
>>> ***
>>>


-- 


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Research FAQ gets a facelift

2016-06-20 Thread Dario Taraborelli
We just released a new version of Research:FAQ on Meta [1], significantly
expanded and updated, to make our processes at WMF more transparent and to
meet an explicit FDC request to clarify the role and responsibilities of
individual teams involved in research across the organization.

The previous version – written from the perspective of the (now inactive)
Research:Committee, and mostly obsolete since the release of WMF's open
access policy [2] – can still be found here [3].

Comments and bold edits to the new version of the document are welcome. For
any question or concern, you can drop me a line or ping my username on-wiki.

Thanks,
Dario

[1] https://meta.wikimedia.org/wiki/Research:FAQ
[2] https://wikimediafoundation.org/wiki/Open_access_policy
[3] https://meta.wikimedia.org/w/index.php?title=Research:FAQ&oldid=15176953


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Wiki Workshop at ICWSM '16: Accepted papers and invited speakers

2016-05-14 Thread Dario Taraborelli
We are glad to announce our invited speaker lineup and 19 papers accepted
at the wiki research workshop <http://snap.stanford.edu/wikiworkshop2016/> we
will be hosting on May 17, 2016 at the *10th International AAAI Conference
on Web and Social Media* (ICWSM '16 <http://www.icwsm.org/2016/>) in
Cologne, Germany. If you're attending the conference and interested in
Wikipedia, Wikidata, Wikimedia research, please consider registering for
the workshop. This is the second part of a workshop previously hosted at WWW
'16 in Montréal, Canada, in April. For more
information, you can visit the workshop's website
<http://snap.stanford.edu/wikiworkshop2016/> or follow us on Twitter (
@wikiworkshop16 <https://twitter.com/wikiworkshop16>).
Invited speakers <http://snap.stanford.edu/wikiworkshop2016/#speakers-icwsm>

   - *Ofer Arazy* (*University of Haifa*) Emergent Work in Wikipedia
   - *Jürgen Pfeffer* (*TU Munich*) Applying Social Network Analysis
   Metrics to Large-Scale Hyperlinked Data
   - *Martin Potthast* (*Universität Weimar*) Wikipedia Text Mining:
   Uncovering Quality and Reuse
   - *Fabian Suchanek* (*Télécom ParisTech*) A Hitchhiker's Guide to
   Ontology
   - *Claudia Wagner* (*GESIS*) Gender Inequalities in Wikipedia

Accepted papers <http://snap.stanford.edu/wikiworkshop2016/#papers-icwsm>


   - *Yashaswi Pochampally, Kamalakar Karlapalem and Navya Yarrabelly*
   Semi-Supervised Automatic Generation of Wikipedia Articles for Named
   Entities
   - *Joan Guisado-Gámez, Josep Lluís Larriba-Pey, David Tamayo and Jordi
   Urmeneta*
   ENRICH: A Query Expansion Service Powered by Wikipedia Graph Structure
   - *Ioannis Protonotarios, Vasiliki Sarimpei and Jahna Otterbacher*
   Similar Gaps, Different Origins? Women Readers and Editors at Greek
   Wikipedia
   -
*Sven Heimbuch and Daniel Bodemer *Wiki Editors' Acceptance of Additional
   Guidance on Talk Pages
   - *Yerali Gandica, Renaud Lambiotte and Timoteo Carletti*
   What Can Wikipedia Tell Us about the Global or Local Character of
   Burstiness?
   - *Andreas Spitz, Vaibhav Dixit, Ludwig Richter, Michael Gertz and
   Johanna Geiß*
   State of the Union: A Data Consumer's Perspective on Wikidata and Its
   Properties for the Classification and Resolution of Entities
   - *Finn Årup Nielsen*
   Literature, Geolocation and Wikidata
   - *Ana Freire, Matteo Manca, Diego Saez-Trumper, David Laniado, Ilaria
   Bordino, Francesco Gullo and Andreas Kaltenbrunner*
   Graph-Based Breaking News Detection on Wikipedia
   - *Alexander Dallmann, Thomas Niebler, Florian Lemmerich and Andreas
   Hotho*
   Extracting Semantics from Random Walks on Wikipedia: Comparing Learning
   and Counting Methods
   - *Arpit Merchant, Darshit Shah and Navjyoti Singh*
   In Wikipedia We Trust: A Case Study
   - *Thomas Palomares, Youssef Ahres, Juhana Kangaspunta and Christopher
   Ré*
   Wikipedia Knowledge Graph with DeepDive
   - *Lu Xiao*
   Hidden Gems in the Wikipedia Discussions: The Wikipedians' Rationales
   - *Sooyoung Kim and Alice Oh*
   Topical Interest and Degree of Involvement of Bilingual Editors in
   Wikipedia
   - *Lambert Heller, Ina Blümel, Simone Cartellieri and Christian Wartena*
   A Proposed Solution for Discovery of Reusable Technology Pictures Using
   Textmining of Surrounding Article Text, Based on the Infrastructure of
   Wikidata, Wikisource and Wikimedia Commons
   - *Behzad Tabibian, Mehrdad Farajtabar, Isabel Valera, Le Song, Bernhard
   Schölkopf and Manuel Gomez Rodriguez*
   On the Reliability of Information and Trustworthiness of Web Sources in
   Wikipedia
   - *Ruth Garcia Gavilanes, Milena Tsvetkova and Taha Yasseri*
   Collective Remembering in Wikipedia: The Case of Aircraft Crashes
   - *Elena Labzina*
   The Political Salience Dynamics and Users' Interaction Using the Example
   of Wikipedia within the Authoritarian Regime Context
   - *Fabian Flöck and Maribel Acosta*
   WikiLayers – A Visual Platform for Analyzing Content Evolution and
   Editing Dynamics in Wikipedia
   - *Olga Zagarova, Tatiana Sennikova, Claudia Wagner and Fabian Flöck*
   Cultural Relation Mining on Wikipedia: Beyond Culinary Analysis

Organizers

   - Bob West, *Stanford University & Wikimedia Foundation*
   - Leila Zia, *Wikimedia Foundation*
   - Dario Taraborelli, *Wikimedia Foundation*
   - Jure Leskovec, *Stanford University*


Re: [Wiki-research-l] [Analytics] Wikipedia Clickstream dataset refreshed (March 2016)

2016-05-02 Thread Dario Taraborelli
Hey Thomas,

yes, I agree this dataset is really valuable (just looking at the sheer
number of downloads [1] and the requests for similar data we've received). I
can see the value of making it more easily accessible via an API.

Ellery and I have been talking about the idea of – at the very least –
scheduling the generation of new dumps, if not exposing the data
programmatically. Right now, I am afraid this is not within my team's
capacity and Analytics has a number of other high-priority areas to focus
on. We were planning to talk to Joseph et al. anyway to decide how to move
forward (hi Joseph!); we'll report back on the lists as soon as this
happens.

Dario

[1] https://figshare.altmetric.com/details/3707715



On Mon, May 2, 2016 at 3:12 AM, Thomas Steiner <to...@google.com> wrote:

> Hi Dario,
>
> This data is super interesting! How realistic is it that your team
> make it available through the Wikimedia REST API [1]? I would then in
> turn love to add it to Wikipedia Tools [2], just imagine how amazing
> it would be to be able to ask a spreadsheet for…
>
>   =WIKI{OUT|IN}BOUNDTRAFFIC("en:London", TODAY()-2, TODAY()-1)
>
> …(or obviously the API respectively) and get the results back
> immediately without the need to download a dump first. What do you
> think?
>
> Cheers,
> Tom
>
> --
> [1] https://wikimedia.org/api/rest_v1/?doc
> [2] http://bit.ly/wikipedia-tools-add-on
>
> --
> Dr. Thomas Steiner, Employee (http://blog.tomayac.com,
> https://twitter.com/tomayac)
>
> Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
> Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
> Registration office and registration number: Hamburg, HRB 86891
>
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v2.0.29 (GNU/Linux)
>
>
> iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom
> hTtPs://xKcd.cOm/1181/
> -END PGP SIGNATURE-
>
> ___
> Analytics mailing list
> analyt...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>



-- 


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Wikipedia Clickstream dataset refreshed (March 2016)

2016-04-28 Thread Dario Taraborelli
Hey all,

heads up that a refreshed Wikipedia Clickstream dataset is now available
for March 2016, containing 25 million (referer, resource) pairs extracted
from about 7 billion webrequests.

https://dx.doi.org/10.6084/m9.figshare.1305770.v16

Ellery (the author of the dataset) is cc'ed if you have any questions, or
you can chime in on the talk page of the dataset entry on Meta
<https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream>.

Show us what you do with this data, if you use it in your research.

Dario
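
If you grab the dump, a few lines of Python are enough to start exploring it. The rows below are synthetic, and the (prev, curr, type, n) column layout is an assumption; check the dataset's documentation on figshare or Meta for the exact schema of the release you download.

```python
import csv
import io
from collections import Counter

# Aggregate inbound clicks per target article from (referer, resource)
# pairs. Sample data and the (prev, curr, type, n) layout are illustrative
# assumptions -- consult the dataset's README for the real column names.
sample = """\
prev\tcurr\ttype\tn
other-google\tLondon\texternal\t11000
Hyde_Park\tLondon\tlink\t300
other-google\tParis\texternal\t9000
"""

inbound = Counter()
for row in csv.DictReader(io.StringIO(sample), delimiter="\t"):
    inbound[row["curr"]] += int(row["n"])

print(inbound.most_common(1))  # top target article by inbound traffic
```

For the real file, replace `io.StringIO(sample)` with an open file handle on the downloaded TSV.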

*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] Sharing Wiki related research data

2016-04-22 Thread Dario Taraborelli
The Wikimedia Research team uses both figshare (per Jonathan) and the
datahub <https://datahub.io/organization/wikimedia>. The latter is open
source and powered by CKAN, but it's unfortunately unmaintained. Figshare
has the benefit of providing full DataCite compliance and DOIs (GitHub is
also a good option, especially since they added DOI support).

We're currently exploring other alternatives (such as Zenodo), but at the
moment we're still using the above two options.

On Fri, Apr 22, 2016 at 12:16 AM, Nicolas Jullien <
nicolas.jull...@telecom-bretagne.eu> wrote:

> Hello all,
>
> there is this initiative regarding the repository and description of the
> data base you may want to consider:
> http://www.datafactories.org/
> And I think Flossmole may be interested too, even if their primary target
> was FLOSS data, http://flossmole.org/
>
> NJ
>
> Le 21/04/2016 18:15, Jonathan Morgan a écrit :
>
>> Many WMF researchers use https://figshare.com
>>
>> Jonathan
>>
>> On Thu, Apr 21, 2016 at 7:38 AM, Robert Jäschke <jaesc...@l3s.de
>> <mailto:jaesc...@l3s.de>> wrote:
>>
>>
>> Dear Moritz,
>>
>> On 21.04.2016 16:32, Physikerwelt wrote:
>> > is there a central data repository we want to use to share
>> > research data. We put our data in the release of the GitHub
>> > repository~[1], but that might not be optimal.
>>
>> What about http://zenodo.org/?
>>
>>
>> Best regards,
>> Robert
>>
>>
>>
>>
>>
>>
>> --
>> Jonathan T. Morgan
>> Senior Design Researcher
>> Wikimedia Foundation
>> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>>
>>
>>
>>
>>
>
> --
> Maître de Conférences (HDR) / Associate Professor.
> LUSSI - iSchool, M@rsouin-ICI. Institut Mines-Télécom Bretagne & UBL
> In charge of the Master "Information Systems Project Management and
> Consulting"
> http://www.telecom-bretagne.eu/studies/msc/information-systems-management/
> Co-animator of the "ICT and Society" Institut Mines-Telecom's research
> network
>
> https://nicolasjullien.wp.mines-telecom.fr/
> Skype: Nicolas.Jullien1
> Tel +33 (0) 229 001 245
> Télécom Bretagne, Technopôle Brest Iroise CS 83818
> 29238 BREST CEDEX 3
>
>
>
>



-- 


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Wikimedia Research: Q3 quarterly reviews and Q4 goals

2016-04-18 Thread Dario Taraborelli
Hey all,

heads up that we published the quarterly review slide deck for Q3
(January - March 2016) for the Wikimedia Research department (slides 2-18):
check them out for a high-level overview of our main accomplishments over the
last three months. The Q4 goals (April - June 2016) for each team are listed
on this page.

Dario


Re: [Wiki-research-l] [Wikimedia-l] Research showcase: Evolution of privacy loss in Wikipedia

2016-03-19 Thread Dario Taraborelli
On Wed, Mar 16, 2016 at 7:53 PM, SarahSV <sarahsv.w...@gmail.com> wrote:

> Dario and Aaron, thanks for letting us know about this. Is the research
> available in writing for people who don't want to sit through the video?
>
> Sarah
>

Sarah – yes, see http://cm.cecs.anu.edu.au/post/wikiprivacy/

On Wed, Mar 16, 2016 at 12:55 PM, Aaron Halfaker <ahalfa...@wikimedia.org>
> wrote:
>
> > Reminder, this showcase is starting in 5 minutes.  See the stream here:
> > https://www.youtube.com/watch?v=Xle0oOFCNnk
> >
> > Join us on Freenode at #wikimedia-research
> > <http://webchat.freenode.net/?channels=wikimedia-research> to ask Andrei
> > questions.
> >
> > -Aaron
> >
> > On Tue, Mar 15, 2016 at 12:53 PM, Dario Taraborelli <
> > dtarabore...@wikimedia.org> wrote:
> >
> > > This month, our research showcase
> > > <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2016>
> > > hosts
> > > Andrei Rizoiu (Australian National University) to talk about his work
> > > <http://cm.cecs.anu.edu.au/post/wikiprivacy/> on *how private traits of
> > > Wikipedia editors can be exposed from public data* (such as edit
> > > histories) using off-the-shelf machine learning techniques. (abstract
> > below)
> > >
> > > If you're interested in learning what the combination of machine
> > > learning and public data means for privacy and surveillance, come and
> > > join us this *Wednesday, March 16* at *1pm Pacific Time*.
> > >
> > > The event will be recorded and publicly streamed
> > > <https://www.youtube.com/watch?v=Xle0oOFCNnk>. As usual, we will be
> > > hosting the conversation with the speaker and Q&A on the
> > > #wikimedia-research channel on IRC.
> > >
> > > Looking forward to seeing you there,
> > >
> > > Dario
> > >
> > >
> > > Evolution of Privacy Loss in Wikipedia
> > >
> > > The cumulative effect of collective online participation has an
> > > important and adverse impact on individual privacy. As an online system
> > > evolves over time, new digital traces of individual behavior may uncover
> > > previously hidden statistical links between an individual’s past actions
> > > and her private traits. To quantify this effect, we analyze the
> > > evolution of individual privacy loss by studying the edit history of
> > > Wikipedia over 13 years, including more than 117,523 different users
> > > performing 188,805,088 edits. We trace each Wikipedia contributor using
> > > apparently harmless features, such as the number of edits performed on
> > > predefined broad categories in a given time period (e.g. Mathematics,
> > > Culture or Nature). We show that even at this unspecific level of
> > > behavior description, it is possible to use off-the-shelf machine
> > > learning algorithms to uncover usually undisclosed personal traits, such
> > > as gender, religion or education. We provide empirical evidence that the
> > > prediction accuracy for almost all private traits consistently improves
> > > over time. Surprisingly, the prediction performance for users who
> > > stopped editing after a given time still improves. The activities
> > > performed by new users seem to have contributed more to this effect than
> > > additional activities from existing (but still active) users. Insights
> > > from this work should help users, system designers, and policy makers
> > > understand and make long-term design choices in online content creation
> > > systems.
> > >
> > >
> > > *Dario Taraborelli  *Head of Research, Wikimedia Foundation
> > > wikimediafoundation.org • nitens.org • @readermeter
> > > <http://twitter.com/readermeter>
> > >
> > _______
> > Wikimedia-l mailing list, guidelines at:
> > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> > New messages to: wikimedi...@lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>
> >




-- 


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] WWW 2016 Wiki Workshop: accepted papers

2016-03-19 Thread Dario Taraborelli
We're thrilled to announce the list of papers accepted at the WWW 2016 Wiki
Workshop <http://snap.stanford.edu/wikiworkshop2016/>. You can follow
@wikiworkshop16 <https://twitter.com/wikiworkshop16> for updates.

Dario
(on behalf of the organizers)


   - *Johanna Geiß and Michael Gertz*
   With a Little Help from my Neighbors: Person Name Linking Using the
   Wikipedia Social Network
   - *Ramine Tinati, Markus Luczak-Roesch and Wendy Hall*
   Finding Structure in Wikipedia Edit Activity: An Information Cascade
   Approach
   - *Paolo Boldi and Corrado Monti*
   Cleansing Wikipedia Categories using Centrality
   - *Thomas Steiner*
   Wikipedia Tools for Google Spreadsheets
   - *Yu Suzuki and Satoshi Nakamura*
   Assessing the Quality of Wikipedia Editors through Crowdsourcing
   - *Vikrant Yadav and Sandeep Kumar*
   Learning Web Queries For Retrieval of Relevant Information About an Entity
   in a Wikipedia Category
   - *Haggai Roitman, Shay Hummel, Ella Rabinovich, Benjamine Sznajder, Noam
   Slonim and Ehud Aharoni*
   On the Retrieval of Wikipedia Articles Containing Claims on Controversial
   Topics
   - *Tanushyam Chattopadhyay, Santa Maiti and Arindam Pal*
   Automatic Discovery of Emerging Trends using Cluster Name Synthesis on
   User Consumption Data
   - *Freddy Brasileiro, João Paulo A. Almeida, Victorio A. Carvalho and
   Giancarlo Guizzardi*
   Applying a Multi-Level Modeling Theory to Assess Taxonomic Hierarchies in
   Wikidata



*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Archiving the RCom list

2016-03-04 Thread Dario Taraborelli
The rcom-l <https://lists.wikimedia.org/mailman/listinfo/rcom-l> mailing
list hasn't seen significant activity in over 2 years and the research
committee itself ceased its activities a long time ago, yet I still
receive weekly moderation requests to review spammy submissions.

For these reasons I requested <https://phabricator.wikimedia.org/T128141>
that the mailing list be closed and the archives
<https://lists.wikimedia.org/pipermail/rcom-l/> preserved. Going forward,
please use wiki-research-l for research policy and research
outreach-related discussions.

Thanks,
Dario


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


[Wiki-research-l] Wiki Workshop 2016 @ ICWSM: deadline extended to March 3

2016-02-23 Thread Dario Taraborelli
Hi all – heads up that we extended the submission deadline for the Wiki
Workshop at ICWSM '16 to *Wednesday, March 3, 2016*. (The second deadline
remains unchanged: March 11, 2016).

You can check the workshop's website for submission instructions or follow
us at @wikiworkshop16 for live updates.

Looking forward to your contributions.

Dario


[Wiki-research-l] What Wikimedia Research is up to in the next quarter

2015-12-18 Thread Dario Taraborelli
Hey all,

I’m glad to announce that the Wikimedia Research team’s goals
<https://www.mediawiki.org/wiki/Wikimedia_Research/Goals#January_-_March_2016_.28Q3.29>
for
the next quarter (January - March 2016) are up on wiki.

The Research and Data
<https://www.mediawiki.org/wiki/Wikimedia_Research#Research_and_Data> team
will continue to work with our volunteers and collaborators on revision
scoring as a service <https://meta.wikimedia.org/wiki/R:Revscoring> adding
support for 5 new languages and prototyping new models (including an edit
type classifier
<https://meta.wikimedia.org/wiki/Research:Automated_classification_of_edit_types>).
We will also continue to iterate on the design of article creation
recommendations
<https://meta.wikimedia.org/wiki/Research:Increasing_article_coverage>,
running a dedicated campaign in coordination with existing editathons to
improve the quality of these recommendations. Finally, we will extend a
research project we started in November aimed at understanding the behavior
of Wikipedia readers
<https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour>
, by combining qualitative survey data with behavioral analysis from our
HTTP request logs.
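
For readers who want to poke at the revision scoring service themselves, requests are plain HTTP. The sketch below only constructs a request URL and does not call the network; the host and v3-style path layout are assumptions modeled on the service's later public API, so consult the project page on Meta for the authoritative endpoints.

```python
from urllib.parse import urlencode

# Construct a request URL for a revision-scoring service in the style of
# ORES. The host and path layout below are assumptions -- check the
# project documentation for the endpoints that are actually deployed.
BASE = "https://ores.wikimedia.org/v3/scores"

def score_url(wiki, rev_ids, models):
    query = urlencode({
        "revids": "|".join(str(r) for r in rev_ids),
        "models": "|".join(models),
    })
    return f"{BASE}/{wiki}/?{query}"

url = score_url("enwiki", [123456, 789012], ["damaging", "goodfaith"])
print(url)
```

The resulting URL can then be fetched with any HTTP client to get back per-revision scores as JSON.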

The Design Research
<https://www.mediawiki.org/wiki/Wikimedia_Research#Design_Research> team
will conduct an in-depth study of user needs (particularly readers) on the
ground in February. We will continue to work with other Wikimedia
Engineering teams throughout the quarter to ensure the adoption of
human-centered design principles and pragmatic personas
<https://www.mediawiki.org/wiki/Personas_for_product_development> in our
product development cycle. We’re also excited to start a collaboration
<https://meta.wikimedia.org/wiki/Research:Publicly_available_online_learning_resource_survey>
with
students at the University of Washington to understand what free online
information resources (including, but not limited to, Wikimedia projects)
students use.

I am also glad to report that two papers on link and article
recommendations (the result of a formal collaboration with a team at
Stanford) were accepted for presentation at WSDM '16 and WWW '16 (preprints
will be made available shortly). An overview on revision scoring as a
service
<http://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/> was
published a few weeks ago on the Wikimedia blog, and got some good media
coverage
<https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Media>
.

We're constantly looking for contributors and as usual we welcome feedback
on these projects via the corresponding talk pages on Meta. You can contact
us for any question on IRC via the #wikimedia-research channel and follow
@WikiResearch <https://twitter.com/WikiResearch> on Twitter for the latest
Wikipedia and Wikimedia research updates hot off the press.

Wishing you all happy holidays,

Dario and Abbey on behalf of the team


*Dario Taraborelli  *Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>


Re: [Wiki-research-l] What Wikimedia Research is up to in the next quarter

2015-12-18 Thread Dario Taraborelli

> On Dec 18, 2015, at 10:13 PM, Gerard Meijssen <gerard.meijs...@gmail.com> 
> wrote:
> 
> Hoi,
> Where does it say what languages are covered

https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service#Support_table
 
<https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service#Support_table>

> and, what languages are planned for support?

https://meta.wikimedia.org/wiki/Research_talk:Revision_scoring_as_a_service#Progress_report:_2015-11-28
 
<https://meta.wikimedia.org/wiki/Research_talk:Revision_scoring_as_a_service#Progress_report:_2015-11-28>

although what gets in production will depend on many factors, such as community 
support to generate labeled data, performance of the model etc.

Dario

> Thanks,
>  GerardM
> 
> On 19 December 2015 at 05:16, Dario Taraborelli <dtarabore...@wikimedia.org 
> <mailto:dtarabore...@wikimedia.org>> wrote:
> Hey all,
> 
> I’m glad to announce that the Wikimedia Research team’s goals 
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Goals#January_-_March_2016_.28Q3.29>
>  for the next quarter (January - March 2016) are up on wiki. 
> 
> The Research and Data 
> <https://www.mediawiki.org/wiki/Wikimedia_Research#Research_and_Data> team 
> will continue to work with our volunteers and collaborators on revision 
> scoring as a service <https://meta.wikimedia.org/wiki/R:Revscoring> adding 
> support for 5 new languages and prototyping new models (including an edit 
> type classifier 
> <https://meta.wikimedia.org/wiki/Research:Automated_classification_of_edit_types>).
>  We will also continue to iterate on the design of article creation 
> recommendations 
> <https://meta.wikimedia.org/wiki/Research:Increasing_article_coverage>, 
> running a dedicated campaign in coordination with existing editathons to 
> improve the quality of these recommendations. Finally, we will extend a 
> research project we started in November aimed at understanding the behavior 
> of Wikipedia readers 
> <https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour>,
>  by combining qualitative survey data with behavioral analysis from our HTTP 
> request logs. 
> 
> The Design Research 
> <https://www.mediawiki.org/wiki/Wikimedia_Research#Design_Research> team will 
> conduct an in-depth study of user needs (particularly readers) on the ground 
> in February. We will continue to work with other Wikimedia Engineering teams 
> throughout the quarter to ensure the adoption of human-centered design 
> principles and pragmatic personas 
> <https://www.mediawiki.org/wiki/Personas_for_product_development> in our 
> product development cycle. We’re also excited to start a collaboration 
> <https://meta.wikimedia.org/wiki/Research:Publicly_available_online_learning_resource_survey>
>  with students at the University of Washington to understand what free online 
> information resources (including, but not limited to, Wikimedia projects) 
> students use.
> 
> I am also glad to report that two papers on link and article recommendations 
> (the result of a formal collaboration with a team at Stanford) were accepted 
> for presentation at WSDM '16 and WWW ’16 (preprints will be made available 
> shortly). An overview on revision scoring as a service 
> <http://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/> 
> was published a few weeks ago on the Wikimedia blog, and got some good media 
> coverage 
> <https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Media>.
>  
> 
> We're constantly looking for contributors and as usual we welcome feedback on 
> these projects via the corresponding talk pages on Meta. You can contact us 
> for any question on IRC via the #wikimedia-research channel and follow 
> @WikiResearch <https://twitter.com/WikiResearch> on Twitter for the latest 
> Wikipedia and Wikimedia research updates hot off the press.
> 
> Wishing you all happy holidays,
> 
> Dario and Abbey on behalf of the team
> 
> 
> 
> Dario Taraborelli  Head of Research, Wikimedia Foundation
> wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org 
> <http://nitens.org/> • @readermeter <http://twitter.com/readermeter> 
> 
> 
> 
> 



[Wiki-research-l] Fwd: "Wikipedia as the front matter to all research": A brown bag on scholarly citations in Wikipedia this Friday 12/4 @ 12 PT

2015-12-04 Thread Dario Taraborelli
A reminder that this will be streamed today at 9pm CET / 12pm PST
You can join the conversation via IRC on #wikimedia-office 

Dario

> Begin forwarded message:
> 
> From: Dario Taraborelli <dtarabore...@wikimedia.org>
> 
> Come and join us for a brown bag this Friday December 4 at 12 PT to learn 
> about unique identifiers and scholarly citations in Wikipedia, why they 
> matter and how we can bridge the gap between the Wikimedia, research and 
> librarian communities.
> 
> Wikipedia as the front matter to all research
> 
>   YouTube stream: http://www.youtube.com/watch?v=mB_oexqz8pA 
> <http://www.youtube.com/watch?v=mB_oexqz8pA> 
>   Event information on Meta: 
> https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_research 
> <https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_research>
>  
> 
> Measuring citizen engagement with the scholarly literature through Wikipedia 
> citations.
> Geoffrey Bilder, CrossRef
> 
> Wikipedia (in toto) is probably the 5th largest referrer of citations to the 
> scholarly literature. That is, more Wikipedia users click on and follow 
> citations to the scholarly literature *from* Wikipedia domains than from any 
> single scholarly publisher in the world. What does this tell us about general 
> interest in the scholarly literature? What does this tell us about scholarly 
> engagement with  editing Wikipedia articles? The short answer is “we don’t 
> know.”  But we are actively working with Wikimedia to find out.
> 
> Building the sum of all human citations
> Dario Taraborelli, Wikimedia Foundation
> 
> As sourcing and verifiability of online information are threatened 
> <http://www.slideshare.net/dartar/citing-as-a-public-service-building-the-sum-of-all-human-citations>
>  by the explosion of answer engines and the changing habits of web users, 
> Wikimedia has an outstanding opportunity to extract and store source data for 
> any conceivable statement and make it transparently verifiable by its users. 
> In this talk, I’ll present a grassroots effort 
> <https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData> to 
> create a human-curated, comprehensive repository of all human citations in 
> Wikidata.
> 
> –
> Bonus read: a real-time tracker of scholarly citations added to Wikipedia, 
> built with Raspberry Pi
> http://blog.crossref.org/2015/12/crossref-labs-plays-with-the-raspberry-pi-zero.html
>  
> <http://blog.crossref.org/2015/12/crossref-labs-plays-with-the-raspberry-pi-zero.html>
> 


Dario Taraborelli  Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org 
<http://nitens.org/> • @readermeter <http://twitter.com/readermeter>


[Wiki-research-l] "Wikipedia as the front matter to all research": A brown bag on scholarly citations in Wikipedia this Friday 12/4 @ 12 PT

2015-12-02 Thread Dario Taraborelli
Come and join us for a brown bag this Friday December 4 at 12 PT to learn about 
unique identifiers and scholarly citations in Wikipedia, why they matter and 
how we can bridge the gap between the Wikimedia, research and librarian 
communities.

Wikipedia as the front matter to all research

YouTube stream: http://www.youtube.com/watch?v=mB_oexqz8pA 
<http://www.youtube.com/watch?v=mB_oexqz8pA> 
Event information on Meta: 
https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_research 
<https://meta.wikimedia.org/wiki/Wikipedia_as_the_front_matter_to_all_research> 

Measuring citizen engagement with the scholarly literature through Wikipedia 
citations.
Geoffrey Bilder, CrossRef

Wikipedia (in toto) is probably the 5th largest referrer of citations to the 
scholarly literature. That is, more Wikipedia users click on and follow 
citations to the scholarly literature *from* Wikipedia domains than from any 
single scholarly publisher in the world. What does this tell us about general 
interest in the scholarly literature? What does this tell us about scholarly 
engagement with  editing Wikipedia articles? The short answer is “we don’t 
know.”  But we are actively working with Wikimedia to find out.

Building the sum of all human citations
Dario Taraborelli, Wikimedia Foundation

As sourcing and verifiability of online information are threatened 
<http://www.slideshare.net/dartar/citing-as-a-public-service-building-the-sum-of-all-human-citations>
 by the explosion of answer engines and the changing habits of web users, 
Wikimedia has an outstanding opportunity to extract and store source data for 
every conceivable statement and make it transparently verifiable by its users. 
In this talk, I’ll present a grassroots effort 
<https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData> to create 
a human-curated, comprehensive repository of all human citations in Wikidata.

–
Bonus read: a real-time tracker of scholarly citations added to Wikipedia, 
built with Raspberry Pi
http://blog.crossref.org/2015/12/crossref-labs-plays-with-the-raspberry-pi-zero.html
 
<http://blog.crossref.org/2015/12/crossref-labs-plays-with-the-raspberry-pi-zero.html>





Dario Taraborelli  Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org 
<http://nitens.org/> • @readermeter <http://twitter.com/readermeter>


[Wiki-research-l] Pageview API

2015-11-18 Thread Dario Taraborelli
-- Forwarded message --
From: Dan Andreescu <dandree...@wikimedia.org <mailto:dandree...@wikimedia.org>>
To: Research into Wikimedia content and communities 
<wiki-research-l@lists.wikimedia.org 
<mailto:wiki-research-l@lists.wikimedia.org>>
Cc: 
Date: Wed, 18 Nov 2015 08:43:10 -0500
Subject: Pageview API

Dear Data Enthusiasts,

In collaboration with the Services team, the analytics team wishes to announce 
a public Pageview API 
<https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end>.
  For an example of what kind of UIs someone could build with it, check out 
this excellent demo <http://analytics.wmflabs.org/demo/pageview-api> (code) 
<https://gist.github.com/marcelrf/49738d14116fd547fe6d#file-article-comparison-html>.

The API can tell you how many times a wiki article or project is viewed over a 
certain period.  You can break that down by views from web crawlers or humans, 
and by desktop, mobile site, or mobile app.  And you can find the 1000 most 
viewed articles 
<https://wikimedia.org/api/rest_v1/metrics/pageviews/top/es.wikipedia/all-access/2015/11/11>
 on any project, on any given day or month that we have data for.  We currently 
have data back through October and we will be able to go back to May 2015 when 
the loading jobs are all done.  For more information, take a look at the user 
docs <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageview_API>.
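As a rough sketch of how a script might call the per-article endpoint (the URL pattern follows the linked docs; the article title, date range, and sample counts below are made up for illustration):

```python
import json

# Build a per-article request URL following the documented REST pattern:
# project / access / agent / article / granularity / start / end.
def pageview_url(project, article, start, end,
                 access="all-access", agent="user", granularity="daily"):
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    return f"{base}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"

url = pageview_url("en.wikipedia", "Albert_Einstein", "2015110100", "2015110700")
print(url)

# The endpoint answers with JSON shaped roughly like this sample
# (counts invented here); summing "views" gives a total for the window.
sample = """{"items": [
  {"timestamp": "2015110100", "views": 18571},
  {"timestamp": "2015110200", "views": 23287}
]}"""
total = sum(item["views"] for item in json.loads(sample)["items"])
print(total)
```

Swapping `all-access` for `desktop`, `mobile-web`, or `mobile-app`, and `user` for `spider` or `bot`, gives the breakdowns described above.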

After many requests from the community, we were really happy to finally make 
this our top priority and get it done.  Huge thanks to Gabriel, Marko, Petr, 
and Eric from Services, Alexandros and all of Ops really, Henrik for 
maintaining stats.grok, and, of course, the many community members who have 
been so patient with us all this time.

The Research team’s Article Recommender tool <http://recommend.wmflabs.org/> 
already uses the API to rank pages and determine relative importance.  Wiki 
Education Foundation’s dashboard <https://dashboard.wikiedu.org/> is going to 
be using it to count how many times an article has been viewed since a student 
edited it.  And there are other grand plans for this data like “article 
finder”, which will find low-rated articles with a lot of pageviews; this can 
be used by editors looking for high-impact work.  Join the fun, we’re happy to 
help get you started and listen to your ideas.  Also, if you find bugs or want 
to suggest improvements, please create a task in Phabricator and tag it with 
#Analytics-Backlog <https://phabricator.wikimedia.org/tag/analytics-backlog/>.

So what’s next?  We can think of too many directions to go into, for pageview 
data and Wikimedia project data, in general.  We need to work with you to make 
a great plan for the next few quarters.  Please chime in here 
<https://phabricator.wikimedia.org/T112956> with your needs.

Team Analytics

(p.s. this was also posted on analytics-l, wikitech-l, and engineering-l, but I 
suck and forgot to cc the research list.  My apologies.)




Dario Taraborelli  Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org 
<http://nitens.org/> • @readermeter <http://twitter.com/readermeter>


Re: [Wiki-research-l] mobile pageviews

2015-09-29 Thread Dario Taraborelli
Phoebe, for a breakdown by country and platform see also 
https://ewulczyn.shinyapps.io/pageview_forecasting

> On Sep 28, 2015, at 5:14 PM, phoebe ayers <phoebe.w...@gmail.com> wrote:
> 
> Excellent, thanks to all three of you, the vital signs & daily
> pageviews graph are just right.
> 
> I still have dreams of a central repository of beautiful data slides
> for talks, updated every so often with current numbers :)
> 
> Phoebe
> 
> p.s. 40% on mobile? holy moly.
> 
> On Mon, Sep 28, 2015 at 7:38 PM, Tilman Bayer <tba...@wikimedia.org> wrote:
>> Thanks Jonathan! Phoebe, you can also find the chart from that email
>> in an updated version at [1], and the dashboard at [2] presents the
>> same data in different form (starting from May instead of April). Both
>> are using the new pageview definition [3] and exclude spider/bot
>> views, whereas the Wikistats/report card charts that Pine mentioned
>> still use the old definition and include non-human views. The latter
>> may be revamped or decommissioned fairly soon.[4]
>> 
>> [1] 
>> https://commons.wikimedia.org/wiki/File:Wikimedia_daily_pageviews,_all_vs._mobile_(April_2015-).png
>> [2] https://vital-signs.wmflabs.org/#projects=all/metrics=Pageviews
>> (click "data breakdowns" on the left)
>> [3] see e.g. https://meta.wikimedia.org/wiki/Research:Page_view
>> [4] https://phabricator.wikimedia.org/T107175 ,
>> https://www.mediawiki.org/wiki/Analytics/Wikistats/TrafficReports/Future_per_report_B2
>> 
>> On Mon, Sep 28, 2015 at 10:49 AM, Jonathan Morgan <jmor...@wikimedia.org> 
>> wrote:
>>> Hi Phoebe,
>>> 
>>> I just forwarded you this email from Mobile-l:
>>> https://lists.wikimedia.org/pipermail/mobile-l/2015-September/009773.html
>>> (since I'm not sure the attached images were archived).
>>> 
>>> I think that might be what you want. If not, Tilman can probably point you
>>> to other, related resources. Hope that helps!
>>> 
>>> J
>>> 
>>> On Mon, Sep 28, 2015 at 10:42 AM, phoebe ayers <phoebe.w...@gmail.com>
>>> wrote:
>>>> 
>>>> Hi Research community (and especially Wikimedia analytics),
>>>> 
>>>> Are there any up-to-date & relatively pretty visualizations of the
>>>> current mobile pageview data --eg a comparison chart between desktop &
>>>> mobile for global traffic for Wikipedia and/or all projects?
>>>> (Stats.wikimedia.org just has desktop, afaik). I know Oliver & Toby
>>>> presented such a thing in May 2014, but I don't know if there's a
>>>> current version.
>>>> 
>>>> Thanks in advance! I am trying to put a presentation together, looking
>>>> for the latest numbers and ideally a graph I can use.
>>>> 
>>>> Phoebe
>>>> 
>>>> --
>>>> * I use this address for lists; send personal messages to phoebe.ayers
>>>>  gmail.com *
>>>> 
>>>> ___
>>>> Wiki-research-l mailing list
>>>> Wiki-research-l@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jonathan T. Morgan
>>> Senior Design Researcher
>>> Wikimedia Foundation
>>> User:Jmorgan (WMF)
>>> 
>>> 
>>> ___
>>> Wiki-research-l mailing list
>>> Wiki-research-l@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>> 
>> 
>> 
>> 
>> --
>> Tilman Bayer
>> Senior Analyst
>> Wikimedia Foundation
>> IRC (Freenode): HaeB
>> 
>> ___
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> 
> 
> 
> -- 
> * I use this address for lists; send personal messages to phoebe.ayers
>  gmail.com *
> 
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Dario Taraborelli  Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter




[Wiki-research-l] Fwd: [Wikidata] SrepHit IEG proposal: call for support (was Re: [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool)

2015-09-21 Thread Dario Taraborelli
cross-posting as this might be of interest to people on this list

> Begin forwarded message:
> 
> From: Marco Fossati <hell.j@gmail.com>
> Subject: [Wikidata] SrepHit IEG proposal: call for support (was Re: 
> [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool)
> Date: September 21, 2015 at 3:32:25 AM PDT
> To: wikid...@lists.wikimedia.org
> Reply-To: "Discussion list for the Wikidata project." 
> <wikid...@lists.wikimedia.org>
> 
> Dear all,
> 
> The StrepHit IEG proposal is now pretty much complete:
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
> 
> We have already received support and feedback, but you are the most relevant 
> community and the project needs your specific help.
> 
> Your voice is vital and it can be heard on the project page in multiple ways. 
> If you:
> 1. like the idea, please click on the *endorse* blue button;
> 2. want to get involved, please click on the *join* blue button;
> 3. share your thoughts, please click on the *give feedback* link.
> 
> Looking forward to your updates.
> Cheers!
> 
> On 9/9/15 11:39, Marco Fossati wrote:
>> Hi Markus, everyone,
>> 
>> The project proposal is currently in active development.
>> I would like to focus now on the dissemination of the idea and the
>> engagement of the Wikidata community.
>> Hence, I would love to gather feedback on the following question:
>> 
>> Does StrepHit sounds interesting and useful for you?
>> 
>> It would be great if you could report your thoughts on the project talk
>> page:
>> https://meta.wikimedia.org/wiki/Grants_talk:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>> 
>> 
>> Cheers!
>> 
>> On 9/8/15 2:02 PM, wikidata-requ...@lists.wikimedia.org wrote:
>>> Date: Mon, 07 Sep 2015 16:47:16 +0200
>>> From: Markus Krötzsch<mar...@semantic-mediawiki.org>
>>> To: "Discussion list for the Wikidata project."
>>><wikid...@lists.wikimedia.org>
>>> Subject: Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the
>>>primary sources tool
>>> Message-ID:<55eda374.2090...@semantic-mediawiki.org>
>>> Content-Type: text/plain; charset=utf-8; format=flowed
>>> 
>>> Dear Marco,
>>> 
>>> Sounds interesting, but the project page still has a lot of gaps. Will
>>> you notify us again when you are done? It is a bit tricky to endorse a
>>> proposal that is not finished yet;-)
>>> 
>>> Markus
>>> 
>>> On 04.09.2015 17:01, Marco Fossati wrote:
>>>> >[Begging pardon if you have already read this in the Wikidata
>>>> project chat]
>>>> >
>>>> >Hi everyone,
>>>> >
>>>> >As Wikidatans, we all know how much data quality matters.
>>>> >We all know what high quality stands for: statements need to be
>>>> >validated via references to external, non-wiki, sources.
>>>> >
>>>> >That's why the primary sources tool is being developed:
>>>> >https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
>>>> >And that's why I am preparing the StrepHit IEG proposal:
>>>> >https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>>>> 
>>>> >
>>>> >
>>>> >StrepHit (pronounced "strep hit", means "Statement? repherence it!") is
>>>> >a Natural Language Processing pipeline that understands human language,
>>>> >extracts structured data from raw text and produces Wikidata statements
>>>> >with reference URLs.
>>>> >
>>>> >As a demonstration to support the IEG proposal, you can find the
>>>> >**FBK-strephit-soccer** dataset uploaded to the primary sources tool
>>>> >backend.
>>>> >It's a small dataset serving the soccer domain use case.
>>>> >Please follow the instructions on the project page to activate it and
>>>> >start playing with the data.
>>>> >
>>>> >What is the biggest difference that sets StrepHit datasets apart from
>>>> >the currently uploaded ones?
>>>> >At least one reference URL is always guaranteed for each statement.
>>>> >This means that if StrepHit finds some new statement that was not there
>>>> >in Wikidata before, it will always propose its external references.
>>>> >We do not want to manually reject all the new statements with no
>>>> >reference, right?
>>>> >
>>>> >If you like the idea, please endorse the StrepHit IEG proposal!
>> 
> 
> -- 
> Marco Fossati
> http://about.me/marco.fossati
> Twitter: @hjfocs
> Skype: hell_j
> 
> ___
> Wikidata mailing list
> wikid...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



Dario Taraborelli  Head of Research, Wikimedia Foundation
wikimediafoundation.org <http://wikimediafoundation.org/> • nitens.org 
<http://nitens.org/> • @readermeter <http://twitter.com/readermeter>


[Wiki-research-l] Fwd: [Wikitech-l] API BREAKING CHANGE: Default continuation mode for action=query will change at the end of this month

2015-06-03 Thread Dario Taraborelli
Many people on these lists design and use tools that depend on action=query 
(beyond bots). If you do, please read the following:

 Begin forwarded message:
 
 From: Brad Jorsch (Anomie) bjor...@wikimedia.org
 Subject: [Wikitech-l] API BREAKING CHANGE: Default continuation mode for 
 action=query will change at the end of this month
 Date: June 2, 2015 at 10:42:47 PM GMT+2
 To: Wikimedia developers wikitec...@lists.wikimedia.org, 
 mediawiki-api-annou...@lists.wikimedia.org
 Reply-To: Wikimedia developers wikitec...@lists.wikimedia.org
 
 As has been announced several times (most recently at
 https://lists.wikimedia.org/pipermail/wikitech-l/2015-April/081559.html),
 the default continuation mode for action=query requests to api.php will be
 changing to be easier for new coders to use correctly.
 
 *The date is now set:* we intend to merge the change to ride the deployment
 train at the end of June. That should be 1.26wmf12, to be deployed to test
 wikis on June 30, non-Wikipedias on July 1, and Wikipedias on July 2.
 
 If your bot or script is receiving the warning about this upcoming change
 (as seen here
  https://www.mediawiki.org/w/api.php?action=query&list=allpages, for
 example), it's time to fix your code!
 
   - The simple solution is to simply include the rawcontinue parameter
   with your request to continue receiving the raw continuation data (
   example
   
  https://www.mediawiki.org/w/api.php?action=query&list=allpages&rawcontinue=1).
   No other code changes should be necessary.
   - Or you could update your code to use the simplified continuation
   documented at https://www.mediawiki.org/wiki/API:Query#Continuing_queries
   (example
    https://www.mediawiki.org/w/api.php?action=query&list=allpages&continue=),
   which is much easier for clients to implement correctly.
 
 Either of the above solutions may be tested immediately, you'll know it
 works because you stop seeing the warning.
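For client authors, the simplified continuation boils down to one loop: merge each response's "continue" block back into the request parameters until the API stops returning one. A minimal sketch with simulated (invented) api.php responses:

```python
# Simulated api.php responses illustrating the simplified continuation:
# each reply's "continue" block is merged back into the next request's
# parameters until the API omits it. (These responses are made up.)
responses = iter([
    {"continue": {"apcontinue": "B", "continue": "-||"},
     "query": {"allpages": [{"title": "A"}]}},
    {"query": {"allpages": [{"title": "B"}]}},   # no "continue": done
])

def fetch(params):
    # stand-in for an actual HTTP request to api.php
    return next(responses)

params = {"action": "query", "list": "allpages", "format": "json"}
titles = []
while True:
    data = fetch(params)
    titles.extend(p["title"] for p in data["query"]["allpages"])
    if "continue" not in data:
        break
    params.update(data["continue"])  # carry apcontinue etc. into the next call

print(titles)  # ['A', 'B']
```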
 
 I've compiled a list of bots that have hit the deprecation warning more
 than 1 times over the course of the week May 23–29. If you are
 responsible for any of these bots, please fix them. If you know who is,
 please make sure they've seen this notification. Thanks.
 
 AAlertBot
 AboHeidiBot
 AbshirBot
 Acebot
 Ameenbot
 ArnauBot
 Beau.bot
 Begemot-Bot
 BeneBot*
 BeriBot
 BOT-Superzerocool
 CalakBot
 CamelBot
 CandalBot
 CategorizationBot
 CatWatchBot
 ClueBot_III
 ClueBot_NG
 CobainBot
 CorenSearchBot
 Cyberbot_I
 Cyberbot_II
 DanmicholoBot
 DeltaQuadBot
 Dexbot
 Dibot
 EdinBot
 ElphiBot
 ErfgoedBot
 Faebot
 Fatemibot
 FawikiPatroller
 HAL
 HasteurBot
 HerculeBot
 Hexabot
 HRoestBot
 IluvatarBot
 Invadibot
 Irclogbot
 Irfan-bot
 Jimmy-abot
 JYBot
 Krdbot
 Legobot
 Lowercase_sigmabot_III
 MahdiBot
 MalarzBOT
 MastiBot
 Merge_bot
 NaggoBot
 NasirkhanBot
 NirvanaBot
 Obaid-bot
 PatruBOT
 PBot
 Phe-bot
 Rezabot
 RMCD_bot
 Shuaib-bot
 SineBot
 SteinsplitterBot
 SvickBOT
 TaxonBot
 Theo's_Little_Bot
 W2Bot
 WLE-SpainBot
 Xqbot
 YaCBot
 ZedlikBot
 ZkBot
 
 
 -- 
 Brad Jorsch (Anomie)
 Software Engineer
 Wikimedia Foundation
 ___
 Wikitech-l mailing list
 wikitec...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




[Wiki-research-l] Research Data quarterly review

2015-04-17 Thread Dario Taraborelli
An overview of what the Wikimedia Foundation’s Research & Data team has been up 
to, this past quarter (fiscal Q3, 2014-15):
https://commons.wikimedia.org/wiki/File:Analytics_Quarterly_Review_Q3_2014-15_(Research_and_Data).pdf

Dario


[Wiki-research-l] April 2015 research showcase: remix and reuse in collaborative communities; the oral citations debate

2015-04-16 Thread Dario Taraborelli
I am thrilled to announce our speaker lineup for this month’s research showcase 
https://www.mediawiki.org/wiki/Analytics/Research_and_Data/Showcase#April_2015.
  

Jeff Nickerson (Stevens Institute of Technology) will talk about remix and 
reuse in collaborative communities; Heather Ford (Oxford Internet Institute) 
will present an overview of the oral citations debate in the English Wikipedia.

The showcase will be recorded and publicly streamed at 11.30 PT on Thursday, 
April 30 (livestream link will follow). We’ll hold a discussion and take 
questions from remote attendees via the Wikimedia Research IRC channel 
(#wikimedia-research http://webchat.freenode.net/?channels=wikimedia-research 
on freenode) as usual.

Looking forward to seeing you there.

Dario


Creating, remixing, and planning in open online communities
Jeff Nickerson
Paradoxically, users in remixing communities don’t remix very much. But an 
analysis of one remix community, Thingiverse, shows that those who actively 
remix end up producing work that is in turn more likely to be remixed. What does 
this suggest about Wikipedia editing? Wikipedia allows more types of 
contribution, because creating and editing pages are done in a planning 
context: plans are discussed on particular loci, including project talk pages. 
Plans on project talk pages lead to both creation and editing; some editors 
specialize in making article changes and others, who tend to have more 
experience, focus on planning rather than acting. Contributions can happen at 
the level of the article and also at a series of meta levels. Some patterns of 
behavior – with respect to creating versus editing and acting versus planning – 
are likely to lead to more sustained engagement and to higher quality work. 
Experiments are proposed to test these conjectures.
Authority, power and culture on Wikipedia: The oral citations debate
Heather Ford
In 2011, Wikimedia Foundation Advisory Board member, Achal Prabhala was funded 
by the WMF to run a project called 'People are knowledge' or the Oral citations 
project https://meta.wikimedia.org/wiki/Research:Oral_Citations. The goal of 
the project was to respond to the dearth of published material about topics of 
relevance to communities in the developing world and, although the majority of 
articles in languages other than English remain intact, the English editions of 
these articles have had their oral citations removed. I ask why this happened, 
what the policy implications are for oral citations generally, and what steps 
can be taken in the future to respond to the problem that this project (and 
more recent versions of it 
https://meta.wikimedia.org/wiki/Research:Indigenous_Knowledge) set out to 
solve. This talk comes out of an ethnographic project in which I have 
interviewed some of the actors involved in the original oral citations project, 
including the majority of editors of the surr 
https://en.wikipedia.org/wiki/surr article that I trace in a chapter of my 
PhD[1] http://www.oii.ox.ac.uk/people/?id=286.



Re: [Wiki-research-l] [Release]

2015-03-03 Thread Dario Taraborelli
yay, shiny! The map is a pretty compelling way to show how dominant traffic 
from the US is, even for very minor languages (say bi.wikipedia.org), I wonder 
how many requests from US-based bots/automata we’re still failing to detect.

 On Mar 3, 2015, at 9:29 PM, Oliver Keyes oke...@wikimedia.org wrote:
 
 Update: the original Shiny instance went down due to server load soon
 after release. It's now up again at http://datavis.wmflabs.org/where/
 on a dedicated Labs machine, where we hope to put...many more
 visualisations. It also now has mapping, largely thanks to Sarah
 Laplante (http://sarahlaplante.com/), and soon it will hopefully be
 /non-hideous/ mapping (the current mass of blue and grey is because my
 aesthetic tastes are...I don't actually have any aesthetic tastes)
 
 On 2 March 2015 at 22:36, Oliver Keyes oke...@wikimedia.org wrote:
 Indeed! Orienting it that way (pivoting on language rather than
 project) is something several people have asked for; I plan to spend a
 chunk of my spare time (that is, recreational time) trying to make it
 work. Should be fairly trivial.
 
 On 2 March 2015 at 09:55, h hant...@gmail.com wrote:
 Hello Finn,
   I do not have a specific answer to your question. However, it might be
 worthwhile to add Finnish in to the comparison as according to the CLDR 26
 T-L information
 http://www.unicode.org/cldr/charts/26/supplemental/territory_language_information.html
 
   You have some sizable Finnish language speakers in Sweden:
 
 Swedish {O} sv 95.0% 99.0%
 Finnish {OR} fi 2.2%
 
So if the similar query is executed on Finnish language, and the results
 also show some undue proportion of visits from Sweden, then what you
 observed as anomaly is the that unique. We probably need many iterations of
 comparative outcomes and normalization of data (Sweden does have higher
 population).  Also, it might be handy to have some statistics on immigration
 or residence, it is EU. I will not be surprised that for example the  visits
 from Oxford to Wikipedia website have sizable German language requests.
 
I am still a bit bothered by the number 1 in the current dataset. It
 does not feel right since the numbers of 1.4% and 0.6% is a notable
 difference in this regard. Perhaps we need some high precision universal
 percentage number for each territory-language pair. It would be also great
 to do another set of aggregation: i.e. given a territory, which language
 versions of Wikipedia are accessed
 
 Best,
 han-teng liao
 
 2015-03-02 13:54 GMT+01:00 Finn Årup Nielsen f...@imm.dtu.dk:
 
 Hi Oliver,
 
 
 Interesting dataset! I am curious about why the Danish Wikipedia is so
 highly acccessed from Sweden. Could it be an error, e.g., with Telia
 IP-numbers?
 
In Python:

import pandas as pd
df = pd.read_csv('http://files.figshare.com/1923822/language_pageviews_per_country.tsv', sep='\t')
df.ix[df.project == 'da.wikipedia.org', ['country', 'pageviews_percentage']].set_index('country')

                pageviews_percentage
country
Austria                            1
China                              1
Denmark                           61
Estonia                            1
France                             1
Germany                            2
Netherlands                        2
Norway                             1
Sweden                            18
United Kingdom                     3
United States                      3
Other                              5
 
 
 MaxMind has some numbers on their own accuracy:
 
 https://www.maxmind.com/en/geoip2-city-database-accuracy
 
 For Denmark 85% is Correctly Resolved, for Sweden only 68%. I wonder if
 this really could bias the result so much.
 
 If the numbers are correct why would the Swedish read the Danish Wikipedia
 so much? Bots? It does not apply the other way around: Only 2% of the
 traffic to Swedish Wikipedia comes from Denmark.
 
 
 
 best regards
 Finn
 
 
 
 On 02/25/2015 10:06 PM, Oliver Keyes wrote:
 
 Hey all!
 
 We've released a highly-aggregated dataset of readership data -
 specifically, data about where, geographically, traffic to each of our
 projects (and all of our projects) comes from. The data can be found
 at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've
 put together an exploration tool for it at
 https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/
 
 Hope it's useful to people!
 
 
 
 --
 Finn Årup Nielsen
 http://people.compute.dtu.dk/faan/
 
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
 
 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation
 
 
 
 -- 
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation
 
 

[Wiki-research-l] Wikipedia aggregate clickstream data released

2015-02-17 Thread Dario Taraborelli
We’re glad to announce the release of an aggregate clickstream dataset 
extracted from English Wikipedia

http://dx.doi.org/10.6084/m9.figshare.1305770 
http://dx.doi.org/10.6084/m9.figshare.1305770

This dataset contains counts of (referer, article) pairs aggregated from the 
HTTP request logs of English Wikipedia. This snapshot captures 22 million 
(referer, article) pairs from a total of 4 billion requests collected during 
the month of January 2015.

This data can be used for various purposes:
• determining the most frequent links people click on for a given 
article
• determining the most common links people followed to an article
• determining how much of the total traffic to an article clicked on a 
link in that article
• generating a Markov chain over English Wikipedia
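As a sketch of the last use case, here is how (referer, article, count) rows could be normalized into transition probabilities; the titles and counts below are invented, and the actual column layout should be checked on the figshare page:

```python
from collections import defaultdict

# Toy (referer, article, count) rows standing in for the real TSV.
rows = [
    ("London", "River_Thames", 4500),
    ("London", "City_of_London", 3900),
    ("River_Thames", "London", 1200),
]

# Normalize each referer's outgoing counts into transition probabilities,
# i.e. one row of a Markov chain over articles.
totals = defaultdict(int)
for prev, curr, n in rows:
    totals[prev] += n

transitions = {(prev, curr): n / totals[prev] for prev, curr, n in rows}
print(round(transitions[("London", "River_Thames")], 3))  # 0.536
```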

We created a page on Meta for feedback and discussion about this release: 
https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream 
https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream

Ellery and Dario


[Wiki-research-l] February 2015 Research Showcase: Global South survey results; data imports in OpenStreetMap

2015-02-11 Thread Dario Taraborelli
I am thrilled to announce our speaker lineup for this month’s research showcase 
https://www.mediawiki.org/wiki/Analytics/Research_and_Data/Showcase#February_2015.
  

Our own Haitham Shammaa will present results from the Global South survey. We 
also invited Stamen’s Alan McConchie, an OpenStreetMap expert, to talk about 
the challenges the OSM community is facing with external data imports.

The showcase will be recorded and publicly streamed at 11.30 PT on Wednesday, 
February 18 (livestream link will follow). We’ll hold a discussion and take 
questions from remote participants via the Wikimedia Research IRC channel 
(#wikimedia-research http://webchat.freenode.net/?channels=wikimedia-research 
on freenode).

Looking forward to seeing you there.

Dario


Global South User Survey 2014
By Haitham Shammaa https://meta.wikimedia.org/wiki/User:HaithamS_(WMF)
Users' trends in the Global South have significantly changed over the past two 
years, and given the increase in interest in Global South communities and their 
activities, we wanted this survey to focus on understanding the statistics and 
needs of our users (both readers, and editors) in the regions listed in the 
WMF's New Global South Strategy 
https://m.mediawiki.org/wiki/File:WMF%27s_New_Global_South_Strategy.pdf. This 
survey aims to provide a better understanding of the specific needs of local 
user communities in the Global South, as well as provide data that supports 
product and program development decision making process.

Ingesting Open Geodata: Observations from OpenStreetMap
By Alan McConchie http://stamen.com/studio/alan
As Wikidata grapples with the challenges of ingesting external data sources 
such as Freebase, what lessons can we learn from other open knowledge projects 
that have had similar experiences? OpenStreetMap, often called “The Wikipedia 
of Maps”, is a crowdsourced geospatial data project covering the entire world. 
Since the earliest years of the project, OSM has combined user contributions 
with existing data imported from external sources. Within the OSM community, 
these imports have been controversial; some core OSM contributors complain that 
imported data is lower quality than user-contributed data, or that it 
discourages the growth of local mapping communities. In this talk, I'll review 
the history of data imports in OSM, and describe how OSM's best practices have 
evolved over time in response to these critiques.

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] Scholarly citations by PMID/PMCID in Wikipedia

2015-02-02 Thread Dario Taraborelli
Hey all,

we just released a dataset of scholarly citations in the English Wikipedia by 
PubMed / PubMed Central ID. 

http://dx.doi.org/10.6084/m9.figshare.1299540

The dataset currently includes the first known occurrence of a PMID or PMCID 
citation in an English Wikipedia article and the associated revision metadata, 
based on the most recent complete content dump of English Wikipedia. We’re 
planning on expanding this dataset to include other types of scholarly 
identifier soon.
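For anyone who wants to explore the dump programmatically, here is a minimal sketch. The column names used here (`page_title`, `id_type`, `id_value`, `rev_timestamp`) are hypothetical stand-ins, not the actual schema published on figshare:

```python
import csv
import io
from collections import Counter

def count_identifiers(tsv_text):
    """Count citation identifiers by type (e.g. 'pmid' vs 'pmcid')
    in a tab-separated dump with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["id_type"] for row in reader)

# Hypothetical three-row sample in the assumed format
sample = (
    "page_title\tid_type\tid_value\trev_timestamp\n"
    "Aspirin\tpmid\t12345\t2007-03-01T00:00:00Z\n"
    "Aspirin\tpmcid\tPMC67890\t2009-06-15T00:00:00Z\n"
    "Influenza\tpmid\t54321\t2010-01-20T00:00:00Z\n"
)
print(count_identifiers(sample))  # Counter({'pmid': 2, 'pmcid': 1})
```

Adapting this to the real dump is a matter of swapping in the column names from the schema documented alongside the figshare record.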

Feel free to share this with anyone interested or spread the word via: 
https://twitter.com/WikiResearch/status/562422538613956608

Dario and Aaron


[Wiki-research-l] Wikimedia referrer policy

2015-01-20 Thread Dario Taraborelli
I’ve been discussing with the folks at CrossRef (the largest registry of 
Digital Object Identifiers; think of it as the ICANN of science) how to 
accurately measure the impact of traffic driven from Wikipedia/Wikimedia to 
scholarly resources. 

While digging into their data, we realized that since Wikimedia started the 
HTTPS switchover and an increasing portion of inbound traffic happens over SSL, 
Wikimedia sites may have stopped advertising themselves as sources of referred 
traffic to external sites. While this is the expected behavior when navigating 
from an HTTPS page to an HTTP site, it means that Wikimedia's impact on traffic 
directed to other sites is becoming largely invisible and that Wikimedia might 
be turning into a large source of “dark traffic”.

I wrote a proposal reviewing the CrossRef use case and discussing how other top 
web properties deal with this issue by adopting a so-called “Referrer Policy”: 

https://meta.wikimedia.org/wiki/Research:Wikimedia_referrer_policy
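To illustrate the mechanism at stake, here is a toy model (an illustration only, not any browser's actual algorithm, and covering just two example policy values) of what Referrer header a browser would send for an HTTPS-to-HTTP navigation:

```python
from urllib.parse import urlsplit

def referrer_sent(policy, source, target):
    """Toy model of the Referrer header sent for a navigation from
    `source` to `target` under two example policy values."""
    src, tgt = urlsplit(source), urlsplit(target)
    downgrade = src.scheme == "https" and tgt.scheme == "http"
    if policy == "origin":
        # Origin only, even on a downgrade: referrals stay visible
        return f"{src.scheme}://{src.netloc}/"
    if policy == "no-referrer-when-downgrade":
        # Historical default: full URL, but nothing on HTTPS -> HTTP
        return None if downgrade else source
    raise ValueError("policy not covered by this sketch")

page = "https://en.wikipedia.org/wiki/Aspirin"
doi = "http://dx.doi.org/10.1000/example"
print(referrer_sent("no-referrer-when-downgrade", page, doi))  # None: "dark" traffic
print(referrer_sent("origin", page, doi))  # https://en.wikipedia.org/
```

The second call shows why an origin-based policy would let external sites count referrals from Wikimedia without exposing the specific page a reader came from.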

Feedback is welcome on the talk page: 

https://meta.wikimedia.org/wiki/Research_talk:Wikimedia_referrer_policy

Dario


Re: [Wiki-research-l] Lots of Wikipedia Research at CSCW this year!

2015-01-16 Thread Dario Taraborelli
super-excited about this and glad we can finally make the announcement :)

 On Jan 16, 2015, at 9:29 AM, Andrea Forte andrea.fo...@gmail.com wrote:
 
 
 CSCW (ACM Conference on Computer-Supported Cooperative Work and Social 
 Computing) is just around the corner...
 
 The CSCW conference is one of the major venues for publishing 
 Wikipedia-related work. This year there are at least 7 papers about Wikipedia 
 and a workshop on academic and industry open collaboration research. 
 
 Check out the advance program here: 
 http://confer.csail.mit.edu/cscw2015/schedule
 
 Early bird registration ends Jan 30th. Come hang out with Wikiresearch folk 
 in Vancouver in spring. :) 
 
 CSCW 2015 | Vancouver, Canada 
 March 14-18 | http://cscw.acm.org
 
 Andrea
 
 
 -- 
  :: Andrea Forte
  :: Assistant Professor
  :: College of Computing and Informatics, Drexel University
  :: http://www.andreaforte.net



[Wiki-research-l] January 2015 Wikimedia Research Showcase: Felipe Ortega and Benjamin Mako Hill

2015-01-12 Thread Dario Taraborelli
The upcoming Wikimedia Research showcase 
https://www.mediawiki.org/wiki/Analytics/Research_and_Data/Showcase 
(Wednesday January 14, 11.30 PT) will host two guest speakers: Felipe Ortega 
https://en.wikipedia.org/wiki/User:GlimmerPhoenix (University of Madrid) and 
Benjamin Mako Hill https://en.wikipedia.org/wiki/User:Benjamin_Mako_Hill 
(University of Washington). 
As usual, the showcase will be broadcast on YouTube (the livestream link will 
follow on the list) and we’ll host the Q&A on the #wikimedia-research IRC 
channel on freenode.

We look forward to seeing you there.

Dario


Functional roles and career paths in Wikipedia
By Felipe Ortega https://www.mediawiki.org/wiki/User:GlimmerPhoenix
An understanding of participation dynamics within online production communities 
requires an examination of the roles assumed by participants. Recent studies 
have established that the organizational structure of such communities is not 
flat; rather, participants can take on a variety of well-defined functional 
roles. What is the nature of functional roles? How have they evolved? And how 
do participants assume these functions? Prior studies focused primarily on 
participants' activities, rather than functional roles. Further, extant 
conceptualizations of role transitions in production communities, such as the 
Reader to Leader framework, emphasize a single dimension: organizational power, 
overlooking distinctions between functions. In contrast, in this paper we 
empirically study the nature and structure of functional roles within 
Wikipedia, seeking to validate existing theoretical frameworks. The analysis 
sheds new light on the nature of functional roles, revealing the intricate 
“career paths” resulting from participants' role transitions.

Free Knowledge Beyond Wikipedia
A conversation facilitated by Benjamin Mako Hill 
https://www.mediawiki.org/wiki/User:Benjamin_Mako_Hill
In some of my research with Leah Buechley 
http://mako.cc/academic/buechley_hill_DIS_10.pdf, I’ve explored the way that 
increasing engagement and diversity in technology communities often means not 
just attacking systematic barriers to participation but also designing for new 
genres and types of engagement. I hope to facilitate a conversation about how 
WMF might engage new readers by supporting more non-encyclopedic production. 
I'd like to call out some examples from the new Wikimedia project proposals 
list https://meta.wikimedia.org/wiki/Proposals_for_new_projects, encourage 
folks to share entirely new ideas, and ask for ideas about how we could 
dramatically better support Wikipedia's sister projects.



[Wiki-research-l] Geo-aggregation of Wikipedia page views: Maximizing geographic granularity while preserving privacy – a proposal

2015-01-12 Thread Dario Taraborelli
I’m sharing a proposal that Reid Priedhorsky and his collaborators at Los 
Alamos National Laboratory recently submitted to the Wikimedia Analytics Team, 
aimed at producing privacy-preserving geo-aggregates of Wikipedia pageview data 
dumps and making them available to the public and the research community. [1] 

Reid and his team spearheaded the use of the public Wikipedia pageview dumps to 
monitor and forecast the spread of influenza and other diseases, using language 
as a proxy for location. [2] This proposal describes an aggregation strategy 
adding a geographical dimension to the existing dumps.
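One common tactic for this kind of privacy-preserving release is to suppress any aggregate cell below a minimum count. The sketch below illustrates that idea only; it is not the mechanism the LANL proposal actually specifies, and the threshold and cell structure are invented:

```python
from collections import Counter

def aggregate_views(events, min_count=10):
    """Aggregate raw (article, country) view events into per-cell counts,
    suppressing any cell below min_count. A simple thresholding sketch
    of privacy-preserving aggregation."""
    counts = Counter(events)
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Hypothetical events: 12 views from one country, 3 from another
events = [("Influenza", "US")] * 12 + [("Influenza", "IS")] * 3
print(aggregate_views(events))  # {('Influenza', 'US'): 12} (the small cell is suppressed)
```

Dropping small cells limits how much any single reader's activity can be inferred from the published geo-aggregates, at the cost of losing signal from low-traffic regions.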

Feedback on the proposal is welcome on the lists or the project talk page on 
Meta. [3]

Dario

[1] 
https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
[2] http://dx.doi.org/10.1371/journal.pcbi.1003892
[3] 
https://meta.wikimedia.org/wiki/Research_talk:Geo-aggregation_of_Wikipedia_pageviews


[Wiki-research-l] Designing a public API for Wikidata

2015-01-07 Thread Dario Taraborelli
In case you missed this, the Wikimedia Services team is soliciting feedback on 
the design of a public API for Wikidata, considering MQL and SPARQL among the 
possible options. [1]
If you are interested in contributing to this discussion, please chime in on 
the Phabricator thread or help spread the word among interested parties [2].

Dario

[1] https://phabricator.wikimedia.org/T85181
[2] https://twitter.com/ReaderMeter/status/552919940387196928


[Wiki-research-l] Article feedback corpus released

2014-12-24 Thread Dario Taraborelli
I’m glad to announce the release of an open-licensed corpus with 1.5M records 
from the Article Feedback v5 pilot. 

http://dx.doi.org/10.6084/m9.figshare.1277784

Thanks to everyone who helped make this happen, Fabrice in particular for 
shepherding this through.

Dario

—
This dataset contains the entire corpus of feedback submitted on the English, 
French and German Wikipedias during the Article Feedback v5 (AFT) pilot. [1] 
The Wikimedia Foundation ran the Article Feedback pilot for a year between 
March 2013 and March 2014. During the pilot, 1,549,842 feedback messages were 
collected across the three languages.

All feedback messages and their metadata (as described in this schema [2]) are 
available in this dataset, with the exception of messages that have been 
oversighted and/or deleted by the end of the pilot.
The corpus is released [3] under the following license:

	• CC BY-SA 3.0 for feedback messages
• CC0 for the associated metadata

Results from the pilot are discussed in: Halfaker, A., Keyes, O. and 
Taraborelli, D. (2013). Making peripheral participation legitimate: Reader 
engagement experiments in Wikipedia. CSCW ’13 Proceedings of the 2013 
Conference on Computer Supported Cooperative Work [4][5]

[1] https://www.mediawiki.org/wiki/Article_feedback/Version_5
[2] 
https://www.mediawiki.org/wiki/Article_feedback/Version_5/Technical_Design_Schema#aft_feedback
[3] https://wikimediafoundation.org/wiki/Feedback_data#Article_Feedback
[4] http://dx.doi.org/10.1145/2441776.2441872
[5] http://nitens.org/docs/cscw13.pdf


[Wiki-research-l] Wikimedia Research Showcase -- Thurs. Dec 18th: mobile readership; disease monitoring with Wikipedia

2014-12-17 Thread Dario Taraborelli
This month’s Research showcase will be held tomorrow, Thursday, Dec. 18th at 
3PM PST (2300 UTC). As usual, the event will be recorded and publicly streamed 
on YouTube (link: https://www.youtube.com/watch?v=xPO8XhmeUAU). We’ll hold a 
discussion and take questions from the Wikimedia Research IRC channel 
(#wikimedia-research http://webchat.freenode.net/?channels=wikimedia-research 
on freenode).

Looking forward to seeing you there.

Dario

——
This month:

Mobile Madness: The Changing Face of Wikimedia Readers
By Oliver Keyes https://www.mediawiki.org/wiki/User:Ironholds
A dive into the data we have around readership that investigates the rising 
popularity of the mobile web, countries and projects that are racing ahead of 
the pack, and what changes in user behaviour we can expect to see as mobile 
grows.

Global Disease Monitoring and Forecasting with Wikipedia
By Reid Priedhorsky 
http://www.lanl.gov/expertise/profiles/view/reid-priedhorsky (Los Alamos 
National Laboratory)
Infectious disease is a leading threat to public health, economic stability, 
and other key social structures. Efforts to mitigate these impacts depend on 
accurate and timely monitoring to measure the risk and progress of disease. 
Traditional, biologically-focused monitoring techniques are accurate but costly 
and slow; in response, new techniques based on social internet data, such as 
social media and search queries, are emerging. These efforts are promising, but 
important challenges in the areas of scientific peer review, breadth of 
diseases and countries, and forecasting hamper their operational usefulness. We 
examine a freely available, open data source for this use: access logs from the 
online encyclopedia Wikipedia. Using linear models, language as a proxy for 
location, and a systematic yet simple article selection procedure, we tested 14 
location-disease combinations and demonstrate that these data feasibly support 
an approach that overcomes these challenges. Specifically, our proof-of-concept 
yields models with r² up to 0.92, forecasting value up to the 28 days tested, 
and several pairs of models similar enough to suggest that transferring models 
from one location to another without re-training is feasible. Based on these 
preliminary results, we close with a research agenda designed to overcome these 
challenges and produce a disease monitoring and forecasting system that is 
significantly more effective, robust, and globally comprehensive than the 
current state of the art.
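The core of the approach the abstract describes, a linear model mapping article traffic to disease activity and scored by r², can be sketched in a few lines. The weekly figures below are invented for illustration; the paper fits separate models per language-disease pair:

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    slope = num / den
    return slope, my - slope * mx

def r_squared(x, y, slope, intercept):
    """Coefficient of determination for the fitted line."""
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Invented weekly series: views of a flu-related article vs. reported cases
views = [120, 150, 310, 480, 620, 540, 300]
cases = [10, 14, 30, 49, 60, 55, 28]
slope, intercept = fit_line(views, cases)
print(round(r_squared(views, cases, slope, intercept), 2))
```

Forecasting then amounts to shifting the view series forward relative to the case series before fitting, so that this week's traffic predicts cases some days ahead.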


Re: [Wiki-research-l] [Wikimedia-l] Experimenting on Wikimedians

2014-11-03 Thread Dario Taraborelli
Aileen, your autoresponder was flooding the mailing lists. I switched your 
address to moderated; drop me a line if you wish it to be reinstated after 
you’re back from vacation.

 On Nov 1, 2014, at 5:59 PM, Aileen Oeberst a.oebe...@iwm-kmrc.de wrote:
 
 I am currently on vacation and will not be able to answer your mail before
 November 10. But I will get back then as soon as possible.
 
 Best regards, Aileen Oeberst
 
 




Re: [Wiki-research-l] [Wikimedia-l] Experimenting on Wikimedians

2014-11-03 Thread Dario Taraborelli
Please disregard this; reply-to did its job as expected :)

 On Nov 3, 2014, at 10:48 PM, Dario Taraborelli da...@wikimedia.org wrote:
 
 Aileen, your autoresponder was flooding the mailing lists. I switched your 
 mail to moderated, drop me a line if you wish them to be reinstated after 
 you’re back from vacation.
 
 




Re: [Wiki-research-l] 'Wikipedia Network Analysis' by Brian Keegan

2014-10-30 Thread Dario Taraborelli
(shameless plug)

for those of you who are on Twitter, follow @WikiResearch 
https://twitter.com/WikiResearch and you won’t miss any of these 
announcements.

Dario

 On Oct 30, 2014, at 10:21 AM, Maximilian Klein isa...@gmail.com wrote:
 
 IPython notebook FTW. Thanks for sharing.
 
 Make a great day,
 Max Klein ‽ http://notconfusing.com/ 
 
 On Tue, Oct 28, 2014 at 10:46 AM, Michael Maggs mich...@maggs.name wrote:
 (Apologies if this has been referred to already on this list. If so, I missed 
 it).
 
 A couple of Weeks ago, Brian Keegan published a very nice blog post [1] on 
 the use of Python for Wikimedia research. He uses examples from the English 
 Wikipedia but the techniques he describes are applicable more generally.
 
 It’s fascinating, and shows what a lot can be done with a few lines of code.
 
 Michael
 
 
 
 [1] 
 http://nbviewer.ipython.org/github/brianckeegan/Wikipedia-Network-Analysis/blob/master/Wikipedia%20Network%20Analysis.ipynb
 
 



[Wiki-research-l] Wikimedia Research showcase – October 15 2014, 11.30 PT

2014-10-14 Thread Dario Taraborelli
After a break in September, we’re resuming our monthly Research and Data 
showcase. The next showcase will be live-streamed tomorrow Wednesday October 15 
at 11.30 PT. As usual you can join the conversation via IRC on freenode.net by 
joining the #wikimedia-research channel.

We look forward to seeing you there,

Dario


This month:

Emotions under Discussion: Gender, Status and Communication in Wikipedia
By David Laniado: I will present a large-scale analysis of emotional expression 
and communication style of editors in Wikipedia discussions. The talk will 
focus especially on how emotion and dialogue differ depending on the status, 
gender, and the communication network of the about 12000 editors who have 
written at least 100 comments on the English Wikipedia's article talk pages. 
The analysis is based on three different predefined lexicon-based methods for 
quantifying emotions: ANEW, LIWC and SentiStrength. The results unveil 
significant differences in the emotional expression and communication style of 
editors according to their status and gender, and can help to address issues 
such as the gender gap and editor stagnation.

Wikipedia as a socio-technical system
By Aaron Halfaker: Wikipedia is a socio-technical system. In this presentation, 
I'll explain how the integration of human collective behavior (social) and 
information technology (technical) has led to a phenomenon that, while being 
massively productive, is poorly understood due to a lack of precedent. Based on 
my work in this area, I'll describe five critical functions that healthy, 
Wikipedia-like socio-technical systems must serve in order to continue to 
function: allocation, regulation, quality control, community management and 
reflection. Next I'll argue that the Wikimedia Foundation's analytics strategy 
currently focuses on outcomes related to a relatively narrow aspect of system 
health and all but completely ignores productivity. Finally, I'll conclude with 
an overview of three classes of new projects that should provide critical 
opportunities to both practically and academically understand the maintenance 
of Wikipedia's socio-technical fitness.



[Wiki-research-l] Fwd: [Wikidata-l] Venue for Wikidata research

2014-10-01 Thread Dario Taraborelli
Forwarding from wikidata-l. 

Begin forwarded message:

 From: Markus Krötzsch mar...@semantic-mediawiki.org
 Date: October 1, 2014 at 03:37:40 PDT
 To: Discussion list for the Wikidata project. 
 wikidat...@lists.wikimedia.org
 Subject: [Wikidata-l] Venue for Wikidata research
 Reply-To: Discussion list for the Wikidata project. 
 wikidat...@lists.wikimedia.org
 
 Dear all:
 
 Those of you active in research may be interested in submitting to a recently 
 announced special issue of the Journal of Web Semantics that explicitly 
 refers to Wikidata in its call:
 
 JWS Special Issue on Knowledge Graphs
 http://www.websemanticsjournal.org/index.php/ps/announcement/view/19
 
 I am a guest editor for this issue and I would obviously be delighted to see 
 some Wikidata-related works, but also other research on large, heterogeneous, 
 graph-like knowledge collections is welcome. JWS is a top journal on Web data 
 and semantic technologies, so we are looking for high-quality research here.
 
 Please spread the word as appropriate.
 
 Cheers,
 
 Markus
 
 P.S. Note that JWS provides Open Access options for those who want to ensure 
 that their work is freely licensed. However, JWS also has a tradition of 
 keeping all of its preprints freely accessible via its preprint archive; see 
 http://www.websemanticsjournal.org/index.php/ps/issue/archive
 
 ___
 Wikidata-l mailing list
 wikidat...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wiki-research-l] Monthly Research Data Showcase this Wednesday

2014-06-16 Thread Dario Taraborelli
The next Research & Data showcase will be live-streamed this Wednesday 6/18 at 
11.30 PT.

The streaming link will be posted on the lists a few minutes before the 
showcase starts and as usual you can join the conversation on IRC at 
#wikimedia-research.

We look forward to seeing you!

Dario

This month:

MoodBar -- lightweight socialization improves long-term editor retention
by Giovanni Luca Ciampaglia -- I will talk about MoodBar, an experimental 
feature deployed on the English Wikipedia from 2011 to 2013 to streamline the 
socialization of newcomers. I will present results from a natural experiment 
that measured the effect of MoodBar on the short-term engagement and long-term 
retention of newly registered users attempting to edit Wikipedia for the first 
time. Our results indicate that a mechanism to elicit lightweight feedback 
and to provide early mentoring to newcomers significantly improves their 
chances of becoming long-term contributors.

Active Editors' Survival Models
by Leila Zia -- I will talk about first results in building prediction models 
for active editors' survival. A sample of such prediction models, their 
performance, and the important variables in predicting survival will be 
presented.




[Wiki-research-l] Upcoming research newsletter (May 2014): new papers open for review

2014-05-27 Thread Dario Taraborelli
Hi everybody,

we’re preparing for the May 2014 research newsletter [1] and looking for 
contributors. Please take a look at: https://etherpad.wikimedia.org/p/WRN201405 
and add your name next to any paper you are interested in covering. As usual, 
short notes and one-paragraph reviews are most welcome.

Highlights from this month:

• Detecting epidemics using Wikipedia article views: A demonstration of 
feasibility with language as location proxy
• Wikipedia in the eyes of its beholders: A systematic review of scholarly 
research on Wikipedia readers and readership 
• The sum of all human knowledge: a systematic review of scholarly research 
on the content of Wikipedia
• Uneven Openness: Barriers to MENA Representation on Wikipedia
• Sex ratios in Wikidata
• Automatically Detecting Corresponding Edit-Turn-Pairs in Wikipedia 
• A Novel Methodology Based on Formal Methods for Analysis and Verification of 
Wikis
• Okinawa in Japanese and English Wikipedia
• Bipartite Editing Prediction in Wikipedia
• Increasing the Discoverability of Digital Collections Using Wikipedia:  The 
Pitt Experience
• Playscript Classification and Automatic Wikipedia Play Articles Generation

If you have any questions about the format or process, feel free to get in touch 
off-list.

Dario Taraborelli and Tilman Bayer

[1] http://meta.wikimedia.org/wiki/Research:Newsletter


[Wiki-research-l] Monthly research data showcase livestreamed today

2014-05-21 Thread Dario Taraborelli
The next Research & Data showcase will be live-streamed today Wed 5/21 at 11.30 
PT.

The streaming link will be posted on the lists a few minutes before the 
showcase starts and as usual you can join the conversation on IRC at 
#wikimedia-research.

We look forward to seeing you!

Dario


This month:

UX research at WMF
Introducing Abbey Ripstra, the new UX research lead at the Wikimedia Foundation.
A bird's eye view of editor activation
by Dario Taraborelli -- In this talk I will give a high-level overview of data 
on new editor activation, presenting longitudinal data from the largest 
Wikipedias, a comparison between desktop and mobile registrations and the 
relative activation rates of different cohorts of newbies.
Collaboration patterns in Articles for Creation
by Aaron Halfaker -- Wikipedia needs to attract and retain newcomers while also 
increasing the quality of its content. Yet new Wikipedia users are 
disproportionately affected by the quality assurance mechanisms designed to 
thwart spammers and promoters. English Wikipedia’s en:WP:Articles for Creation 
provides a protected space for newcomers to draft articles, which are reviewed 
against minimum quality guidelines before they are published. In this 
presentation, I'll describe a study of how this drafting process has affected 
the productivity of newcomers in Wikipedia. Using a mixed qualitative and 
quantitative approach, I'll show that the process's pre-publication review, 
which is intended to improve the success of newcomers, in fact decreases 
newcomer productivity in English Wikipedia, and offer recommendations for 
system designers.



Re: [Wiki-research-l] Monthly research data showcase livestreamed today

2014-05-21 Thread Dario Taraborelli
The livestream link is http://youtu.be/AUupsnvV1oA


-- 
Dario Taraborelli
Wikimedia Foundation

http://wikimediafoundation.org
http://nitens.org/taraborelli


[Wiki-research-l] Fwd: Reminder: Code as a Research Object office hours - today, 11 am + 4 pm ET

2014-04-24 Thread Dario Taraborelli
Cross-posting from Kaitlin Thaney (Mozilla Science Lab)

Begin forwarded message:

 From: Kaitlin Thaney kait...@mozillafoundation.org
 Subject: Reminder: Code as a Research Object office hours - today, 11 am + 
 4 pm ET
 Date: April 24, 2014 at 7:44:43 AM PDT
 
 Hi all - 
 
 Looking to learn more about how to archive your code and assign it a Digital 
 Object Identifier? Curious about how we linked GitHub (a code hosting 
 service) and figshare (an open data repository) - and how you can do the 
 same? 
 
 We'll be running two open office hour sessions today (to be kinder to those 
 in other timezones) - one at 11 am ET, the other at 4pm ET. Our collaborators 
 will be walking participants through the technical build of the project. 
 
 Call in details here in the etherpad: 
 https://etherpad.mozilla.org/sciencelab-coderesobject-officehours
 
 And more on the project can be found here: 
 http://mozillascience.org/code-as-a-research-object-updates-prototypes-next-steps/
 
 All the best,
 K
 --
 Kaitlin Thaney
 Director, Mozilla Science Lab
 @kaythaney ; @MozillaScience
 skype / IRC: kaythaney


[Wiki-research-l] Wikimedia Analytics @ Wikimania 2014

2014-03-31 Thread Dario Taraborelli
A list of Wikimania proposals submitted or co-authored by the Analytics team:

https://www.mediawiki.org/wiki/Analytics/Wikimania_2014

Dario



[Wiki-research-l] CSCW 2015: Call for participation

2014-03-26 Thread Dario Taraborelli
CSCW is a major venue for open collaboration research. Every year the 
conference features some of the best studies of collaborative systems 
(including wiki and Wikipedia research [1-3]). If you are active in this 
space, please consider submitting a paper or poster, hosting a workshop or 
organizing a panel at the forthcoming conference in Vancouver. The call for 
participation with the relevant deadlines is below.

Dario

[1] https://meta.wikimedia.org/wiki/Research:Newsletter/2014/February
[2] https://meta.wikimedia.org/wiki/Research:Newsletter/2013/March
[3] https://meta.wikimedia.org/wiki/Research:Newsletter/2012/February


—
CSCW 2015 | Call for Participation
March 14-18, 2015 | Vancouver, BC, Canada
http://cscw.acm.org

The ACM conference on Computer-Supported Cooperative Work and Social Computing 
is the premier venue for research in the design and use of technologies that 
affect groups, organizations, communities, and networks. Bringing together top 
researchers and practitioners from academia and industry who are interested in 
the area of social computing, CSCW addresses both the technical and social 
challenges encountered when supporting collaboration. The development and 
application of new technologies continues to enable new ways of working 
together and coordinating activities.

The conference offers several types of submissions with the following deadlines.

Papers: June 4, 2014
Workshops proposals: August 8, 2014
Interactive Posters: November 10, 2014
Panels: November 10, 2014
Doctoral Colloquium: November 10, 2014
Demonstrations: December 12, 2014

See the individual calls at http://cscw.acm.org/2015/submit/ for more details.

The scope of CSCW spans socio-technical domains including work, home, 
education, healthcare, the arts, leisure, and entertainment. The conference 
seeks novel research results or new ways of thinking about, studying, or 
supporting shared activities in these and related areas:

▪ Social and crowd computing. Studies, theories, designs, mechanisms, systems, 
and/or infrastructures addressing social media, social networking, wikis, 
blogs, online gaming, crowdsourcing, collective intelligence, virtual worlds or 
collaborative information behaviors.
▪ System Design. Hardware, architectures, infrastructures, interaction design, 
technical foundations, algorithms, and/or toolkits that enable the building of 
new social and collaborative systems and experiences.
▪ Theories. Critical analysis or theory with clear relevance to the design or 
study of social and collaborative systems.
▪ Empirical investigations. Findings, guidelines, and/or studies related to 
communication, collaboration, and social technologies, practices, or use. CSCW 
welcomes diverse methods and approaches.
▪ Mining and Modeling. Studies, analyses and infrastructures for making use of 
large- and small-scale data.
▪ Methodologies and tools. Novel methods or combinations of approaches and 
tools used in building systems or studying their use.
▪ Domain-specific social and collaborative applications. Including applications 
to healthcare, transportation, gaming, ICT4D, sustainability, education, 
accessibility, global collaboration, or other domains.
▪ Collaboration systems based on emerging technologies. Mobile and ubiquitous 
computing, game engines, virtual worlds, multi-touch, novel display 
technologies, vision and gesture recognition, big data, MOOCs, crowd labor 
markets, SNSs, or sensing systems.
▪ Crossing boundaries. Studies, prototypes, or other investigations that 
explore interactions across disciplines, distance, languages, generations, and 
cultures, to help better understand how to transcend social, temporal, and/or 
spatial boundaries.

General Co-Chairs
Andrea Forte, Drexel University
Dan Cosley, Cornell University
chairs2...@cscw.acm.org

Program Co-Chairs
Luigina Ciolfi, Sheffield Hallam University
David McDonald, University of Washington
papers2...@cscw.acm.org

Posters Co-Chairs
Karyn Moffatt, McGill University
Aleksandra Sarcevic, Drexel University
posters2...@cscw.acm.org

Panels Co-Chairs
Louise Barkhuus, Stockholm University
Anatoliy Gruzd, Dalhousie University
panels2...@cscw.acm.org

Workshops Co-Chairs
Laura Dabbish, Carnegie Mellon University
Jenn Thom, Amazon
workshops2...@cscw.acm.org

Demos Co-Chairs
Tomoo Inoue, University of Tsukuba
Tony Tang, University of Calgary
demos2...@cscw.acm.org

Doctoral Consortium Co-Chairs
Carl Gutwin, University of Saskatchewan
Abigail Sellen, Microsoft Research Cambridge
dc2...@cscw.acm.org


[Wiki-research-l] Wikimedia monthly research data showcase: live streamed tomorrow

2014-03-19 Thread Dario Taraborelli
The next Research & Data showcase will be live-streamed tomorrow, Wed 3/19, at 
11.30 PT.

The streaming link will be posted on the lists a few minutes before the 
showcase starts and you can join the conversation on IRC at 
#wikimedia-research. We look forward to seeing you!

Dario

Metrics standardization (Dario Taraborelli)
In this talk I'll present the most recent updates on our work on participation 
metrics and discuss the goals of the Editor Engagement Vital Signs project.

Wikipedia's rise and decline (Aaron Halfaker)
In Halfaker et al. (2013) we present data that show that several changes the 
Wikipedia community made to manage quality and consistency in the face of a 
massive growth in participation have ironically crippled the very growth they 
were designed to manage. Specifically, the restrictiveness of the 
encyclopedia's primary quality control mechanism and the algorithmic tools used 
to reject contributions are implicated as key causes of decreased newcomer 
retention.


Re: [Wiki-research-l] identifying Wikipedia article topics

2014-03-18 Thread Dario Taraborelli
If you’re not interested in actual topic extraction, a good heuristic to 
identify high-level topic areas is to rely on WikiProjects on the English 
Wikipedia and then use language links from Wikidata to apply them to other 
languages. That won’t immediately cover articles that only exist in one 
language, but it’s the most effective heuristic I can think of for your use 
case.
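To make the heuristic concrete, here is a minimal sketch. The banner-to-type mapping (`PROJECT_TO_TYPE`), the banner names, and the shape of the sitelinks dict are all illustrative assumptions, not a real inventory of English Wikipedia's WikiProjects; in practice you would harvest the banners from article talk pages and the sitelinks from Wikidata.

```python
# Hypothetical mapping from English Wikipedia WikiProject banners to a
# small set of coarse article types (illustrative, not exhaustive).
PROJECT_TO_TYPE = {
    "WikiProject Biography": "Biography",
    "WikiProject Mathematics": "Science",
    "WikiProject Physics": "Science",
    "WikiProject Film": "Work of art",
    "WikiProject Cities": "Place",
}

def classify_article(banners):
    """Map the WikiProject banners found on an article's talk page to
    high-level types; fall back to 'Other' when nothing matches."""
    types = {PROJECT_TO_TYPE[b] for b in banners if b in PROJECT_TO_TYPE}
    return sorted(types) if types else ["Other"]

def propagate_via_sitelinks(enwiki_types, sitelinks):
    """Apply the types computed for an enwiki article to its
    counterparts in other languages, using Wikidata sitelinks
    (assumed here to be a {language: title} dict)."""
    return {lang: enwiki_types for lang in sitelinks}
```

For example, `classify_article(["WikiProject Physics", "WikiProject Film"])` yields `["Science", "Work of art"]`, which `propagate_via_sitelinks` would then attach to the same article's de/he/fr versions.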

Dario

On Mar 17, 2014, at 8:21 AM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il 
wrote:

 Hallo,
 
 Is there any known easy way to classify Wikipedia articles into a relatively 
 small number of types?
 
 By "relatively small" I mean no more than twenty, and by "types" I mean 
 things that are intuitively clear to readers, for example:
 * Biographies
 * Articles about scientific phenomena (can be sub-grouped to math, astronomy, 
 physics, geology, medicine)
 * Articles about works of art (paintings, movies, books, records, statues)
 * Articles about places
 * Articles about historical events
 * Articles about biological species
 * Articles that mostly present data, such as demography or results of 
 competitions (sports, elections, game shows)
 
 There are a few more, but not many. I hope you get the idea.
 
 We have categories, but I'm not sure that it's easy to use categories for 
 such things because of the very loose category structure. For example, 
 [[Eurovision 2007]] is somewhere under [[Category:Humans]], even though it's 
 not an article about a human.
 
 Such information can be useful for study about the types of articles that 
 different people write. In particular, I thought about it in the context of 
 analyzing the types of articles that people are translating now (manually) 
 and will translate in the future using the ContentTranslation, which is in 
 its early stages of development.
 
 Thanks,
 
 --
 Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
 http://aharoni.wordpress.com
 ‪“We're living in pieces,
 I want to live in peace.” – T. Moore‬


Re: [Wiki-research-l] Notes from the wiki research session at CSCW '14

2014-03-06 Thread Dario Taraborelli
Hi Pine

On Mar 5, 2014, at 11:43 PM, ENWP Pine deyntest...@hotmail.com wrote:

 Does Analytics have any ideas to contribute to how to stabilize and increase 
 the population of active editors and to improve editor gender diversity? 
 There were relevant blog posts at [1] and [2]. I would like to hear how data 
 and analysis of that survey have been used in areas outside of VE development 
 and any other ideas Analytics has about improving population size and gender 
 diversity. 

Not to my knowledge; other than VE I cannot think of other areas in which 
survey results about diversity have driven Product design, which today is 
primarily focused on user acquisition and new-editor activation experiments. 
You should look outside of Product (notably, Programs & Grantmaking) for 
projects like the Teahouse that are much more geared towards diversity, but I 
am sure you are already familiar with them.

There’s also a related (and less known) project that was piloted a few months 
ago to gauge the gender gap in specific segments of the editor population 
or editor lifecycle via microsurveys. [1] I’d love to hear from other parties 
interested in using this model, which I think is promising.

[shameless plug] I am also writing up a proposal for a Wikimania talk [2] about 
targeted acquisitions [3]. It’s still a stub for now, but once it’s more fleshed 
out I’ll post it to the list to get feedback on the possible use of offsite 
acquisition campaigns as leverage to increase the diversity of the Wikimedian 
population. [/shameless plug]

[1] https://meta.wikimedia.org/wiki/Research:Gender_micro-survey
[2] 
https://wikimania2014.wikimedia.org/wiki/Submissions/The_missing_Wikipedia_ads:_Designing_targeted_contribution_campaigns
[3] https://meta.wikimedia.org/wiki/Targeted_acquisition_campaigns

 Were there any follow ups to the annual editor survey from 2011? A blog 
 post [3] says the survey was anticipated to be annual. There is a page about 
 a 2012 annual survey on Meta [4] but no results are posted and it appears no 
 follow up surveys were completed in 2012 or 2013. [5]

As Tilman noted in the section of the report about surveys, at this stage it’s 
not clear if there’s bandwidth to run these surveys on an annual basis.

Dario

 
 Thanks,
 
 Pine
 
 [1] 
 https://blog.wikimedia.org/2012/06/29/editor-survey-lack-of-time-and-unpleasant-interactions-hinder-contributions/
  
 [2] 
 https://blog.wikimedia.org/2012/04/27/nine-out-of-ten-wikipedians-continue-to-be-men/
 [3] 
 https://blog.wikimedia.org/2011/12/07/launching-the-second-annual-wikipedia-editor-survey/
 [4]https://meta.wikimedia.org/wiki/Research:Wikipedia_Editor_Survey_2012#Results
 [5] https://meta.wikimedia.org/wiki/Research:Projects
 
 From: dtarabore...@wikimedia.org
 Date: Wed, 5 Mar 2014 17:25:51 -0800
 To: analytics-inter...@lists.wikimedia.org; 
 wiki-research-l@lists.wikimedia.org
 Subject: [Wiki-research-l] Notes from the wiki research session at CSCW '14
 
 All,
 
 these are highlights from a session the Wikimedia Foundation’s Research & 
 Data team hosted at CSCW ’14 in Baltimore. The audience was a group of 
 researchers either working on Wikipedia/Wikimedia-related research projects 
 or interested in learning about opportunities to collaborate with the 
 Foundation.
 
 Feel free to get in touch if you have any questions/comments.
 Contact
 
 Dario Taraborelli - da...@wikimedia.org
 Aaron Halfaker - ahalfa...@wikimedia.org
 Jonathan Morgan - jmor...@wikimedia.org 
 
 IRC: irc://irc.freenode.net/wikimedia-research (webclient)
 
 Mailing list: wiki-research-l (mailing list)
 
 
 Resources
 
 We gave a short overview of existing resources of potential interest to 
 Wikipedia/Wikimedia researchers:
 
 OAuth allows 3rd-party software to edit Wikipedia on behalf of a Wikipedia 
 editor and it’s a (mostly untapped) opportunity to run experimental research 
 or test new interfaces targeted at Wikipedians.  See: 
 https://www.mediawiki.org/wiki/Extension:OAuth#Using_OAuth
 Data portal summarizes data sources that are currently available to 
 researchers and app developers.  See: 
 https://meta.wikimedia.org/wiki/Research:Data
 Wikimedia Research Newsletter: A monthly overview reviewing or summarizing 
 recent research (contributions are welcome, please contact Dario if you’re 
 interested in contributing) 
 https://meta.wikimedia.org/wiki/Research:Newsletter 
 Subject recruitment. Aaron and Dario have managed a process for documenting 
 and vetting subject recruitment occurring on Wikimedia projects.  This 
 process was set in place to help resolve the tension between researchers’ 
 need to recruit subjects and editors’ desire to not be bothered.  The process 
 involves a public discussion and mentorship in order to ensure that proposed 
 studies that affect editors are well documented, are addressing original 
 questions and do not result in unnecessary disruption of wiki work. This is a 
 service we’ve been providing on a volunteer basis as members

Re: [Wiki-research-l] Notes from the wiki research session at CSCW '14

2014-03-06 Thread Dario Taraborelli
Hoi Gerard,

thanks for the gigantic list of questions – comments inline

On Mar 6, 2014, at 12:17 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:

 Hoi Dario,
 When you look at the statistics [1], you find that the number of page views 
 in English is going down faster than in the other languages combined. You 
 also find that the percentage of readers for the top ten Wikipedias in size 
 is slowly but surely decreasing (now at 88.94%). How can we decrease this 
 percentage even more without sacrificing the number of page views for the top 
 10?

I guess you saw our report on 2013 traffic trends [1]. Page views have been 
following a downward trend in 2013, but unique visitors as measured via 
comScore have been steadily growing over time, and we have no evidence to date 
of a change in that trend after controlling for seasonality. We are working 
with the analytics engineers to have more reliable traffic data, to be able to 
accurately answer these questions, including breaking down readership trends 
by country, project, device and source.

[1] https://www.mediawiki.org/wiki/File:2013_Wikimedia_traffic_trends.pdf

 Has there been any research into how we can stimulate growth in Wikipedias 
 that are not part of the top 10? Do we know to what extent the English 
 Wikipedia model works for these other languages, or is a hindrance? Do we know 
 what people are looking for in the smaller Wikipedias, and do we know what 
 they do / do not find? Do we know how people find articles in those 
 languages, and does this work in the same way as it does for English? Is it 
 possible that we have to cultivate contacts with the local "Googles" in order 
 to grow attention for what we have to offer?

Speaking for Analytics/Research & Data, we haven’t done a lot of original 
research, let alone experimentation, on small Wikipedias. I expect request logs 
and search logs will provide useful data to understand how people find articles 
on these projects.

 Do we know what the effect is of the new search engine that is much better at 
 providing results in other scripts? Do we know to what extent inter-language 
 links are created, and do we know how this has changed since the move to 
 Wikidata? Dario, can you please tell us to what extent the other languages 
 are studied at all? Do we know what effect they have? Do we know about the 
 experience of these Wikipedias locally? Do we care about the typography in 
 other scripts? Do we know about the NPOV in the small projects? Do we know 
 about gender diversity in the smaller languages? How about cultural bias, and 
 how does this compare to the cultural bias in the big projects? Dario, there 
 is so much that we do not know, have not touched.

amen to that.

 Why study more of what has been studied to death?

I am not sure I understand your question, but if you are suggesting that we 
need to find better ways to pitch unexplored research to the wiki research 
community I am down with that. It’s sad that we haven’t found a good model to 
create a speed dating system to match research questions and researchers, but 
many people on this list as well as those who served on the research committee 
have expressed a lot of interest in fixing this problem. Do you want to help 
and do you have any example of strategies that you think might be successful?

Dario


[Wiki-research-l] Notes from the wiki research session at CSCW '14

2014-03-05 Thread Dario Taraborelli
All,

these are highlights from a session the Wikimedia Foundation’s Research & Data 
team hosted at CSCW ’14 in Baltimore. The audience was a group of researchers 
either working on Wikipedia/Wikimedia-related research projects or interested 
in learning about opportunities to collaborate with the Foundation.

Feel free to get in touch if you have any questions/comments.
Contact
Dario Taraborelli - da...@wikimedia.org
Aaron Halfaker - ahalfa...@wikimedia.org
Jonathan Morgan - jmor...@wikimedia.org 

IRC: irc://irc.freenode.net/wikimedia-research (webclient)

Mailing list: wiki-research-l (mailing list)

Resources
We gave a short overview of existing resources of potential interest to 
Wikipedia/Wikimedia researchers:

OAuth allows 3rd-party software to edit Wikipedia on behalf of a Wikipedia 
editor and it’s a (mostly untapped) opportunity to run experimental research or 
test new interfaces targeted at Wikipedians.  See: 
https://www.mediawiki.org/wiki/Extension:OAuth#Using_OAuth
Data portal summarizes data sources that are currently available to researchers 
and app developers.  See: https://meta.wikimedia.org/wiki/Research:Data
Wikimedia Research Newsletter: A monthly overview reviewing or summarizing 
recent research (contributions are welcome, please contact Dario if you’re 
interested in contributing) https://meta.wikimedia.org/wiki/Research:Newsletter 
Subject recruitment. Aaron and Dario have managed a process for documenting and 
vetting subject recruitment occurring on Wikimedia projects.  This process was 
set in place to help resolve the tension between researchers’ need to recruit 
subjects and editors’ desire to not be bothered.  The process involves a public 
discussion and mentorship in order to ensure that proposed studies that affect 
editors are well documented, are addressing original questions and do not 
result in unnecessary disruption of wiki work. This is a service we’ve been 
providing on a volunteer basis as members of the Research Committee; it’s meant 
to offer support to researchers but doesn’t eliminate the risk that an account 
used for recruitment purposes might be blocked by an administrator. 
IRBs and minors. One of the issues that we discussed is dealing with IRB and 
other ethics boards’ requirements when studies may result in interaction with 
minors.  Aaron ahalfa...@wikimedia.org is willing to discuss the issue with 
researchers and university staff upon request.  
Annual survey modules. Interest was expressed in exploring strategies for 
expanding the annual editor/reader survey with new questions contributed by 
researchers. At this point (March 2014) we cannot commit to any such project, 
but in general there is potential for cooperations between WMF and academic 
researchers in this area. Interested parties should contact Tilman Bayer 
(tbayer at wikimedia dot org) who has been conducting the last WMF editor 
survey and can provide information about these surveys (methodology, results, 
available data etc.) and their calendar.

WikiResearch Workshop at CSCW 2015. We discussed planning a workshop for CSCW 
next year. Anyone who is interested in collaborating, please contact us.  
Details are TBD, but our general goals include: 
* increase awareness of the public data resources that are available
* highlight research areas that are ripe for investigation, esp. where WMF 
could benefit from the results
* get a better sense of what kind of data resources (and/or what data formats) 
researchers would like to have
* brainstorm a (lightweight, ethical, practical) model for partnership between 
WMF and academic research orgs that want access to certain non-public data

 Wiki Research Hackathons. On Nov. 9th, 2013, we held our first global research 
hackathon (announcement).  We had universities and other local meetups from 
around the world connect via Google Hangout to share ideas, data and 
presentations geared toward datasets, code and other resources.  We’ll be 
planning another hackathon in the coming months.  You can help by hosting or 
attending your own local event.  Please contact us if you’re interested. 

Public listing on WMF’s strategic research questions. We discussed the 
potential for the Wikimedia Foundation to list out key areas of research that 
we are interested in.  This is something we are keenly interested in and you 
should expect to hear from us soon through wiki-research-l and @WikiResearch. 

Tweet @WikiResearch. We maintain a relatively high-visibility twitter account 
from which we tweet about new research, data, and other initiatives. If you 
tweet about your own wiki-related work @WikiResearch, we will retweet it so 
long as it’s relevant. We will also experiment with the use of this Twitter 
handle to increase the visibility of libraries and analytics tools to support 
Wikipedia research.

Internships/grad student residencies. We talked briefly about research 
collaborations, internships and other forms of work opportunities at WMF.  
We’re

Re: [Wiki-research-l] Wikimedia monthly research showcase: Feb 26, 11.30 PT

2014-02-26 Thread Dario Taraborelli
Streaming will start in 2 minutes at: http://youtu.be/arO9YzcTWGE

On Feb 25, 2014, at 6:06 PM, Dario Taraborelli da...@wikimedia.org wrote:

 Starting tomorrow (February 26), we will be broadcasting the monthly showcase 
 of the Wikimedia Research and Data team.
 
 The showcase is an opportunity to present and discuss recent work researchers 
 at the Foundation have been conducting. The showcase will start at 11.30 
 Pacific Time and we will post a link to the stream a few minutes before it 
 starts. You can also join the conversation on the #wikimedia-office IRC 
 channel on freenode (we’ll be sticking around after the end of the showcase 
 to answer any question).
 
 This month, we’ll be talking about Wikipedia mobile readers and article 
 creation trends:
 
 Oliver Keyes
 Mobile session times 
 A prerequisite to many pieces of interesting reader research is being able to 
 accurately identify the length of users' 'sessions'. I will explain one 
 potential way of doing it, how I’ve applied it to mobile readers, and what 
 research this opens up. (20 mins)
 https://meta.wikimedia.org/wiki/Research:Mobile_sessions
 
 Aaron Halfaker
 Wikipedia article creation research
 I'll present research examining trends in newcomer article creation across 10 
 languages with a focus on English and German Wikipedias.   I'll show that, in 
 wikis where anonymous users can create articles, their articles are less 
 likely to be deleted than articles created by newly registered editors.  I’ll 
 also show the results of an in-depth analysis of Articles for Creation (AfC) 
 which suggest that while AfC’s process seems to result in the publication of 
 high quality articles, it also dramatically reduces the rate at which good 
 new articles are published. (30 mins)
 https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
 
 Looking forward to seeing you all tomorrow!
 
 Dario



[Wiki-research-l] Wikimedia monthly research showcase: Feb 26, 11.30 PT

2014-02-25 Thread Dario Taraborelli
Starting tomorrow (February 26), we will be broadcasting the monthly showcase 
of the Wikimedia Research and Data team.

The showcase is an opportunity to present and discuss recent work researchers 
at the Foundation have been conducting. The showcase will start at 11.30 
Pacific Time and we will post a link to the stream a few minutes before it 
starts. You can also join the conversation on the #wikimedia-office IRC channel 
on freenode (we’ll be sticking around after the end of the showcase to answer 
any question).

This month, we’ll be talking about Wikipedia mobile readers and article 
creation trends:

Oliver Keyes
Mobile session times 
A prerequisite to many pieces of interesting reader research is being able to 
accurately identify the length of users' 'sessions'. I will explain one 
potential way of doing it, how I’ve applied it to mobile readers, and what 
research this opens up. (20 mins)
https://meta.wikimedia.org/wiki/Research:Mobile_sessions
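One common way to operationalize "sessions" of this kind is to sort a user's request timestamps and cut a new session whenever the gap between consecutive requests exceeds an inactivity threshold. The sketch below illustrates that idea; the 30-minute cutoff is an illustrative assumption, not necessarily the value used in the Research:Mobile_sessions work.

```python
from datetime import datetime, timedelta

def sessionize(timestamps, cutoff=timedelta(minutes=30)):
    """Group one user's request timestamps into sessions, starting a
    new session whenever the gap between consecutive requests exceeds
    `cutoff` (an assumed, illustrative threshold)."""
    sessions, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] > cutoff:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

def session_lengths(sessions):
    """Duration of each session: last request minus first request."""
    return [s[-1] - s[0] for s in sessions]
```

For instance, requests at 11:00, 11:10 and 12:30 split into two sessions under a 30-minute cutoff: one 10 minutes long, one consisting of a single request.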

Aaron Halfaker
Wikipedia article creation research
I'll present research examining trends in newcomer article creation across 10 
languages with a focus on English and German Wikipedias.   I'll show that, in 
wikis where anonymous users can create articles, their articles are less likely 
to be deleted than articles created by newly registered editors.  I’ll also 
show the results of an in-depth analysis of Articles for Creation (AfC) which 
suggest that while AfC’s process seems to result in the publication of high 
quality articles, it also dramatically reduces the rate at which good new 
articles are published. (30 mins)
https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation

Looking forward to seeing you all tomorrow!

Dario


[Wiki-research-l] Upcoming research newsletter: new papers open for review

2014-02-24 Thread Dario Taraborelli
Hi everybody,

With CSCW just concluded and conferences like CHI and WWW coming up, we have a 
good set of papers to review for the February issue of the Research Newsletter 
[1].

Please take a look at: https://etherpad.wikimedia.org/p/WRN201402 and add your 
name next to any paper you are interested in reviewing. As usual, short notes 
and one-paragraph reviews are most welcome.

Instead of contacting past contributors only, this month we’re experimenting 
with a public call for reviews cross-posted to analytics-l and wiki-research-l. 
If you have any questions about the format or process, feel free to get in 
touch off-list.

Dario Taraborelli and Tilman Bayer

[1] http://meta.wikimedia.org/wiki/Research:Newsletter


[Wiki-research-l] Fwd: Upcoming talk at the Berkman Center on the gender gap and Internet use skills

2014-01-17 Thread Dario Taraborelli
Begin forwarded message:

 From: aaron shaw aarons...@northwestern.edu
 
 Date: Fri, 17 Jan 2014 08:52:02 -0600
 Subject: Upcoming talk at the Berkman Center on the gender gap and Internet 
 use skills
 
 
 I wanted to pass along the details of an upcoming talk that Eszter Hargittai 
 and I will be doing at the Berkman Center on Tuesday 1/21. We will present 
 preliminary findings of work-in-progress on the relationship between the 
 Wikipedia gender gap and people's internet skills. You can stream the talk 
 online or attend in-person (if you happen to be in the Boston area). More 
 details and an RSVP form are available on the Berkman Center website: 
 http://cyber.law.harvard.edu/events/luncheon/2014/01/hargittai-shaw
 
 All the best,
 Aaron
 
 
 [January 21] Internet Skills and Wikipedia's Gender Inequality
 
 with Eszter Hargittai and Aaron Shaw, Northwestern University
 
 January 21, 2014 at 12:30pm ET
 Berkman Center for Internet & Society, 23 Everett St, 2nd Floor
 RSVP required for those attending in person via the form
 This event will be webcast live (on this page) at 12:30pm ET.
 
 
 Although women are just as likely as men to read Wikipedia, they only 
 represent an estimated 16% of global Wikipedia editors and 23% of U.S. adult 
 Wikipedia editors. Previous research has focused on analyzing aspects of 
 current contributors and aspects of the existing Wikipedia community to 
 explain this gender gap in contributions. Instead, we analyze data about both 
 Wikipedia contributors and non-contributors. We also focus on a previously 
 ignored factor: people’s Internet skills. Our data set includes a diverse 
 group of American young adults with detailed information about their 
 background attributes, Internet experiences and skills. We find that the 
 gender gap in editing is exacerbated by a similarly important Internet skills 
 gap. By far the most likely people to contribute to Wikipedia are males with 
 high Internet skills. Our findings suggest that efforts to overcome the 
 gender gap in Wikipedia contributions must address the Web-use skills gap. 
 Future research needs to look at why high-skilled women do not contribute at 
 comparable rates to highly-skilled men.
 
 
 



[Wiki-research-l] Distributing the Wikipedia category/pagelink graph

2013-12-10 Thread Dario Taraborelli
(cross-posting Sebastiano’s post from the analytics list, this may be of 
interest to both the wikidata and wiki-research-l communities)

Begin forwarded message:

 From: Sebastiano Vigna vi...@di.unimi.it
 Subject: [Analytics] Distributing an official graph
 Date: December 9, 2013 at 10:09:31 PM PST
 
 [Reposted from private discussion after Dario's request]
 
 My problem is that of exploring the graph structure of Wikipedia
 
 1) easily;
 2) reproducibly;
 3) in a way that does not depend on parsing artifacts.
 
 Presently, when people want to do this they either do their own parsing of 
 the dumps, or they use the SQL data, or they download a dataset like
 
 http://law.di.unimi.it/webdata/enwiki-2013/
 
 which has everything cooked up.
 
 My frustration in the last few days was when trying to add the category 
 links. I didn't realize (well, it's not very documented) that bliki extracts 
 all links and render them in HTML *except* for the category links, that are 
 instead accessible programmatically. Once I got there, I was able to make 
 some progress.
 
 Nonetheless, I think that the graph of Wikipedia connections (hyperlinks and 
 category links) is really a mine of information, and it is a pity that a lot 
 of huffing and puffing is necessary to do something as simple as a reverse 
 visit of the category links from People to get, actually, all people pages 
 (this is a bit more complicated: there are many false positives, but after a 
 couple of fixes it worked quite well).
 
 Moreover, one has continuously this feeling of walking on eggshells: a small 
 change in bliki, a small change in the XML format and everything might stop 
 working is such a subtle manner that you realize it only after a long time.
 
 I was wondering if Wikimedia would be interested in distributing in 
 compressed form the Wikipedia graph. That would be the official Wikipedia 
 graph--the benefits, in particular for people working on leveraging semantic 
 information from Wikipedia, would be really significant.
 
 I would (obviously) propose to use our Java framework, WebGraph, which is 
 actually quite standard in distributing large (well, actually much larger) 
 graphs, such as ClueWeb09 http://lemurproject.org/clueweb09/, ClueWeb12 
 http://lemurproject.org/clueweb12/ and the recent Common Web Crawl 
 http://webdatacommons.org/hyperlinkgraph/index.html. But any format is OK, 
 even a pair of integers per line. The advantage of a binary compressed form 
 is reduced network utilization, instantaneous availability of the 
 information, etc.
 
 Probably it would be useful to actually distribute several graphs with the 
 same dataset--e.g., the category links, the content link, etc. It is 
 immediate, using WebGraph, to build a union (i.e., a superposition) of any 
 set of such graphs and use it transparently as a single graph.
 
 In my mind the distributed graph should have a contiguous ID space, say, 
 induced by the lexicographical order of the titles (possibly placing template 
 pages at the start or at the end of the ID space). We should provide graphs, 
 and a bidirectional node-title map. All such information would use about 
 300M of space for the current English Wikipedia. People could then associate 
 pages to nodes using the title as a key.
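The ID-space proposal above can be sketched in a few lines. This is only an illustration of the idea (contiguous integer IDs induced by lexicographic title order, edges as integer pairs, and a union of several edge sets), not WebGraph's actual binary format or API; all function names here are invented for the sketch.

```python
def build_id_space(titles):
    """Assign contiguous integer IDs induced by the lexicographic
    order of page titles, and return the bidirectional node-title map
    (ID -> title list, title -> ID dict)."""
    ordered = sorted(set(titles))
    return ordered, {t: i for i, t in enumerate(ordered)}

def edges_to_ids(edges, title_to_id):
    """Translate (source title, target title) pairs into integer
    pairs: the 'pair of integers per line' fallback format mentioned
    in the thread."""
    return sorted((title_to_id[s], title_to_id[t]) for s, t in edges)

def graph_union(*edge_sets):
    """Superpose several edge sets (e.g. hyperlink edges plus category
    edges) into a single graph, mirroring the union operation the
    post attributes to WebGraph."""
    return sorted(set().union(*edge_sets))
```

With titles ["Berlin", "Alan Turing", "Category:People"], IDs follow sorted order, so an edge from "Berlin" to "Alan Turing" becomes a single integer pair; unioning the hyperlink and category edge sets then yields one transparently queryable graph.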
 
 But this last part is just rambling. :)
 
 Let me know if you people are interested. We can of course take care of the 
 process of cooking up the information once it is out of the SQL database.
 
 Ciao,
 
   seba
 
 
 ___
 Analytics mailing list
 analyt...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics




[Wiki-research-l] Join the inaugural Wiki Research Hackathon on November 9

2013-10-25 Thread Dario Taraborelli
Cross-posting the announcement from the Wikimedia Blog. The details of the 
event are on Meta and we're also creating meetup.com pages for the local 
events. Check them out and RSVP if you're planning to attend. Looking forward 
to seeing you on November 9!

Dario, on behalf of the organizers
 
Join the inaugural Wiki Research Hackathon on November 9

Last summer at Wikimania in Hong Kong, the annual global Wikimedia conference, 
we (a group of Wikipedia researchers) discussed how we could make wiki 
research more impactful. In our work in academia and on Wikimedia projects, we 
saw a host of missed opportunities to share ideas, hypotheses, code, and 
research methods. We set out to create a space that brings researchers together 
with Wikipedians and facilitates problem solving, discovery and innovation with 
the use of open data and open source tools. Labs2 (L2) aims to build this 
space by providing infrastructure and venues for collaborative wiki research.
Today we’re thrilled to announce the inaugural Wiki Research Hackathon – a 
global event hosted by Wikimedia Foundation researchers, academic researchers 
and Wikipedians from around the world on Saturday, November 9, 2013.
What
This hackathon is an opportunity for anyone interested in research on wikis, 
Wikipedia, and open collaboration to meet, share ideas, and work together. It 
is targeted at Wikipedia editors, students, researchers, coders and anyone 
interested in designing new tools, statistics and data visualization, and 
producing new knowledge about Wikimedia projects and their communities.
The goals of this event are to:
share knowledge about research tools and datasets (and how to use them)
ask burning research questions (and learn how to answer them)
get involved in ongoing research projects (or start new ones)
design new data-driven apps and tools (or hack existing ones)
Where


[Map of local meetup sites omitted.] (Locations are approximate)
This hackathon will be held both as a series of local meetups (Perth, Mannheim, 
Oxford, Rio de Janeiro, Chicago, Minneapolis, San Francisco, Seattle, etc.) and 
virtual meetups (Asia/Oceania, Europe/Africa and the Americas) for those who 
can’t make it to the local events. An IRC channel (#wikimedia-labsconnect) and 
a Google Hangout open throughout the day will allow attendees to connect online.
How
Interested attendees can sign up for the event on Meta-wiki.
Local and virtual meetups are listed on the event page. All you need to do is 
add your name to the list of participants for the event that makes sense for 
you.
Who
For any questions about the event (including volunteering for a local meetup), 
you can reach us at w...@wikimedia.org or leave a message on the hackathon’s 
talk page on Meta-wiki. We look forward to seeing you on November 9.
Aaron Halfaker, Wikimedia Foundation
Jonathan Morgan, Wikimedia Foundation
Morten Warncke-Wang, University of Minnesota
Aaron Shaw, Northwestern University
Dario Taraborelli, Wikimedia Foundation
Taha Yasseri, Oxford University
Henrique Andrade, Wikimedia Foundation




[Wiki-research-l] Bots vs. Wikipedians – Who edits more?

2013-10-14 Thread Dario Taraborelli
A new app by Thomas Steiner (@tomayac) counting bot vs human edits in real time 
from the RecentChanges feed:

http://wikipedia-edits.herokuapp.com/

(read more [1]). The application comes with a public API exposing Wikipedia and 
Wikidata edits as Server-Sent Events. [2]

Dario

[1] 
http://blog.tomayac.com/index.php?date=2013-10-14&time=16:49:46&perma=Bots+vs.+Wikipedians.html
[2] https://en.wikipedia.org/wiki/Server-sent_events
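For readers unfamiliar with the format: Server-Sent Events arrive as plain-text
blocks of `field: value` lines separated by blank lines. A minimal parsing
sketch in Python (the event payload below is hypothetical; the app's actual
schema may differ):

```python
import json

def parse_sse(stream_text):
    """Parse Server-Sent Events: blocks of 'field: value' lines
    separated by blank lines. Yields one dict per complete event."""
    event = {}
    for line in stream_text.splitlines():
        if not line.strip():  # a blank line terminates the event
            if event:
                yield event
                event = {}
        elif ":" in line:
            field, _, value = line.partition(":")
            event[field.strip()] = value.strip()

# Hypothetical event as it might appear on the wire.
sample = (
    "event: edit\n"
    'data: {"wiki": "enwiki", "isBot": false}\n'
    "\n"
)
events = list(parse_sse(sample))
edit = json.loads(events[0]["data"])
assert edit["isBot"] is False
```

In a browser the same stream would be consumed with the standard `EventSource`
API, which performs this parsing for you.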




Re: [Wiki-research-l] The Wikimedia Research Newsletter 3(7) is out

2013-08-02 Thread Dario Taraborelli
Apologies, the correct link is
http://meta.wikimedia.org/wiki/Research:Newsletter/2013/July

On Aug 2, 2013, at 5:39 PM, Dario Taraborelli dtarabore...@wikimedia.org 
wrote:

 The July 2013 issue of the Wikimedia Research Newsletter is out:
 
 https://meta.wikimedia.org/wiki/Research:Newsletter/2013/May
 
 In this issue:
 
 • 1 Multilingual ranking analysis: Napoleon and Michael Jackson as 
 Wikipedia's global heroes
 • 2 Wikipedia as Cultural Reference: Srebrenica Massacre, Art and Menstruation
 • 3 Decline of adminship candidatures on Polish Wikipedia
 • 4 90% of Wikipedia articles have equivalent or better quality than their 
 Britannica counterparts in blind expert review
 • 5 First WikiSym 2013 papers available
 • 6 Survey participation bias analysis: More Wikipedia editors are female, 
 married or parents than previously assumed
 • 7 Briefly
 • 8 References
 
 ••• 28 publications were covered in this issue •••
 Thanks to: Taha Yasseri, Han-Teng Liao, Piotr Konieczny and Jonathan Morgan 
 for contributing
 
 Dario Taraborelli and Tilman Bayer
 
 --
 Wikimedia Research Newsletter
 https://meta.wikimedia.org/wiki/Research:Newsletter/
 
 * Follow us on Twitter/Identi.ca: @WikiResearch
 * Receive this newsletter by mail: 
 https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
 * Subscribe to the RSS feed: 
 http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/
 
 





[Wiki-research-l] Fwd: [Wikidata-l] Sex Ratios in Wikidata and Authority Files

2013-05-15 Thread Dario Taraborelli
cross-posting from wikidata-l

Begin forwarded message:

 From: Klein,Max kle...@oclc.org
 Subject: [Wikidata-l] Sex Ratios in Wikidata and Authority Files
 Date: May 15, 2013 10:42:34 AM PDT
 To: wikidat...@lists.wikimedia.org wikidat...@lists.wikimedia.org
 Reply-To: Discussion list for the Wikidata project. 
 wikidat...@lists.wikimedia.org
 
 Hello All,
 
 I wanted to share with you a visualization of sex ratios in Wikidata, 
 after the so-called "categorygate" New York Times article. I'm really excited 
 about how Wikidata is going to allow us to compare Claims data against the 
 interwiki link section. Is there going to be official support for this in 
 phase 3?
 This is how Items with Property:Sex (P21) compare by language:
 
 
 
 In the full blog post I compare it against sex data from library authority 
 files, if you're curious: http://hangingtogether.org/?p=2877
 
 Maximilian Klein
 Wikipedian in Residence, OCLC
 +17074787023
 ___
 Wikidata-l mailing list
 wikidat...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l



Re: [Wiki-research-l] What's the average number of edits per day on English Wikipedia?

2013-05-07 Thread Dario Taraborelli
Hi Jodi, see attached: the median of daily enwiki edits (all namespaces, including bots and anonymous edits) in 2013 to date is 127K (data from the RecentChanges feed).

Generated from: https://dash.metamarkets.com/wikipedia_editstream/explore#e=2013-05-02&gran=day&p=custom&s=2013-01-01&w.0.k=language&w.0.v.0=en&zz=3

Language: en

Timestamp,Edits,Unique Users,Total Delta,Total Added,Total Deleted,Average Delta,Average Added,Average Deleted,Total Variation,Average Variation
2013-01-01T00:00:00.000Z,109509,17077.36238,17890928,42826073,-24935145,163.3740423,391.0735465,-227.6995042,67761218,618.7730506
2013-01-02T00:00:00.000Z,124583,20534.71704,24848790,49554926,-24706136,199.4557042,397.7663566,-198.3106523,74261062,596.0770089
2013-01-03T00:00:00.000Z,128169,21510.81527,23281751,51350328,-28068577,181.6488464,400.6454603,-218.9966138,79418905,619.6420741
2013-01-04T00:00:00.000Z,142757,20871.59554,24016497,49024175,-25007678,168.2334106,343.4099554,-175.1765448,74031853,518.5865001
2013-01-05T00:00:00.000Z,127335,18629.80667,16003587,44084907,-28081320,125.6809754,346.2120155,-220.5310402,72166227,566.7430557
2013-01-06T00:00:00.000Z,126596,19097.3429,13426230,43124985,-29698755,106.0557206,340.650455,-234.5947344,72823740,575.2451894
2013-01-07T00:00:00.000Z,124507,22156.44517,21157900,46101548,-24943648,169.9334174,370.2727397,-200.3393223,71045196,570.612062
2013-01-08T00:00:00.000Z,129899,21475.28865,22481974,41471508,-18989534,173.0727257,319.2596402,-146.1869144,60461042,465.4465546
2013-01-09T00:00:00.000Z,188192,21949.96861,20674129,43920011,-23245882,109.8565773,233.3787355,-123.5221582,67165893,356.9008938
2013-01-10T00:00:00.000Z,134529,21194.3651,17178793,45486474,-28307681,127.6958351,338.1164953,-210.4206602,73794155,548.5371556
2013-01-11T00:00:00.000Z,124030,21088.44855,22506912,39844572,-17337660,181.4634524,321.2494719,-139.7860195,57182232,461.0354914
2013-01-12T00:00:00.000Z,118622,18575.88641,20926551,39536026,-18609475,176.4137428,333.2942119,-156.8804691,58145501,490.1746809
2013-01-13T00:00:00.000Z,121642,19795.91129,20654925,38159126,-17504201,169.8009322,313.7002516,-143.8993193,55663327,457.5995709
2013-01-14T00:00:00.000Z,133275,21501.94943,22834176,44735200,-21901024,171.3312774,335.6608516,-164.3295742,66636224,499.9904258
2013-01-15T00:00:00.000Z,124116,20951.75609,21234869,43429136,-22194267,171.0888926,349.9076348,-178.8187421,65623403,528.7263769
2013-01-16T00:00:00.000Z,120694,20969.41396,14662266,48828919,-34166653,121.4829735,404.5679073,-283.0849338,82995572,687.6528411
2013-01-17T00:00:00.000Z,122904,21285.54626,17684600,43214305,-25529705,143.8895398,351.6102405,-207.7207007,68744010,559.3309412
2013-01-18T00:00:00.000Z,122013,19535.92029,22298238,49792792,-27494554,182.7529689,408.0941539,-225.341185,77287346,633.4353389
2013-01-19T00:00:00.000Z,115666,18292.27368,19619649,36163122,-16543473,169.6233033,312.6512718,-143.0279685,52706595,455.6792402
2013-01-20T00:00:00.000Z,114867,18608.82557,20786859,38024756,-17237897,180.9645851,331.0328989,-150.0683138,55262653,481.1012127
2013-01-21T00:00:00.000Z,94672,16254.60799,18314632,35635837,-17321205,193.4535237,376.4136915,-182.9601677,52957042,559.3738592
2013-01-22T00:00:00.000Z,121721,21877.8,20991667,41894083,-20902416,172.4572342,344.1812259,-171.7239918,62796499,515.9052177
2013-01-23T00:00:00.000Z,126372,22280.91477,31099326,57362360,-26263034,246.0934859,453.9166904,-207.8232045,83625394,661.7398949
2013-01-24T00:00:00.000Z,123123,21089.4035,7180400,51646751,-44466351,58.31891686,419.4728117,-361.1538949,96113102,780.6267066
2013-01-25T00:00:00.000Z,129288,20086.96596,22378997,41553716,-19174719,173.0941541,321.4042757,-148.3101216,60728435,469.7143973
2013-01-26T00:00:00.000Z,123238,18427.66687,17602388,36226442,-18624054,142.8324705,293.9551275,-151.122657,54850496,445.0777844
2013-01-27T00:00:00.000Z,137850,19489.03268,17789920,40705395,-22915475,129.0527385,295.2875952,-166.2348567,63620870,461.5224519
2013-01-28T00:00:00.000Z,132593,21938.19886,25373409,45979048,-20605639,191.3631112,346.7682909,-155.4051798,66584687,502.1734707
2013-01-29T00:00:00.000Z,128717,22480.97897,19752072,45965516,-26213444,153.4534832,357.1052464,-203.6517632,72178960,560.7570096
2013-01-30T00:00:00.000Z,133774,3.85671,25676280,44772926,-19096646,191.9377458,334.6907919,-142.7530462,63869572,477.4438381
2013-01-31T00:00:00.000Z,123180,21382.13499,22690156,43197254,-20507098,184.2032473,350.6839909,-166.4807436,63704352,517.1647345
2013-02-01T00:00:00.000Z,122451,21089.42258,23802871,43202950,-19400079,194.3869058,352.8182702,-158.4313644,62603029,511.2496345
2013-02-02T00:00:00.000Z,122136,18144.91371,19533565,38784390,-19250825,159.9329027,317.550845,-157.6179423,58035215,475.1687873
2013-02-03T00:00:00.000Z,122177,20261.50128,19062334,40650649,-21588315,156.0222792,332.7193252,-176.6970461,62238964,509.4163713
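For reference, a median like the one quoted above can be recomputed from an
extract of the attached CSV with a few lines of standard-library Python. This
sketch uses only the first five data rows, keeping just the Timestamp and
Edits columns:

```python
import csv
import io
from statistics import median

# First five rows of the attached CSV, reduced to two columns.
sample = """Timestamp,Edits
2013-01-01T00:00:00.000Z,109509
2013-01-02T00:00:00.000Z,124583
2013-01-03T00:00:00.000Z,128169
2013-01-04T00:00:00.000Z,142757
2013-01-05T00:00:00.000Z,127335
"""

daily_edits = [int(row["Edits"]) for row in csv.DictReader(io.StringIO(sample))]
print(median(daily_edits))  # prints 127335 for this five-day sample
```

Running the same computation over the full 2013-to-date file is what yields
the ~127K figure quoted in the message.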

[Wiki-research-l] The Wikimedia Research Newsletter 3(3) is out

2013-04-02 Thread Dario Taraborelli
The March 2013 issue of the Wikimedia Research Newsletter is out:

https://meta.wikimedia.org/wiki/Research:Newsletter/2013/March

In this issue:

• 1 Wikipedia's Ignore all rules policy (IAR) is a double edged sword in 
deletion arguments
• 2 Activity of content translators on Wikipedia examined
• 3 Comparison of collaborative editing in OpenStreetMap and Wikipedia
• 4 Wikipedia's coverage of breaking news stories is still a fertile field of 
research
• 5 Exposing talk page discussions leads to drop in perceived article quality
• 6 Briefly
  • 100 million hours spent editing Wikipedia
  • Wiktionary and sign language
  • Wikipedia compared to QA website in Korea
  • Wikipedia articles on nephrology reliable, but hard to read
  • Comparing English and Arabic Wikipedia POV differences
  • The overrepresentation of cricket on English Wikipedia
  • Grumpiness due to a serious typographical error
  • Wikipedians do not tend to conform more to groupthink when in a less 
anonymous situation
  • Estimate for economic benefit of Wikipedia: $50 million by 2006 already
  • 91% of German journalists use Wikipedia
  • Inserting weblinks on Wikipedia to drive traffic
  • Case study on Accommodating the Wikipedia Project in Higher Education
  • Wikipedia student club participation
  • Monthly edits still on the rise
  • How many Wikipedia edits come from locals?
  • New overview page of Wikimedia data for researchers
  • Wikimedia funding for Wikisym '13 despite open access concerns
  • Research newsletter started on French Wikipedia
  • Inferring relationships from editing behavior on Wikipedia
  • Google Research releases the WikiLinks Corpus: 40M mentions to Wikipedia 
pages collected from 10M web pages
• 7 References

••• 26 publications were covered in this issue •••
Thanks to:  Amir E. Aharoni, Piotr Konieczny, Taha Yasseri, Oren Bochman, 
Heather Ford, Giovanni Luca Ciampaglia and Daniel Mietchen for contributing

Dario Taraborelli and Tilman Bayer

--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: 
https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
* Subscribe to the RSS feed: 
http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/




[Wiki-research-l] A year’s worth of Wikipedia research: the 2012 WRN corpus

2013-02-28 Thread Dario Taraborelli
(cross-posting for those of you who are not following the WMF blog)

We've just released in the public domain a curated corpus with the 
bibliographic references of all 225 publications covered in the Wikimedia 
Research Newsletter, vol. 2 (2012), forming a historical record of Wikipedia 
research in the last year. [1] The corpus can be browsed on Zotero [2] or 
downloaded as a bibtex file from the DataHub. [3]

We also released a 95-page PDF with the full text of the 12 issues published in 
2012 [4]. 

The newsletter would not exist without the help of the following contributors, 
whom we wish to acknowledge for posting reviews and research summaries in 2012:

Aaron Shaw, Adam Hyland, Amir E. Aharoni, Angelika Adam, Bence Damokos, 
Benjamin Mako Hill, Daniel Mietchen, Diederik van Liere, Evan Rosen, Heather 
Ford, Giovanni Luca Ciampaglia, Jodi Schneider,  User:Lambiam, Nicolas Jullien, 
Oren Bochman, Phoebe Ayers, Piotr Konieczny, Sage Ross, Steven Walling, Taha 
Yasseri.

If you are interested in becoming a contributor please consider joining the WRN 
team. [5]

Dario Taraborelli and Tilman Bayer

[1] https://blog.wikimedia.org/2013/02/27/a-years-worth-of-wikipedia-research/
[2] https://www.zotero.org/wikiresearch/items/collectionKey/6R92V9E7
[3] http://datahub.io/en/dataset/wikimedia-research-newsletter
[4] https://upload.wikimedia.org/wikipedia/commons/d/d9/WRN_2012.pdf
[5] https://meta.wikimedia.org/wiki/Research:Newsletter#How_to_contribute

--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: 
https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
* Subscribe to the RSS feed: 
http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/


[Wiki-research-l] The Wikimedia Research Newsletter 3(2) is out

2013-02-28 Thread Dario Taraborelli
The February 2013 issue of the Wikimedia Research Newsletter is out:

http://meta.wikimedia.org/wiki/Research:Newsletter/2013/February

In this issue:

1 Wikipedia in historic context: Stigmergic accumulation is not new
2 UK university lecturers still skeptical and uninformed about Wikipedia
3 Saint Petersburg has more sisters than any other city in the world
4 Distributed Wiki proposal to replace NPOV with every point of view (EPOV)
5 Briefly
6 References

••• 15 publications were covered in this issue •••
Thanks to  Piotr Konieczny, Taha Yasseri, Heather Ford, Sage Ross and Daniel 
Mietchen for contributing

Dario Taraborelli and Tilman Bayer

--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: 
https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
* Subscribe to the RSS feed: 
http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/


[Wiki-research-l] The Wikimedia Research Newsletter 3(1) is out

2013-01-31 Thread Dario Taraborelli
The January 2013 issue of the Wikimedia Research Newsletter is out:

http://meta.wikimedia.org/wiki/Research:Newsletter/2013/January

1 Lessons from the wiki research literature in American Behavioral Scientist 
special issue
2 Mathematical model for attention to the promoted Wikipedia articles
3 The featured article icon and other heuristics for students to judge article 
credibility
4 Briefly
5 References

••• 21 publications were covered in this issue •••
Thanks to Taha Yasseri, Piotr Konieczny,  Aaron Shaw and Luisa Emmi Beck for 
contributing

Dario Taraborelli and Tilman Bayer

--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: 
https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
* Subscribe to the RSS feed: 
http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/


[Wiki-research-l] The Wikimedia Research Newsletter 2(12) is out

2013-01-03 Thread Dario Taraborelli
The December 2012 issue of the Wikimedia Research Newsletter is out:

http://meta.wikimedia.org/wiki/Research:Newsletter/2012/December

This issue completes the 2nd volume of the newsletter.

1 How Wikipedia deals with a mass shooting
2 Network positions and contributions to online public goods: the case of the 
Chinese Wikipedia
3 Quality of pharmaceutical articles in the Spanish Wikipedia
4 Wikipedia editing patterns are consistent with a non-finite state model of 
computation
5 Wikipedia as our collective memory
6 SOPA blackout decision analyzed
7 Bots and collective intelligence explored in dissertation
8 Briefly
9 References

••• 20 publications were covered in this issue •••
Thanks to  Daniel Mietchen, Piotr Konieczny, Giovanni Luca Ciampaglia, Taha 
Yasseri, Benjamin Mako Hill, Aaron Shaw and Sage Ross for contributing

Dario Taraborelli and Tilman Bayer

--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: 
https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
* Subscribe to the RSS feed: 
http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/


Re: [Wiki-research-l] Quick question on article count

2012-12-15 Thread Dario Taraborelli
Hi Hrafn,

that's correct, here is some background: 
https://meta.wikimedia.org/wiki/Article_count_reform

Dario

On Dec 9, 2012, at 5:09 PM, Hrafn H Malmquist h...@hi.is wrote:

 Hello
 
 Just wondering, as I have not been following meta.
 
 Reading: https://meta.wikimedia.org/wiki/User:Dcljr/Article_counts
 
 I understand Special:Statistics counts Content pages as: a non-redirect in 
 a content namespace, containing (after parsing) at least one true 
 [[wikilink]] to another page on the same wiki.
 
 Is this up to date?
 
 Best, Hrafn
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




[Wiki-research-l] The Wikimedia Research Newsletter 2(11) is out

2012-11-28 Thread Dario Taraborelli
The November 2012 issue of the Wikimedia Research Newsletter is out:

https://meta.wikimedia.org/wiki/Research:Newsletter/2012/November

In this issue:

1 Early prediction of movie box-office revenues with Wikipedia data
2 Readability of the English Wikipedia, Simple Wikipedia, and Britannica 
compared
3 Wikipedia favors established views and scientifically backed knowledge
4 Trust, authority and credentials on Wikipedia: The case of the Essjay 
controversy
5 Briefly
6 References

••• 16 publications were covered in this issue •••
Thanks to Piotr Konieczny, Benjamin Mako Hill, Taha Yasseri, Heather Ford and 
Diederik van Liere for contributing

Dario Taraborelli and Tilman Bayer

--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: 
https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
* Subscribe to the RSS feed: 
http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/


[Wiki-research-l] The Wikimedia Research Newsletter 2(10) is out

2012-10-31 Thread Dario Taraborelli
The October 2012 issue of the Wikimedia Research Newsletter is out:

https://meta.wikimedia.org/wiki/Research:Newsletter/2012/October

In this issue:

1 Wikipedia governance found to be mostly informal
2 Social network analysis of Wikipedia community
3 Wikipedia's article on the Rorschach inkblot test found to have a limited 
effect on the test's results
4 Efficiency of Wikipedia in editor recruitment and content production
5 Student use of Wikipedia
6 In brief
7 References

••• 17 publications were covered in this issue •••
Thanks to Piotr Konieczny, Taha Yasseri, Bence Damokos, Sage Ross and Phoebe 
Ayers for contributing

Dario Taraborelli and Tilman Bayer

--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: 
https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
* Subscribe to the RSS feed: 
http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/


Re: [Wiki-research-l] 1-year dump of English Wikipedia article ratings

2012-10-28 Thread Dario Taraborelli
No, that's based on textual feedback data from a small random sample of 
articles [1] from the AFTv5 tests, not the current ratings (AFTv4).

[1] http://meta.wikimedia.org/wiki/Research:AFT


On Oct 27, 2012, at 2:13 PM, Taha Yasseri taha.yas...@gmail.com wrote:

 Thanks Dario.
 I should also add your own CSCW'13  paper. Right?
 
 On Sat, Oct 27, 2012 at 10:51 PM, Dario Taraborelli 
 dtarabore...@wikimedia.org wrote:
 …and on a final note, this is an awesome work in progress that attempts to 
 classify Wikipedia articles based on a broad range of quality metrics 
 (including AFT ratings).
 
 https://github.com/slaporte/qualityvis
 
 
 On Oct 27, 2012, at 1:42 PM, Dario Taraborelli dtarabore...@wikimedia.org 
 wrote:
 
 I forgot to mention Ashton Anderson's dataviz work based on AFTv4 data
 
 https://graphics.stanford.edu/wikis/cs448b-11-fall/FP-AndersonAshton
 
 On Oct 27, 2012, at 1:36 PM, Dario Taraborelli dtarabore...@wikimedia.org 
 wrote:
 
 Taha,
 
 other than the internal reports during the product dev phase [1] and some 
 occasional uses of this data in the literature, there hasn't been much work 
 on AFT ratings. To my knowledge, the best use of this data outside of WMF 
 is in Adam Hyland's work (he presented a study at Wikimania [2] and I think 
 he's working on a follow-up paper).
 
 Dario
 
 [1] http://www.mediawiki.org/wiki/Article_feedback/Research
 [2] http://en.wikipedia.org/wiki/User:Protonk/Article_Feedback
 
 
 On Oct 27, 2012, at 6:57 AM, Taha Yasseri taha.yas...@gmail.com wrote:
 
 Hi Dario,
 Thank you. That's indeed a very interesting data set.
 
 Is anyone aware of any study or analysis of this or similar data on 
 article ratings?
 Even a raw data analysis would be very helpful to set up a systematic 
 study. Unfortunately, I'm not up to date on the state of the art. 
 
 cheers,
 .Taha
 
 On Mon, Oct 22, 2012 at 10:51 PM, Dario Taraborelli 
 dtarabore...@wikimedia.org wrote:
 We've released a full, anonymized dump of article ratings (aka AFTv4) 
 collected over 1 year since the deployment of the tool on the entire 
 English Wikipedia (July 22, 2011 - July 22, 2012).
 
 http://thedatahub.org/en/dataset/wikipedia-article-ratings
 
 The dataset (which includes 11m unique article ratings along 4 dimensions) 
 is licensed under CC0 and supersedes the partial dumps originally hosted 
 on the dumps server. Real-time AFTv4 data remains available as usual via 
 the toolserver. Feel free to get in touch if you have any questions about 
 this data.
 
 Dario
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
 -- 
 .t
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
 
 -- 
 .t
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] 1-year dump of English Wikipedia article ratings

2012-10-27 Thread Dario Taraborelli
Taha,

other than the internal reports during the product dev phase [1] and some 
occasional uses of this data in the literature, there hasn't been much work on 
AFT ratings. To my knowledge, the best use of this data outside of WMF is in 
Adam Hyland's work (he presented a study at Wikimania [2] and I think he's 
working on a follow-up paper).

Dario

[1] http://www.mediawiki.org/wiki/Article_feedback/Research
[2] http://en.wikipedia.org/wiki/User:Protonk/Article_Feedback


On Oct 27, 2012, at 6:57 AM, Taha Yasseri taha.yas...@gmail.com wrote:

 Hi Dario,
 Thank you. That's indeed a very interesting data set.
 
 Is anyone aware of any study or analysis of this or similar data on article 
 ratings?
 Even a raw data analysis would be very helpful to set up a systematic study. 
 Unfortunately, I'm not up to date on the state of the art. 
 
 cheers,
 .Taha
 
 On Mon, Oct 22, 2012 at 10:51 PM, Dario Taraborelli 
 dtarabore...@wikimedia.org wrote:
 We've released a full, anonymized dump of article ratings (aka AFTv4) 
 collected over 1 year since the deployment of the tool on the entire English 
 Wikipedia (July 22, 2011 - July 22, 2012).
 
 http://thedatahub.org/en/dataset/wikipedia-article-ratings
 
 The dataset (which includes 11m unique article ratings along 4 dimensions) is 
 licensed under CC0 and supersedes the partial dumps originally hosted on the 
 dumps server. Real-time AFTv4 data remains available as usual via the 
 toolserver. Feel free to get in touch if you have any questions about this 
 data.
 
 Dario
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
 
 
 
 -- 
 .t
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



[Wiki-research-l] 1-year dump of English Wikipedia article ratings

2012-10-22 Thread Dario Taraborelli
We've released a full, anonymized dump of article ratings (aka AFTv4) collected 
over 1 year since the deployment of the tool on the entire English Wikipedia 
(July 22, 2011 - July 22, 2012).

http://thedatahub.org/en/dataset/wikipedia-article-ratings

The dataset (which includes 11m unique article ratings along 4 dimensions) is 
licensed under CC0 and supersedes the partial dumps originally hosted on the 
dumps server. Real-time AFTv4 data remains available as usual via the 
toolserver. Feel free to get in touch if you have any questions about this data.

Dario


Re: [Wiki-research-l] 1-year dump of English Wikipedia article ratings

2012-10-22 Thread Dario Taraborelli
what James said

The dumps server was never meant to become a permanent open data repository, 
but it started being used as an ad-hoc solution to host all sort of datasets 
published by WMF on top of the actual XML dumps: that's the problem we're 
trying to fix. 

Regardless of where the data is physically hosted, your go-to point for 
discovering WMF datasets from now on is the DataHub. Think of it as a data 
registry: the registry is all you need in order to find where the data is 
hosted and to extract the appropriate metadata/documentation.

HTH

Dario
 
On Oct 22, 2012, at 5:06 PM, James Forrester ja...@jdforrester.org wrote:

 On 22 October 2012 16:03, Hydriz Wikipedia ad...@alphacorp.tk wrote:
 Hi all,
 
 I have long been wanting to say this, but is it possible for the team behind
 compiling such datasets to put future (and if possible, current) datasets
 into dumps.wikimedia.org so that it is easier for everyone to find stuff and
 not be all over the place? Thanks for that!
 
 Many one-off and regular datasets, from query results to data dumps
 and similar, are now indexed[0] on The Data Hub (formerly CKAN) run by
 the Open Knowledge Foundation for precisely this reason - so that data
 researchers can easily find data about Wikimedia, and see when it's
 updated.
 
 [0] - http://thedatahub.org/en/group/wikimedia
 
 J.
 -- 
 James D. Forrester
 jdforres...@gmail.com
 [[Wikipedia:User:Jdforrester|James F.]] (speaking purely in a personal 
 capacity)
 
 ___
 Wiki-research-l mailing list
 Wiki-research-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




Re: [Wiki-research-l] 1-year dump of English Wikipedia article ratings

2012-10-22 Thread Dario Taraborelli
Thanks Jérémie,

we are definitely aiming for a more official announcement. The reason for the 
soft launch is that, after experimenting for a few months with the DataHub, we 
are still reporting to the developers issues that need to be addressed before a 
broader announcement. The CKAN data browser, for example, is quite rudimentary; 
there is limited support for batch file upload; and data citation support is not 
keeping up with standards and best practices in the field. If anyone on these 
lists is interested in crash-testing the repository, I'd be happy to follow up 
off-list.

Despite these issues, CKAN remains our engine of choice: it's open source, 
actively maintained by OKFN (an organization whose mission is aligned with 
Wikimedia's), and is currently used by large organizations and governments to 
run institutional repositories (like http://data.gov.uk).

The long-term vision is that of an actual data/API hub built on top of a 
naked repository, to facilitate the discovery and reuse of various data sources. 
I'm copying below a note I posted a few weeks ago to wikitech-l on this topic.

Dario

Begin forwarded message:

 From: Dario Taraborelli da...@wikimedia.org
 Subject: Re: [Wikitech-l] Proposal to add an API/Developer/Developer Hub link 
 to the footer of Wikimedia wikis
 Date: September 25, 2012 10:55:47 AM PDT
 
 I am very excited to see this proposal and happy to help in my spare time; 
 thanks for starting the thread. In fact, I started brainstorming a while ago 
 with a number of colleagues and community members on what an ideal Wikimedia 
 developer hub might look like. 
 
 My thoughts:
 
 (1) the hub should be focused on documenting reuse of Wikimedia's data 
 sources (the API, the XML dumps, the IRC streams), not just the MediaWiki 
 codebase. We are investing quite a lot of outreach effort in the MediaWiki 
 developer community; this hub should be broader in scope and support the 
 development of third-party apps/services building on these data sources. A 
 consultation we ran last year indicates that a large number of 
 developers/researchers interested in building services/mashups on top of 
 Wikipedia don't have a clue about what data/APIs we make available besides the 
 XML dumps or where to find this data: this is the audience we should build 
 the developer hub for.
 
 (2) the hub should host simple recipes on how to use existing data sources 
 for building applications and list existing libraries for data 
 crunching/manipulation. My initial attempt at listing Wikimedia/Wikipedia 
 apps, mashups and data wrangling libraries is this spreadsheet; contributions 
 are welcome [1].
 
 (3) on top of documenting data sources/APIs, we should showcase the best 
 applications that use them and incentivize more developers to play with our 
 data, like Flickr does with its app garden. WMF designer Vibha Bamba created 
 these two mockups [2] [3], loosely inspired by 
 http://selection.datavisualization.ch, for a visual directory that we could 
 initially host on Labs.
 
 Dario
 
 [1] 
 https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Ams-fyukCIlMdDVrNHQ5RmJtZmNNQ01UbF9qeUV2aGc#gid=0
  
 [2] http://commons.wikimedia.org/wiki/File:Wikipedia_DataViz-01.png
 [3] http://commons.wikimedia.org/wiki/File:Wikipedia_DataViz-02.png 
 



On Oct 22, 2012, at 7:00 PM, Jérémie Roquet arkano...@gmail.com wrote:

 cc-ed xmldatadumps-l
 
 Hi,
 
 2012/10/23 Dario Taraborelli dtarabore...@wikimedia.org:
 2012/10/23 James Forrester ja...@jdforrester.org:
 On 22 October 2012 16:03, Hydriz Wikipedia ad...@alphacorp.tk wrote:
 I have long been wanting to say this, but is it possible for the team 
 behind
 compiling such datasets to put future (and if possible, current) datasets
 into dumps.wikimedia.org so that it is easier for everyone to find stuff 
 and
 not be all over the place? Thanks for that!
 
 Many one-off and regular datasets, from query results to data dumps
 and similar, are now indexed[0] on The Data Hub (formerly CKAN) run by
 the Open Knowledge Foundation for precisely this reason - so that data
 researchers can easily find data about Wikimedia, and see when it's
 updated.
 
 [0] - http://thedatahub.org/en/group/wikimedia
 
 The dumps server was never meant to become a permanent open data repository, 
 but it started being used as an ad-hoc solution to host all sorts of datasets 
 published by WMF on top of the actual XML dumps: that's the problem we're 
 trying to fix.
 
 Regardless of where the data is physically hosted, your go-to point to 
 discover WMF datasets from now on is the DataHub. Think of it as a data 
 registry: the registry is all you need to know in order to find where the 
 data is hosted and to extract the appropriate metadata/documentation.
 
 That's fine by me, but I think more communication about this would be 
 welcome. I've added a link to meta:Data_dumps¹ and I'll communicate 
 about this on the French Wikipedia, but a link on the dumps' page for 
 other downloads² would be great.
 
 Most

Re: [Wiki-research-l] wikitweets: view tweets that reference wikipedia in realtime

2012-09-19 Thread Dario Taraborelli
Ed,

that's awesome – do you mind adding an entry on the DataHub?

http://thedatahub.org/group/wikimedia

Dario 

On Sep 19, 2012, at 6:57 PM, Ed Summers e...@pobox.com wrote:

 Emilio, Taha:
 
 I realize this was long enough ago that you may no longer be 
 interested, but I finally got around to adding an archive function to 
 wikitweets [1]. Every time the app collects 1000 tweets that reference 
 Wikipedia, it dumps them to a file on the Internet Archive [2].
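The buffer-and-flush pattern described above (collect records, write out a batch file every 1,000 items) can be sketched as follows; the class, field names, and file naming are illustrative, not wikitweets' actual implementation:

```python
import json

class BatchArchiver:
    """Buffer incoming records; flush them to a JSON file every `batch_size` items."""

    def __init__(self, batch_size=1000, prefix="wikitweets"):
        self.batch_size = batch_size
        self.prefix = prefix
        self.buffer = []
        self.batches_written = 0

    def add(self, record):
        """Add one record; triggers a flush once the buffer fills up."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write the current buffer to a numbered batch file and reset it."""
        if not self.buffer:
            return None
        path = f"{self.prefix}-{self.batches_written:04d}.json"
        with open(path, "w") as f:
            json.dump(self.buffer, f)
        self.batches_written += 1
        self.buffer = []
        return path
```

A real archiver would then hand each finished batch file to an uploader (in wikitweets' case, the Internet Archive), but the batching logic itself is independent of where the files end up.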
 
 One nice side effect of this is that you get a BitTorrent seed/peer 
 for free [3], which makes mirroring the data pretty simple... if you 
 have a BitTorrent client handy. I blogged a little bit about how the 
 archive function in wikitweets works [4].
 
 Best,
 //Ed
 
 [1] http://wikitweets.herokuapp.com
 [2] http://archive.org/download/wikitweets/wikitweets_archive.torrent
 [3] http://archive.org/download/wikitweets/wikitweets_archive.torrent
 [4] http://inkdroid.org/journal/2012/09/19/archiving-wikitweets/
 
 On Thu, Apr 26, 2012 at 6:28 PM, Taha Yasseri taha.yas...@gmail.com wrote:
 My appreciation too, and the same question: do you also store the records?
 
 bests,
 .t
 
 On Thu, Apr 26, 2012 at 7:14 PM, emijrp emi...@gmail.com wrote:
 
 2012/4/26 Ed Summers e...@pobox.com
 
 This is more on the experimental side of research but I just
 finished a prototype realtime visualization of tweets that reference
 Wikipedia:
 
   http://wikitweets.herokuapp.com/
 
 
 Very cool. Do you archive the tweets, or are they discarded?
 
 --
 Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
 Pre-doctoral student at the University of Cádiz (Spain)
 Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
 Personal website: https://sites.google.com/site/emijrp/
 
 
 
 
 
 
 --
 Taha.
 
 
 




[Wiki-research-l] WikipediaJS

2012-09-10 Thread Dario Taraborelli
OKFN Labs just released a lightweight JS library to pull machine-readable data 
on Wikipedia articles from DBpedia:

http://okfnlabs.org/wikipediajs/
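Under the hood, libraries like this rely on DBpedia's convention of serving a machine-readable view of each article at a predictable URL, http://dbpedia.org/data/<Resource>.json, where the resource name is the article title with spaces replaced by underscores. A rough sketch of that mapping (the helper name is mine, not part of WikipediaJS):

```python
from urllib.parse import quote

def dbpedia_data_url(article_title):
    """Map an English Wikipedia article title to its DBpedia JSON data URL.
    DBpedia resource names use underscores in place of spaces."""
    resource = quote(article_title.replace(" ", "_"))
    return f"http://dbpedia.org/data/{resource}.json"

print(dbpedia_data_url("Ada Lovelace"))
```

Fetching that URL returns the RDF properties DBpedia extracted for the article (infobox fields, categories, links) serialized as JSON, which is essentially what WikipediaJS wraps for browser use.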

Dario


Re: [Wiki-research-l] AFT5 regarding RJensen question

2012-09-06 Thread Dario Taraborelli
The complete reports on WMF research on AFT5 can be found here:
http://meta.wikimedia.org/wiki/Research:Article_feedback

The tool is currently deployed on a random 10% sample of English Wikipedia 
articles so it's not surprising most readers don't see it very often. We are 
currently collecting about 4K unique feedback messages per day: 
http://toolserver.org/~dartar/fp/

As for the quality of feedback – as judged by community members and readers – 
we have some preliminary usage data coming from the FeedbackPage: 
http://toolserver.org/~dartar/fp/ as well as results based on blind assessment 
by Wikipedians that we ran during the early stages of AFT5 research (see the 
Quality assessment sections in the research reports above).

We will shortly be publishing an update on FeedbackPage data, but as the 
feature is not rolled out on the entire project and not many editors or readers 
know how to find the FeedbackPage (i.e. the only place where comments can be 
filtered, flagged and moderated), these results should not be taken as 
conclusive.

A full roll out of AFT5 on the entire English Wikipedia is scheduled for Q4 
2012.

HTH

Dario

On Sep 6, 2012, at 1:51 PM, Kerry Raymond wrote:

 It might be premature to draw any conclusions about editor response to AFT5,
 given it hasn't been fully rolled-out. I rarely see it as a reader
 (admittedly it's hard to spot on a large article with lots of citations) and
 I don't think I have ever seen it on pages I have edited recently (and I do
 look for the feedback) -- it's difficult to have an editor response to
 something that isn't there.
 
 Kerry
 
 
 -Original Message-
 From: wiki-research-l-boun...@lists.wikimedia.org
 [mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of ENWP Pine
 Sent: Friday, 7 September 2012 6:06 AM
 To: Research into Wikimedia content and communities
 Subject: [Wiki-research-l] AFT5 regarding RJensen question
 
 RJensen wrote in the War of 1812 email thread:
 Comments: I have not seen any editor make actual use of the Article
 Feedback tool -- are there examples?  Yes Wikipedians are very proud
 of their vast half-billion-person audience.  However they do not ask
 what features are most useful for a high school student or teacher/
 a university student/ etc
 
 This is a very interesting question. What have been the benefits of AFT5? I 
 have seen complaints about spam and suppressible material being written in 
 AFT5. What benefits has it had?
 
 With your permission, RJensen, I'll forward your question and mine to 
 Wikimedia-l for discussion there as well.
 
 Pine 
 
 
 
 




Re: [Wiki-research-l] Social Network Analysis of Wikipedia

2012-09-05 Thread Dario Taraborelli
you should also check out:

Laniado, David, Riccardo Tasso, Y. Volkovich, and Andreas Kaltenbrunner. When 
the Wikipedians talk: network and tree structure of Wikipedia discussion pages. 
In Proceedings of the Fifth International AAAI Conference on Weblogs and Social 
Media (ICWSM '11), 177-184, 2011.  
http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewPDFInterstitial/2764/3301

summarized here: 
https://meta.wikimedia.org/wiki/Research:Newsletter/2011-07-25#The_anatomy_of_a_Wikipedia_talk_page

Dario

On Sep 5, 2012, at 1:05 PM, Brian Keegan wrote:

 There's a good amount of research:
 
 Jullien 2012 has an excellent (although by no means exhaustive) lit review of 
 extant Wikipedia research including many network analysis papers:
 http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2053597
 
 Welser, et al. 2011 use network analysis approaches to identify and 
 differentiate users social roles:
 http://www.connectedaction.net/wp-content/uploads/2011/04/Welser.Cosley.plus_.Wiki_.Roles_.pdf
 
 Antin, et al. 2012 use some centrality-like metrics to measure the diversity 
 of editing behavior:
 http://faculty.poly.edu/~onov/Antin_Chehsire_Nov_WPP_CSCW_2012.pdf
 
 Kane 2009 on how network position influences article quality:
 http://www.profkane.com/uploads/7/9/1/3/79137/kane_2009_ocisa.pdf
 
 Kane, et al. 2012 on how membership turnover/retention influences article 
 quality:
 http://www.samransbotham.com/sites/default/files/RansbothamKane_WikiDemotion_2012_MISQ.pdf
 
 Shameless self-promotion:
 Descriptive analysis of Wikipedia's response and networks to the 2011 Tohoku 
 earthquake and tsunami:
 http://www.brianckeegan.com/papers/WikiSym11.pdf
 
 Developing a statistical model of whether Wikipedia collaborations as a 
 bipartite network of editors and articles are more strongly influenced by 
 features of editors or features of articles:
 http://www.brianckeegan.com/papers/CSCW12.pdf
 
 Developing a unipartite network of Wikipedia collaborations as document 
 passing network among editors on a single article:
 http://www.brianckeegan.com/papers/WikiSym12.pdf
 
 
 On Wed, Sep 5, 2012 at 7:43 PM, Jeremy Foote foo...@purdue.edu wrote:
 I am a brand new Master's student at Purdue. For my Social Network Analysis 
 class, I'm thinking about doing a project about whether a Wikipedian's 
 centrality in a network can be used as a predictor of future participation. 
 I've spent the afternoon looking for relevant literature. I found the very 
 interesting 
 
 Validity Issues in the Use of Social Network Analysis with Digital Trace 
 Data by Howison, Wiggins, and Crowston
 and
 Network analysis of collaboration structure in Wikipedia by Brandes et al.
 
 I'm wondering if there are other papers about how to translate Wikipedia into 
 a network structure, or even more specifically relating to node-level 
 centrality measures and participation measures.
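As a concrete starting point for the node-level side of this, degree centrality can be computed directly from an edge list without any special tooling; a minimal sketch on a toy co-editing network (the data is made up for illustration):

```python
from collections import Counter

def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list:
    degree(v) / (n - 1), where n is the number of nodes."""
    degrees = Counter()
    nodes = set()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
        nodes.update((u, v))
    n = len(nodes)
    return {v: degrees[v] / (n - 1) for v in nodes}

# e.g. hypothetical co-editing ties among four editors
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(degree_centrality(edges))
```

In this toy network, editor A is tied to all three others and scores 1.0; whether such scores actually predict future participation is exactly the empirical question posed above.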
 
 Very many thanks,
 Jeremy Foote
 
 
 
 
 
 -- 
 Brian C. Keegan
 Ph.D. Student - Media, Technology,  Society
 School of Communication, Northwestern University
 
 Science of Networks in Communities, Laboratory for Collaborative Technology




[Wiki-research-l] The Wikimedia Research Newsletter 2(8) is out

2012-08-30 Thread Dario Taraborelli
The August 2012 issue of the Wikimedia Research Newsletter is out:

https://meta.wikimedia.org/wiki/Research:Newsletter/2012-08-27

In this issue:

1 Wikipedia-based graphs visualize influences between thinkers, writers and 
musicians
2 Information retrieval scientists turn their attention to Wikipedia's page 
view logs
3 The limits of amateur NPOV history
4 Three new papers about Wikipedia class assignments
5 Substantive and non-substantive contributors show different 
motivation and expertise
6 Is there systemic bias in Wikipedia's coverage of the Tiananmen protests?
7 Low-hanging fruit hypothesis explains Wikipedia's slowed growth?
8 Briefly
9 References

••• 22 publications were covered in this issue •••
Thanks to Piotr Konieczny, Sage Ross, Evan Rosen and Oren Bochman for 
contributing.

Dario Taraborelli and Tilman Bayer

--
Wikimedia Research Newsletter
https://meta.wikimedia.org/wiki/Research:Newsletter/

* Follow us on Twitter/Identi.ca: @WikiResearch
* Receive this newsletter by mail: 
https://lists.wikimedia.org/mailman/listinfo/research-newsletter 
* Subscribe to the RSS feed: 
http://blog.wikimedia.org/c/research-2/wikimedia-research-newsletter/feed/

