Re: [Wikimedia-l] [Wikimedia Research Showcase] Wednesday September 19, 2018 at 11:30 AM (PDT) 18:30 UTC

2018-09-13 Thread Sarah R
Hi Everyone,

The second abstract was cut short in the first email. Here is the full
version:

Deliberation and resolution on Wikipedia: A case study of requests for
comments
By *Amy Zhang, Jane Im*
Resolving disputes in a timely manner is
crucial for any online production group. We present an analysis of Requests
for Comments (RfCs), one of the main vehicles on Wikipedia for formally
resolving a policy or content dispute. We collected an exhaustive dataset
of 7,316 RfCs on English Wikipedia over the course of 7 years and conducted
a qualitative and quantitative analysis into what issues affect the RfC
process. Our analysis was informed by 10 interviews with frequent RfC
closers. We found that a major issue affecting the RfC process is the
prevalence of RfCs that could have benefited from formal closure but that
linger indefinitely without one, with factors including participants'
interest and expertise impacting the likelihood of resolution. From these
findings, we developed a model that predicts whether an RfC will go stale
with 75.3% accuracy, a level that is approached as early as one week after
dispute initiation.

On Thu, Sep 13, 2018 at 1:43 PM Sarah R wrote:

> Hi Everyone,
>
> The next Wikimedia Research Showcase will be live-streamed Wednesday,
> September 19 2018 at 11:30 AM (PDT) 18:30 UTC.
>
> YouTube stream: https://www.youtube.com/watch?v=OY8vZ6wES9o
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here.
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>
>
> Hope to see you there!
>
> This month's presentations are:
>
> The impact of news exposure on collective attention in the United States
> during the 2016 Zika epidemic
> By *Michele Tizzoni, André Panisson, Daniela Paolotti, Ciro Cattuto*
> In recent years, many studies have drawn attention
> to the important role of collective awareness and human behaviour during
> epidemic outbreaks. A number of modelling efforts have investigated the
> interaction between the disease transmission dynamics and human behaviour
> change mediated by news coverage and by information spreading in the
> population. Yet, given the scarcity of data on public awareness during an
> epidemic, few studies have relied on empirical data. Here, we use
> fine-grained, geo-referenced data from three online sources - Wikipedia,
> the GDELT Project and the Internet Archive - to quantify population-scale
> information seeking about the 2016 Zika virus epidemic in the U.S.,
> explicitly linking such behavioural signal to epidemiological data.
> Geo-localized Wikipedia pageview data reveal that visiting patterns of
> Zika-related pages in Wikipedia were highly synchronized across the United
> States and largely explained by exposure to national television broadcast.
> Contrary to the assumption of some theoretical models, news volume and
> Wikipedia visiting patterns were not significantly correlated with the
> magnitude or the extent of the epidemic. Attention to Zika, in terms of
> Zika-related Wikipedia pageviews, was high at the beginning of the
> outbreak, when public health agencies raised an international alert and
> triggered media coverage, but subsequently exhibited an activity profile
> that suggests nonlinear dependencies and memory effects in the relationship
> between information seeking, media pressure, and disease dynamics. This
> calls for a new and more general modelling framework to describe the
> interaction between media exposure, public awareness, and disease dynamics
> during epidemic outbreaks.
>
>
> Deliberation and resolution on Wikipedia: A case study of requests for
> comments
> By *Amy Zhang, Jane Im*
> Resolving disputes in a timely manner is
> crucial for any online production group. We present an analysis of Requests
> for Comments (RfCs), one of the main vehicles on Wikipedia for formally
> resolving a policy or content dispute. We collected an exhaustive dataset
> of 7,316 RfCs on English Wikipedia over the course of 7 years and conducted
> a qualitative and quantitative analysis into what issues affect the RfC
> process. Our analysis was informed by 10 interviews with frequent RfC
> closers. We found that a major issue affecting the RfC process is the
> prevalence of RfCs that could have benefited from formal closure but that
> linger indefinitely without one, with factors including participants'
> interest and expertise impacting the likelihood of resolution. From these
> findings, we developed a model that predicts whether
>
> --
> Sarah R. Rodlund
> Technical Writer, Developer Advocacy
> <https://meta.wikimedia.org/wiki/Developer_Advocacy>
> srodl...@wikimedia.org
>
>
>
>
>

-- 
Sarah R. Rodlund
Technical Writer, Developer 

[Wikimedia-l] [Wikimedia Research Showcase] Wednesday September 19, 2018 at 11:30 AM (PDT) 18:30 UTC

2018-09-13 Thread Sarah R
Hi Everyone,

The next Wikimedia Research Showcase will be live-streamed Wednesday,
September 19 2018 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=OY8vZ6wES9o

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here.
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>

Hope to see you there!

This month's presentations are:

The impact of news exposure on collective attention in the United States
during the 2016 Zika epidemic
By *Michele Tizzoni, André Panisson, Daniela Paolotti, Ciro Cattuto*
In recent years, many studies have drawn attention
to the important role of collective awareness and human behaviour during
epidemic outbreaks. A number of modelling efforts have investigated the
interaction between the disease transmission dynamics and human behaviour
change mediated by news coverage and by information spreading in the
population. Yet, given the scarcity of data on public awareness during an
epidemic, few studies have relied on empirical data. Here, we use
fine-grained, geo-referenced data from three online sources - Wikipedia,
the GDELT Project and the Internet Archive - to quantify population-scale
information seeking about the 2016 Zika virus epidemic in the U.S.,
explicitly linking such behavioural signal to epidemiological data.
Geo-localized Wikipedia pageview data reveal that visiting patterns of
Zika-related pages in Wikipedia were highly synchronized across the United
States and largely explained by exposure to national television broadcast.
Contrary to the assumption of some theoretical models, news volume and
Wikipedia visiting patterns were not significantly correlated with the
magnitude or the extent of the epidemic. Attention to Zika, in terms of
Zika-related Wikipedia pageviews, was high at the beginning of the
outbreak, when public health agencies raised an international alert and
triggered media coverage, but subsequently exhibited an activity profile
that suggests nonlinear dependencies and memory effects in the relationship
between information seeking, media pressure, and disease dynamics. This
calls for a new and more general modelling framework to describe the
interaction between media exposure, public awareness, and disease dynamics
during epidemic outbreaks.


Deliberation and resolution on Wikipedia: A case study of requests for
comments
By *Amy Zhang, Jane Im*
Resolving disputes in a timely manner is
crucial for any online production group. We present an analysis of Requests
for Comments (RfCs), one of the main vehicles on Wikipedia for formally
resolving a policy or content dispute. We collected an exhaustive dataset
of 7,316 RfCs on English Wikipedia over the course of 7 years and conducted
a qualitative and quantitative analysis into what issues affect the RfC
process. Our analysis was informed by 10 interviews with frequent RfC
closers. We found that a major issue affecting the RfC process is the
prevalence of RfCs that could have benefited from formal closure but that
linger indefinitely without one, with factors including participants'
interest and expertise impacting the likelihood of resolution. From these
findings, we developed a model that predicts whether

-- 
Sarah R. Rodlund
Technical Writer, Developer Advocacy
<https://meta.wikimedia.org/wiki/Developer_Advocacy>
srodl...@wikimedia.org
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
<mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>

Re: [Wikimedia-l] Wikimedia Research Showcase August 13 2018 at 11:30 AM (PDT) 18:30 UTC

2018-08-13 Thread Sarah R
Hi All,

Just a reminder this is happening at 11:30 AM (PDT) 18:30 UTC *TODAY.*

Many kindnesses,

Sarah R.

On Fri, Aug 10, 2018 at 3:46 PM Sarah R wrote:

> Hi Everyone,
>
> The next Wikimedia Research Showcase will be live-streamed Wednesday,
> August 13 2018 at 11:30 AM (PDT) 18:30 UTC.
>
> YouTube stream: https://www.youtube.com/watch?v=OGPMS4YGDMk
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here.
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>
>
> Hope to see you there!
>
> This month's presentation is:
>
> *Quicksilver: Training an ML system to generate draft Wikipedia articles
> and Wikidata entries simultaneously*
>
> John Bohannon and Vedant Dharnidharka, Primer
>
> The automatic generation and updating of Wikipedia articles is usually
> approached as a multi-document summarization task: Given a set of source
> documents containing information about an entity, summarize the entity.
> Purely sequence-to-sequence neural models can pull that off, but getting
> enough data to train them is a challenge. Wikipedia articles and their
> reference documents can be used for training, as was recently done
> <https://arxiv.org/abs/1801.10198> by a team at Google AI. But how do you
> find new source documents for new entities? And besides having humans read
> all of the source documents, how do you fact-check the output? What is
> needed is a self-updating knowledge base that learns jointly with a
> summarization model, keeping track of data provenance. Lucky for us, the
> world’s most comprehensive public encyclopedia is tightly coupled with
> Wikidata, the world’s most comprehensive public knowledge base. We have
> built a system called Quicksilver that uses them both.
>
>
>

-- 
Sarah R. Rodlund
Technical Writer, Developer Advocacy
<https://meta.wikimedia.org/wiki/Developer_Advocacy>
srodl...@wikimedia.org


*“I am a jug filled with water both magic and plain; I have only to lean
over, and a stream of beautiful thoughts flows out of me.” *

― Bohumil Hrabal, Too Loud a Solitude

Re: [Wikimedia-l] Wikimedia Research Showcase August 13 2018 at 11:30 AM (PDT) 18:30 UTC

2018-08-10 Thread Sarah R
Hi All,

In my haste, I put the wrong weekday on this email. The showcase will be on
Monday this month, not Wednesday.

Kindly,


On Fri, Aug 10, 2018 at 3:46 PM Sarah R wrote:

> Hi Everyone,
>
> The next Wikimedia Research Showcase will be live-streamed Wednesday,
> August 13 2018 at 11:30 AM (PDT) 18:30 UTC.
>
> YouTube stream: https://www.youtube.com/watch?v=OGPMS4YGDMk
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here.
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>
>
> Hope to see you there!
>
> This month's presentation is:
>
> *Quicksilver: Training an ML system to generate draft Wikipedia articles
> and Wikidata entries simultaneously*
>
> John Bohannon and Vedant Dharnidharka, Primer
>
> The automatic generation and updating of Wikipedia articles is usually
> approached as a multi-document summarization task: Given a set of source
> documents containing information about an entity, summarize the entity.
> Purely sequence-to-sequence neural models can pull that off, but getting
> enough data to train them is a challenge. Wikipedia articles and their
> reference documents can be used for training, as was recently done
> <https://arxiv.org/abs/1801.10198> by a team at Google AI. But how do you
> find new source documents for new entities? And besides having humans read
> all of the source documents, how do you fact-check the output? What is
> needed is a self-updating knowledge base that learns jointly with a
> summarization model, keeping track of data provenance. Lucky for us, the
> world’s most comprehensive public encyclopedia is tightly coupled with
> Wikidata, the world’s most comprehensive public knowledge base. We have
> built a system called Quicksilver that uses them both.
>
>
>

-- 
Sarah R. Rodlund
Technical Writer, Developer Advocacy
<https://meta.wikimedia.org/wiki/Developer_Advocacy>
srodl...@wikimedia.org


*“I am a jug filled with water both magic and plain; I have only to lean
over, and a stream of beautiful thoughts flows out of me.” *

― Bohumil Hrabal, Too Loud a Solitude

[Wikimedia-l] Wikimedia Research Showcase August 13 2018 at 11:30 AM (PDT) 18:30 UTC

2018-08-10 Thread Sarah R
Hi Everyone,

The next Wikimedia Research Showcase will be live-streamed Wednesday,
August 13 2018 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=OGPMS4YGDMk

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here.
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>

Hope to see you there!

This month's presentation is:

*Quicksilver: Training an ML system to generate draft Wikipedia articles
and Wikidata entries simultaneously*

John Bohannon and Vedant Dharnidharka, Primer

The automatic generation and updating of Wikipedia articles is usually
approached as a multi-document summarization task: Given a set of source
documents containing information about an entity, summarize the entity.
Purely sequence-to-sequence neural models can pull that off, but getting
enough data to train them is a challenge. Wikipedia articles and their
reference documents can be used for training, as was recently done
<https://arxiv.org/abs/1801.10198> by a team at Google AI. But how do you
find new source documents for new entities? And besides having humans read
all of the source documents, how do you fact-check the output? What is
needed is a self-updating knowledge base that learns jointly with a
summarization model, keeping track of data provenance. Lucky for us, the
world’s most comprehensive public encyclopedia is tightly coupled with
Wikidata, the world’s most comprehensive public knowledge base. We have
built a system called Quicksilver that uses them both.


Re: [Wikimedia-l] Wikimedia Research Showcase July 11, 2018 (11:30 AM PDT| 18:30 UTC)

2018-07-11 Thread Sarah R
Hi Folks,

Just a reminder this is happening today!

Hope to see you there!

On Fri, Jul 6, 2018 at 10:30 AM Sarah R wrote:

> Hi Everyone,
>
> The next Wikimedia Research Showcase will be live-streamed Wednesday,
> July 11, 2018 at 11:30 AM (PDT) 18:30 UTC.
>
> YouTube stream: https://www.youtube.com/watch?v=uK7AvNKq0sg
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here.
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>
>
> Hope to see you there!
>
> This month's presentations:
>
> Mind the (Language) Gap: Neural Generation of Multilingual Wikipedia
> Summaries from Wikidata for ArticlePlaceholders
> By *Lucie-Aimée Kaffee*
> While Wikipedia exists in 287 languages, its content is unevenly distributed
> among them. It is therefore of the utmost social and cultural interest to
> address languages for which native speakers have access only to an
> impoverished Wikipedia. In this work, we investigate the generation of
> summaries for Wikipedia articles in underserved languages, given structured
> data as an input.
> In order to address the information bias towards widely spoken languages,
> we focus on an important support for such summaries: ArticlePlaceholders,
> which are dynamically generated content pages in underserved Wikipedia
> versions. They enable native speakers to access existing information in
> Wikidata, a structured Knowledge Base (KB). Our system provides a
> generative neural network architecture, which processes the triples of the
> KB as they are dynamically provided by the ArticlePlaceholder, and generates
> a comprehensible textual summary. This data-driven approach is tested with
> the goal of understanding how well it matches the communities' needs on two
> underserved languages on the Web: Arabic, a language with a big community
> with disproportionate access to knowledge online, and Esperanto.
> With the help of the Arabic and Esperanto Wikipedians, we conduct an
> extended evaluation which exhibits not only the quality of the generated
> text but also the applicability of our end-system to any underserved
> Wikipedia version.
>
> Token-level change tracking: data, tools and insights
> By *Fabian Flöck*
> This talk first gives an overview of the WikiWho
> infrastructure, which provides tracking of changes to single tokens
> (~words) in articles of different Wikipedia language versions. It exposes
> APIs for accessing this data in near-real time, and is complemented by a
> published static dataset. Several insights are presented regarding
> provenance, partial reverts, token-level conflict and other metrics that
> only become available with such data. Lastly, the talk will cover several
> tools and scripts that are already using the API and will discuss their
> application scenarios, such as investigation of authorship, conflicted
> content and editor productivity.
>


-- 
Sarah R. Rodlund
Technical Writer, Developer Advocacy
<https://meta.wikimedia.org/wiki/Developer_Advocacy>
srodl...@wikimedia.org

[Wikimedia-l] Wikimedia Research Showcase July 11, 2018 (11:30 AM PDT| 18:30 UTC)

2018-07-06 Thread Sarah R
Hi Everyone,

The next Wikimedia Research Showcase will be live-streamed Wednesday, July
11, 2018 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=uK7AvNKq0sg

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here.


Hope to see you there!

This month's presentations:

Mind the (Language) Gap: Neural Generation of Multilingual Wikipedia
Summaries from Wikidata for ArticlePlaceholders
By *Lucie-Aimée Kaffee*
While Wikipedia exists in 287 languages, its content is unevenly distributed
among them. It is therefore of the utmost social and cultural interest to
address languages for which native speakers have access only to an
impoverished Wikipedia. In this work, we investigate the generation of
summaries for Wikipedia articles in underserved languages, given structured
data as an input.
In order to address the information bias towards widely spoken languages,
we focus on an important support for such summaries: ArticlePlaceholders,
which are dynamically generated content pages in underserved Wikipedia
versions. They enable native speakers to access existing information in
Wikidata, a structured Knowledge Base (KB). Our system provides a
generative neural network architecture, which processes the triples of the
KB as they are dynamically provided by the ArticlePlaceholder, and generates
a comprehensible textual summary. This data-driven approach is tested with
the goal of understanding how well it matches the communities' needs on two
underserved languages on the Web: Arabic, a language with a big community
with disproportionate access to knowledge online, and Esperanto.
With the help of the Arabic and Esperanto Wikipedians, we conduct an
extended evaluation which exhibits not only the quality of the generated
text but also the applicability of our end-system to any underserved
Wikipedia version.

Token-level change tracking: data, tools and insights
By *Fabian Flöck*
This talk first gives an overview of the WikiWho infrastructure,
which provides tracking of changes to single tokens (~words) in articles of
different Wikipedia language versions. It exposes APIs for accessing this
data in near-real time, and is complemented by a published static dataset.
Several insights are presented regarding provenance, partial reverts,
token-level conflict and other metrics that only become available with such
data. Lastly, the talk will cover several tools and scripts that are
already using the API and will discuss their application scenarios, such as
investigation of authorship, conflicted content and editor productivity.


[Wikimedia-l] Research Showcase May 8, 2018 (11:30 AM PDT| 18:30 UTC)

2018-05-07 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Tuesday, May 8,
2018 at 11:30 AM (PDT), 18:30 (UTC). (Please note this meeting is on
Tuesday this month).

YouTube stream: https://www.youtube.com/watch?v=t7cHxlGgEt4

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here.
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>

Case studies in the appropriation of ORES
By *Aaron Halfaker*
ORES is an
open, transparent, and auditable machine prediction platform for
Wikipedians to help them do their work. It's currently used in 33 different
Wikimedia projects to measure the quality of content, detect vandalism,
recommend changes to articles, and to identify good faith newcomers. The
primary way that Wikipedians use ORES' predictions is through the tools
developed by volunteers. These javascript gadgets, MediaWiki extensions,
and web-based tools make up a complex ecosystem of Wikipedian processes --
encoded into software. In this presentation, Aaron will walk through
three key tools that Wikipedians have developed that make use of ORES, and
he'll discuss how these novel process support technologies and the
discussions around them have prompted Wikipedians to reflect on their work
processes.


Exploring Wikimedia Donation Patterns
By *Gary Hsieh*
Every year, the Wikimedia
Foundation relies on fundraising campaigns to help maintain the services it
provides to millions of people worldwide. However, despite a large number
of individuals who donate through these campaigns, these donors represent
only a small percentage of Wikimedia users. In this work, we seek to
advance our understanding of donors and their donation behaviors. Our
findings offer insights to improve fundraising campaigns and to limit the
burden of these campaigns on Wikipedia visitors.

Kindly,

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation

Re: [Wikimedia-l] Research Showcase April 18, 2018 (11:30 AM PDT| 18:30 UTC)

2018-04-18 Thread Sarah R
Hi Everyone,

Just a reminder that the Research Showcase will begin in a half hour!

Kindly,

Sarah R.



On Thu, Apr 12, 2018 at 7:30 PM, Sarah R <srodl...@wikimedia.org> wrote:

> Hi All,
>
> A quick correction: "The Critical Relationship of Volunteer Created
> Wikipedia Content to Large-Scale Online Communities" will be presented by
> *Nicholas Vincent*.
>
> Kind regards,
>
> Sarah R.
>
> On Thu, Apr 12, 2018 at 6:47 PM, Sarah R <srodl...@wikimedia.org> wrote:
>
>> Hi Everyone,
>>
>> The next Research Showcase will be live-streamed this Wednesday, April
>> 18, 2018 at 11:30 AM (PDT) 18:30 UTC.
>>
>> YouTube stream: https://www.youtube.com/watch?v=Z1pa-pr6xis
>>
>> As usual, you can join the conversation on IRC at #wikimedia-research.
>> And, you can watch our past research showcases here.
>> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>
>>
>> The Critical Relationship of Volunteer Created Wikipedia Content to
>> Large-Scale Online Communities
>> By *Nate TeBlunthuis*
>> The extensive
>> Wikipedia literature has largely considered Wikipedia in isolation, outside
>> of the context of its broader Internet ecosystem. Very recent research has
>> demonstrated the significance of this limitation, identifying critical
>> relationships between Google and Wikipedia that are highly relevant to many
>> areas of Wikipedia-based research and practice. In this talk, I will
>> present a study which extends this recent research beyond search engines to
>> examine Wikipedia’s relationships with large-scale online communities,
>> Stack Overflow and Reddit in particular. I will discuss evidence of
>> consequential, albeit unidirectional relationships. Wikipedia provides
>> substantial value to both communities, with Wikipedia content increasing
>> visitation, engagement, and revenue, but we find little evidence that these
>> websites contribute to Wikipedia in return. Overall, these findings
>> highlight important connections between Wikipedia and its broader ecosystem
>> that should be considered by researchers studying Wikipedia. Overall, this
>> talk will emphasize the key role that volunteer-created Wikipedia content
>> plays in improving other websites, even contributing to revenue generation.
>>
>>
>> The Rise and Decline of an Open Collaboration System, a Closer Look
>> By *Nate TeBlunthuis*
>> Do patterns of growth and stabilization found in large peer
>> production systems such as Wikipedia occur in other communities? This study
>> assesses the generalizability of Halfaker et al.’s influential 2013 paper on
>> “The Rise and Decline of an Open Collaboration System.” We replicate its
>> tests of several theories related to newcomer retention and norm
>> entrenchment using a dataset of hundreds of active peer production wikis
>> from Wikia. We reproduce the subset of the findings from Halfaker and
>> colleagues that we are able to test, comparing both the estimated signs and
>> magnitudes of our models. Our results support the external validity of
>> Halfaker et al.’s claims that quality control systems may limit the growth
>> of peer production communities by deterring new contributors and that norms
>> tend to become entrenched over time.
>>
>> Kindest regards,
>>
>> Sarah R. Rodlund
>> Senior Project Coordinator-Product & Technology, Wikimedia Foundation |
>> Hic sunt leones
>> srodl...@wikimedia.org
>>
>>

Re: [Wikimedia-l] Research Showcase April 18, 2018 (11:30 AM PDT| 18:30 UTC)

2018-04-12 Thread Sarah R
Hi All,

A quick correction: "The Critical Relationship of Volunteer Created
Wikipedia Content to Large-Scale Online Communities" will be presented by
*Nicholas Vincent*.

Kind regards,

Sarah R.

On Thu, Apr 12, 2018 at 6:47 PM, Sarah R <srodl...@wikimedia.org> wrote:

> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday, April
> 18, 2018 at 11:30 AM (PDT) 18:30 UTC.
>
> YouTube stream: https://www.youtube.com/watch?v=Z1pa-pr6xis
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here.
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>
>
> The Critical Relationship of Volunteer Created Wikipedia Content to
> Large-Scale Online Communities
> By *Nate TeBlunthuis*
> The extensive
> Wikipedia literature has largely considered Wikipedia in isolation, outside
> of the context of its broader Internet ecosystem. Very recent research has
> demonstrated the significance of this limitation, identifying critical
> relationships between Google and Wikipedia that are highly relevant to many
> areas of Wikipedia-based research and practice. In this talk, I will
> present a study which extends this recent research beyond search engines to
> examine Wikipedia’s relationships with large-scale online communities,
> Stack Overflow and Reddit in particular. I will discuss evidence of
> consequential, albeit unidirectional relationships. Wikipedia provides
> substantial value to both communities, with Wikipedia content increasing
> visitation, engagement, and revenue, but we find little evidence that these
> websites contribute to Wikipedia in return. Overall, these findings
> highlight important connections between Wikipedia and its broader ecosystem
> that should be considered by researchers studying Wikipedia. Overall, this
> talk will emphasize the key role that volunteer-created Wikipedia content
> plays in improving other websites, even contributing to revenue generation.
>
>
> The Rise and Decline of an Open Collaboration System, a Closer Look
> By *Nate TeBlunthuis*
> Do patterns of growth and stabilization found in large peer
> production systems such as Wikipedia occur in other communities? This study
> assesses the generalizability of Halfaker et al.’s influential 2013 paper on
> “The Rise and Decline of an Open Collaboration System.” We replicate its
> tests of several theories related to newcomer retention and norm
> entrenchment using a dataset of hundreds of active peer production wikis
> from Wikia. We reproduce the subset of the findings from Halfaker and
> colleagues that we are able to test, comparing both the estimated signs and
> magnitudes of our models. Our results support the external validity of
> Halfaker et al.’s claims that quality control systems may limit the growth
> of peer production communities by deterring new contributors and that norms
> tend to become entrenched over time.
>
> Kindest regards,
>
> Sarah R. Rodlund
> Senior Project Coordinator-Product & Technology, Wikimedia Foundation |
> Hic sunt leones
> srodl...@wikimedia.org
>
>


-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation | Hic
sunt leones
srodl...@wikimedia.org


*“Our lives begin to end the day we become silent about things that
matter.”  ~ Martin Luther King Jr
<https://www.goodreads.com/author/show/23924.Martin_Luther_King_Jr_>*
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
<mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>

[Wikimedia-l] Research Showcase April 18, 2018 (11:30 AM PDT| 18:30 UTC)

2018-04-12 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, April 18,
2018 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=Z1pa-pr6xis

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here.
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Upcoming_Showcase>

The Critical Relationship of Volunteer Created Wikipedia Content to
Large-Scale Online Communities
By *Nate TeBlunthuis*
The extensive Wikipedia
literature has largely considered Wikipedia in isolation, outside of the
context of its broader Internet ecosystem. Very recent research has
demonstrated the significance of this limitation, identifying critical
relationships between Google and Wikipedia that are highly relevant to many
areas of Wikipedia-based research and practice. In this talk, I will
present a study which extends this recent research beyond search engines to
examine Wikipedia’s relationships with large-scale online communities,
Stack Overflow and Reddit in particular. I will discuss evidence of
consequential, albeit unidirectional relationships. Wikipedia provides
substantial value to both communities, with Wikipedia content increasing
visitation, engagement, and revenue, but we find little evidence that these
websites contribute to Wikipedia in return. Overall, these findings
highlight important connections between Wikipedia and its broader ecosystem
that should be considered by researchers studying Wikipedia. This talk will
emphasize the key role that volunteer-created Wikipedia content plays in
improving other websites, even contributing to revenue generation.


The Rise and Decline of an Open Collaboration System, a Closer Look
By *Nate TeBlunthuis*
Do patterns of growth and stabilization found in large peer
production systems such as Wikipedia occur in other communities? This study
assesses the generalizability of Halfaker et al.’s influential 2013 paper on
“The Rise and Decline of an Open Collaboration System.” We replicate its
tests of several theories related to newcomer retention and norm
entrenchment using a dataset of hundreds of active peer production wikis
from Wikia. We reproduce the subset of the findings from Halfaker and
colleagues that we are able to test, comparing both the estimated signs and
magnitudes of our models. Our results support the external validity of
Halfaker et al.’s claims that quality control systems may limit the growth
of peer production communities by deterring new contributors and that norms
tend to become entrenched over time.

Kindest regards,

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation | Hic
sunt leones
srodl...@wikimedia.org

Re: [Wikimedia-l] Research Showcase March 21, 2018 (11:30 AM PDT | 18:30 UTC)

2018-03-21 Thread Sarah R
Hi Everyone,

Just a reminder -- this is beginning in a half hour. Hope to see you there!

On Mon, Mar 19, 2018 at 1:54 PM, Sarah R <srodl...@wikimedia.org> wrote:

> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday, March 21,
> 2018 at 11:30 AM (PDT) 18:30 UTC.
>
> YouTube stream:  https://www.youtube.com/watch?v=ACevHs0sMMw
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2018>.
>
>
> Over the past years, the Research team at Wikimedia Foundation and some of
> our formal collaborators have been focused on doing research and building
> technologies that can help editors across Wikimedia languages find tasks
> for contributions. While the early effort was heavily focused on article
> recommendation for creation (horizontal expansion), in 2016 we started a
> new direction of research with a focus on vertical expansion of Wikipedia
> articles. The two talks in the March 2018 Research Showcase will share some
> of what we have learned from this research. More specifically, we will talk
> about the Wikipedia category network as a great signal for creating
> templates/structures for Wikipedia articles as well as ongoing research to
> learn what content (sections) is missing from Wikipedia across its many
> languages. The two corresponding abstracts with more details are below.
> Join us! :)
>
>
> Using Wikipedia categories for research: opportunities, challenges, and
> solutions
> By *Tiziano Piccardi, EPFL*
> The category network in Wikipedia is used by editors as a way to label
> articles and organize them in a hierarchical structure. This manually
> created and curated network of 1.6 million nodes in English Wikipedia,
> generated by arranging the categories in a child-parent relation (e.g.,
> Scientists-People, Cities-Human Settlement), allows researchers to infer
> valuable relations between concepts. A clean structure in this format would
> be a valuable resource for a variety of tools and applications, including
> automatic reasoning tools. Unfortunately, the Wikipedia category network
> contains some "noise", since in many cases the association as subcategory
> does not define an is-a relation (Scientists is-a People vs. Billionaires
> is-a Wealth). Inspired to develop a model for
> recommending sections to be added to the already existing Wikipedia
> articles, we developed a method to clean this network and to keep only the
> categories that have a high chance to be associated with their children by
> an is-a relation. The strategy is based on the concept of "pure"
> categories, and the algorithm uses the types of the attached articles to
> determine how homogeneous the category is. The approach does not rely on any
> linguistic feature and therefore is suitable for all Wikipedia languages.
> In this talk, we will discuss the high-level overview of the algorithm and
> some of the possible applications for the generated network beyond article
> section recommendations.
>
>
> Beyond Automatic Translation: Aligning Wikipedia sections across multiple
> languages
> By *Diego Saez-Trumper*
> Sections are the building blocks of
> Wikipedia articles. For editors, they can be used as an entry point for
> creating and expanding articles. For readers, they enhance readability of
> Wikipedia content. In this talk, we present ongoing research to align
> article sections across Wikipedia languages. We show how the available
> technology for automatic translation is not good enough for translating
> section titles. We then show a complementary approach for section
> alignment, using Wikidata and cross-lingual word embeddings. We will
> present some of the use-cases of a methodology for aligning sections across
> languages, including improved section recommendation, especially in medium
> to smaller size languages where the language itself may not contain enough
> signal about the structure of the articles and signals can be inferred from
> other larger Wikipedia languages.
>
> Sarah R. Rodlund
> Senior Project Coordinator-Product & Technology, Wikimedia Foundation
> srodl...@wikimedia.org
>
>
>
>


-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation | Hic
sunt leones
srodl...@wikimedia.org


*“Our lives begin to end the day we become silent about things that
matter.”  ~ Martin Luther King Jr
<https://www.goodreads.com/author/show/23924.Martin_Luther_King_Jr_>*

[Wikimedia-l] Research Showcase March 21, 2018 (11:30 AM PDT | 18:30 UTC)

2018-03-19 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, March 21,
2018 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream:  https://www.youtube.com/watch?v=ACevHs0sMMw

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2018>.


Over the past years, the Research team at Wikimedia Foundation and some of
our formal collaborators have been focused on doing research and building
technologies that can help editors across Wikimedia languages find tasks
for contributions. While the early effort was heavily focused on article
recommendation for creation (horizontal expansion), in 2016 we started a
new direction of research with a focus on vertical expansion of Wikipedia
articles. The two talks in the March 2018 Research Showcase will share some
of what we have learned from this research. More specifically, we will talk
about the Wikipedia category network as a great signal for creating
templates/structures for Wikipedia articles as well as ongoing research to
learn what content (sections) is missing from Wikipedia across its many
languages. The two corresponding abstracts with more details are below.
Join us! :)


Using Wikipedia categories for research: opportunities, challenges, and
solutions
By *Tiziano Piccardi, EPFL*
The category network in Wikipedia is used by editors as a way to label
articles and organize them in a hierarchical structure. This manually
created and curated network of 1.6 million nodes in English Wikipedia,
generated by arranging the categories in a child-parent relation (e.g.,
Scientists-People, Cities-Human Settlement), allows researchers to infer
valuable relations between concepts. A clean structure in this format would
be a valuable resource for a variety of tools and applications, including
automatic reasoning tools. Unfortunately, the Wikipedia category network
contains some "noise", since in many cases the association as subcategory
does not define an is-a relation (Scientists is-a People vs. Billionaires
is-a Wealth). Inspired to develop a model for
recommending sections to be added to the already existing Wikipedia
articles, we developed a method to clean this network and to keep only the
categories that have a high chance to be associated with their children by
an is-a relation. The strategy is based on the concept of "pure"
categories, and the algorithm uses the types of the attached articles to
determine how homogeneous the category is. The approach does not rely on any
linguistic feature and therefore is suitable for all Wikipedia languages.
In this talk, we will discuss the high-level overview of the algorithm and
some of the possible applications for the generated network beyond article
section recommendations.
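
The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the general idea of scoring how homogeneous a category is from the types of its attached articles. The purity measure (share of the most common type), the 0.8 threshold, the function names, and the toy type labels are all assumptions made for illustration, not the method presented in the talk.

```python
from collections import Counter


def category_purity(article_types):
    """Share of a category's articles that carry its most common type.

    Hypothetical measure: 1.0 means every attached article has the same type.
    """
    if not article_types:
        return 0.0
    counts = Counter(article_types)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(article_types)


def keep_pure_categories(category_to_types, threshold=0.8):
    """Return the categories whose purity reaches the (assumed) threshold."""
    return {
        name
        for name, types in category_to_types.items()
        if category_purity(types) >= threshold
    }


# Toy data: type labels are invented for the example.
categories = {
    "Scientists": ["human", "human", "human", "human"],
    "Wealth": ["human", "concept", "organization", "concept"],
}
pure = keep_pure_categories(categories)
print(pure)
```

Because the score looks only at article types and not at category names, a sketch like this stays language-independent, which matches the property the abstract highlights.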


Beyond Automatic Translation: Aligning Wikipedia sections across multiple
languages
By *Diego Saez-Trumper*
Sections are the building blocks of
Wikipedia articles. For editors, they can be used as an entry point for
creating and expanding articles. For readers, they enhance readability of
Wikipedia content. In this talk, we present ongoing research to align
article sections across Wikipedia languages. We show how the available
technology for automatic translation is not good enough for translating
section titles. We then show a complementary approach for section
alignment, using Wikidata and cross-lingual word embeddings. We will
present some of the use-cases of a methodology for aligning sections across
languages, including improved section recommendation, especially in medium
to smaller size languages where the language itself may not contain enough
signal about the structure of the articles and signals can be inferred from
other larger Wikipedia languages.
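
As a rough illustration of how section titles could be matched through cross-lingual word embeddings, the sketch below pairs each title in one language with the nearest title in another by cosine similarity. It assumes the titles have already been mapped into a shared vector space; the three-dimensional vectors, the Italian example titles, and the function names are invented for the example and are not taken from the talk, which also uses Wikidata signals that are not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_sections(source, target):
    """Map each source section title to its most similar target title."""
    return {
        title: max(target, key=lambda candidate: cosine(vector, target[candidate]))
        for title, vector in source.items()
    }


# Toy embeddings in an assumed shared cross-lingual space.
english = {"History": [0.9, 0.1, 0.0], "Geography": [0.1, 0.9, 0.1]}
italian = {"Storia": [0.8, 0.2, 0.1], "Geografia": [0.2, 0.8, 0.0]}

alignment = align_sections(english, italian)
print(alignment)
```

A nearest-neighbor match like this is the simplest possible choice; a real system would likely need a similarity cutoff so that sections with no counterpart are left unaligned.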

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

Re: [Wikimedia-l] Research Showcase Wednesday, February 21, 2018 [External]

2018-02-21 Thread Sarah R
Hi Everyone!

Just a reminder that this month's research showcase will happen today at
11:30 AM (PST) 19:30 (UTC)!

Hope to see you there!

On Thu, Feb 15, 2018 at 10:58 AM, Sarah R <srodl...@wikimedia.org> wrote:

> Hi Everyone,
>
> Quick correction.
>
> The next Research Showcase will be live-streamed this Wednesday, February
> 21, 2018 at 11:30 AM (PST) *19:30 (UTC).*
>
> Kindly,
>
> Sarah R.
>
> On Thu, Feb 15, 2018 at 10:38 AM, Sarah R <srodl...@wikimedia.org> wrote:
>
>> Hi Everyone,
>>
>> The next Research Showcase will be live-streamed this Wednesday, February
>> 21, 2018 at 11:30 AM (PST) 18:30 UTC.
>>
>> YouTube stream: https://www.youtube.com/watch?v=fpmRWCE7F_I
>>
>> As usual, you can join the conversation on IRC at #wikimedia-research.
>> And, you can watch our past research showcases here
>> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase>.
>>
>> This month's presentation:
>>
>> *Visual enrichment of collaborative knowledge bases*
>>
>> By Miriam Redi, Wikimedia Foundation
>>
>> Images allow us to explain, enrich and complement knowledge without
>> language barriers [1]. They can help illustrate the content of an item in a
>> language-agnostic way to external data consumers. Images can be extremely
>> helpful in multilingual collaborative knowledge bases such as Wikidata.
>>
>> However, a large proportion of Wikidata items lack images. More than 3.6M
>> Wikidata items are about humans (Q5), but only 17% of them have an image
>> associated with them. Only 2.2M of 40 Million Wikidata items have an image.
>> A wider presence of images in such a rich, cross-lingual repository could
>> enable a more complete representation of human knowledge.
>>
>> In this talk, we will discuss challenges and opportunities faced when
>> using machine learning and computer vision tools for the visual enrichment
>> of collaborative knowledge bases. We will share research to help Wikidata
>> contributors make Wikidata more “visual” by recommending high-quality
>> Commons images to Wikidata items. We will show the first results on
>> free-licence image quality scoring and recommendation and discuss future
>> work in this direction.
>>
>> [1] Van Hook, Steven R. "Modes and models for transcending cultural
>> differences in international classrooms." Journal of Research in
>> International Education 10.1 (2011): 5-27.
>> http://journals.sagepub.com/doi/abs/10.1177/1475240910395788
>>
>> *Backlogs—backlogs everywhere: Using machine classification to clean up
>> the new page backlog*
>>
>> By Aaron Halfaker, Wikimedia Foundation
>>
>> If there's one insight that I've had about the functioning of Wikipedia
>> and other wiki-based online communities, it's that eventually self-directed
>> work breaks down and some form of organization becomes important for task
>> routing.  In Wikipedia specifically, the notion of "backlogs" has become
>> dominant.  There are backlogs of articles to create, articles to clean up,
>> articles to assess, new editor contributions to review, manual of style
>> rules to apply, etc.  To a community of people working on a backlog, the
>> state of that backlog has deep effects on their emotional well-being.  A
>> backlog that only grows is frustrating and exhausting.
>>
>> Backlogs aren't inevitable, though, and there are many shapes that backlogs
>> can take.  In my presentation, I'll tell a story about how English
>> Wikipedia editors defined a process and set of roles that formed a backlog
>> around new page creations.  I'll make the argument that this formalization
>> of quality control practices has created a choke point and that
>> alternatives exist. Finally, I'll present a vision for such an alternative
>> using models that we have developed for ORES, the open machine prediction
>> service my team maintains.
>>
>> --
>> Sarah R. Rodlund
>> Senior Project Coordinator-Product & Technology, Wikimedia Foundation
>> srodl...@wikimedia.org
>>
>>
>
>
> --
> Sarah R. Rodlund
> Senior Project Coordinator-Product & Technology, Wikimedia Foundation |
> Hic sunt leones
> srodl...@wikimedia.org
>
>
> *“Our lives begin to end the day we become silent about things that
> matter.”  ~ Martin Luther King Jr
> <https://www.goodreads.com/author/show/23924.Martin_Luther_King_Jr_>*
>



-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation | Hic
sunt leones
srodl...@wikimedia.org


*“Our lives begin to end the day we become silent about things that
matter.”  ~ Martin Luther King Jr
<https://www.goodreads.com/author/show/23924.Martin_Luther_King_Jr_>*

Re: [Wikimedia-l] Research Showcase Wednesday, February 21, 2018 [External]

2018-02-15 Thread Sarah R
Hi Everyone,

Quick correction.

The next Research Showcase will be live-streamed this Wednesday, February
21, 2018 at 11:30 AM (PST) *19:30 (UTC).*
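
The arithmetic behind this correction can be checked with a short sketch. It assumes fixed offsets of UTC-8 for PST and UTC-7 for PDT; the helper name `to_utc` is made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: Pacific Standard Time is UTC-8, Pacific Daylight Time is UTC-7.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def to_utc(year, month, day, hour, minute, tz):
    """Convert a local wall-clock time in the given zone to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc)


february = to_utc(2018, 2, 21, 11, 30, PST)  # winter showcase, standard time
march = to_utc(2018, 3, 21, 11, 30, PDT)     # after the switch to daylight time

print(february.strftime("%H:%M UTC"))  # 19:30 UTC
print(march.strftime("%H:%M UTC"))     # 18:30 UTC
```

The same 11:30 AM Pacific start therefore lands on 19:30 UTC in February and 18:30 UTC once daylight saving time is in effect.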

Kindly,

Sarah R.

On Thu, Feb 15, 2018 at 10:38 AM, Sarah R <srodl...@wikimedia.org> wrote:

> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday, February
> 21, 2018 at 11:30 AM (PST) 18:30 UTC.
>
> YouTube stream: https://www.youtube.com/watch?v=fpmRWCE7F_I
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase>.
>
> This month's presentation:
>
> *Visual enrichment of collaborative knowledge bases*
>
> By Miriam Redi, Wikimedia Foundation
>
> Images allow us to explain, enrich and complement knowledge without
> language barriers [1]. They can help illustrate the content of an item in a
> language-agnostic way to external data consumers. Images can be extremely
> helpful in multilingual collaborative knowledge bases such as Wikidata.
>
> However, a large proportion of Wikidata items lack images. More than 3.6M
> Wikidata items are about humans (Q5), but only 17% of them have an image
> associated with them. Only 2.2M of 40 Million Wikidata items have an image.
> A wider presence of images in such a rich, cross-lingual repository could
> enable a more complete representation of human knowledge.
>
> In this talk, we will discuss challenges and opportunities faced when
> using machine learning and computer vision tools for the visual enrichment
> of collaborative knowledge bases. We will share research to help Wikidata
> contributors make Wikidata more “visual” by recommending high-quality
> Commons images to Wikidata items. We will show the first results on
> free-licence image quality scoring and recommendation and discuss future
> work in this direction.
>
> [1] Van Hook, Steven R. "Modes and models for transcending cultural
> differences in international classrooms." Journal of Research in
> International Education 10.1 (2011): 5-27.
> http://journals.sagepub.com/doi/abs/10.1177/1475240910395788
>
> *Backlogs—backlogs everywhere: Using machine classification to clean up
> the new page backlog*
>
> By Aaron Halfaker, Wikimedia Foundation
>
> If there's one insight that I've had about the functioning of Wikipedia
> and other wiki-based online communities, it's that eventually self-directed
> work breaks down and some form of organization becomes important for task
> routing.  In Wikipedia specifically, the notion of "backlogs" has become
> dominant.  There are backlogs of articles to create, articles to clean up,
> articles to assess, new editor contributions to review, manual of style
> rules to apply, etc.  To a community of people working on a backlog, the
> state of that backlog has deep effects on their emotional well-being.  A
> backlog that only grows is frustrating and exhausting.
>
> Backlogs aren't inevitable, though, and there are many shapes that backlogs
> can take.  In my presentation, I'll tell a story about how English
> Wikipedia editors defined a process and set of roles that formed a backlog
> around new page creations.  I'll make the argument that this formalization
> of quality control practices has created a choke point and that
> alternatives exist. Finally, I'll present a vision for such an alternative
> using models that we have developed for ORES, the open machine prediction
> service my team maintains.
>
> --
> Sarah R. Rodlund
> Senior Project Coordinator-Product & Technology, Wikimedia Foundation
> srodl...@wikimedia.org
>
>


-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation | Hic
sunt leones
srodl...@wikimedia.org


*“Our lives begin to end the day we become silent about things that
matter.”  ~ Martin Luther King Jr
<https://www.goodreads.com/author/show/23924.Martin_Luther_King_Jr_>*

[Wikimedia-l] Research Showcase Wednesday, February 21, 2018 [External]

2018-02-15 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, February
21, 2018 at 11:30 AM (PST) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=fpmRWCE7F_I

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase>.

This month's presentation:

*Visual enrichment of collaborative knowledge bases*

By Miriam Redi, Wikimedia Foundation

Images allow us to explain, enrich and complement knowledge without
language barriers [1]. They can help illustrate the content of an item in a
language-agnostic way to external data consumers. Images can be extremely
helpful in multilingual collaborative knowledge bases such as Wikidata.

However, a large proportion of Wikidata items lack images. More than 3.6M
Wikidata items are about humans (Q5), but only 17% of them have an image
associated with them. Only 2.2M of 40 Million Wikidata items have an image.
A wider presence of images in such a rich, cross-lingual repository could
enable a more complete representation of human knowledge.

In this talk, we will discuss challenges and opportunities faced when using
machine learning and computer vision tools for the visual enrichment of
collaborative knowledge bases. We will share research to help Wikidata
contributors make Wikidata more “visual” by recommending high-quality
Commons images to Wikidata items. We will show the first results on
free-licence image quality scoring and recommendation and discuss future
work in this direction.

[1] Van Hook, Steven R. "Modes and models for transcending cultural
differences in international classrooms." Journal of Research in
International Education 10.1 (2011): 5-27.
http://journals.sagepub.com/doi/abs/10.1177/1475240910395788

*Backlogs—backlogs everywhere: Using machine classification to clean up the
new page backlog*

By Aaron Halfaker, Wikimedia Foundation

If there's one insight that I've had about the functioning of Wikipedia and
other wiki-based online communities, it's that eventually self-directed
work breaks down and some form of organization becomes important for task
routing.  In Wikipedia specifically, the notion of "backlogs" has become
dominant.  There are backlogs of articles to create, articles to clean up,
articles to assess, new editor contributions to review, manual of style
rules to apply, etc.  To a community of people working on a backlog, the
state of that backlog has deep effects on their emotional well-being.  A
backlog that only grows is frustrating and exhausting.

Backlogs aren't inevitable, though, and there are many shapes that backlogs
can take.  In my presentation, I'll tell a story about how English
Wikipedia editors defined a process and set of roles that formed a backlog
around new page creations.  I'll make the argument that this formalization
of quality control practices has created a choke point and that
alternatives exist. Finally, I'll present a vision for such an alternative
using models that we have developed for ORES, the open machine prediction
service my team maintains.

-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

Re: [Wikimedia-l] Research Showcase Wednesday, November 15, 2017 at 11:30 AM (PST) 18:30 UTC

2017-11-15 Thread Sarah R
Hi Everyone,

Just a reminder that this will start at 11:30 AM (Pacific), 18:30 UTC.

Kindly,

Sarah R.

On Thu, Nov 9, 2017 at 3:34 PM, Sarah R <srodl...@wikimedia.org> wrote:

> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday, November
> 15, 2017 at 11:30 AM (PST) 18:30 UTC.
>
> YouTube stream:  https://www.youtube.com/watch?v=nMENRAkeHnQ
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#November_2017>.
>
> This month's presentation:
>
> Conversation Corpora, Emotional Robots, and Battles with Bias
> By *Lucas Dixon (Google/Jigsaw)*
> I'll talk about interesting experimental setups for
> doing large-scale analysis of conversations in Wikipedia, and what it even
> means to grapple with the concept of conversation when one is talking about
> revisions on talk pages. I'll also describe challenges with having good
> conversations at scale, some of the dreams one might have for AI in the
> space, and I'll dig into measuring unintended bias in machine learning and
> what one can do to make ML more inclusive. This talk will cover work from
> the WikiDetox <https://meta.wikimedia.org/wiki/Research:Detox> project as
> well as ongoing research on the nature and impact of harassment in
> Wikipedia discussion spaces
> <https://meta.wikimedia.org/wiki/Research:Study_of_harassment_and_its_impact> –
> part of a collaboration between Jigsaw, Cornell University, and the
> Wikimedia Foundation. The ML model training code, datasets, and the
> supporting tooling developed as part of this project are openly available.
>
>
> Many kind regards,
>
> Sarah R. Rodlund
> Senior Project Coordinator-Product & Technology, Wikimedia Foundation
> srodl...@wikimedia.org
>
>
>
>


-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

*“Our lives begin to end the day we become silent about things that
matter.”  ~ Martin Luther King Jr
<https://www.goodreads.com/author/show/23924.Martin_Luther_King_Jr_>*

[Wikimedia-l] Research Showcase Wednesday, November 15, 2017 at 11:30 AM (PST) 18:30 UTC

2017-11-09 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, November
15, 2017 at 11:30 AM (PST) 18:30 UTC.

YouTube stream:  https://www.youtube.com/watch?v=nMENRAkeHnQ

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#November_2017>.

This month's presentation:

Conversation Corpora, Emotional Robots, and Battles with Bias
By *Lucas Dixon (Google/Jigsaw)*
I'll talk about interesting experimental setups for
doing large-scale analysis of conversations in Wikipedia, and what it even
means to grapple with the concept of conversation when one is talking about
revisions on talk pages. I'll also describe challenges with having good
conversations at scale, some of the dreams one might have for AI in the
space, and I'll dig into measuring unintended bias in machine learning and
what one can do to make ML more inclusive. This talk will cover work from
the WikiDetox <https://meta.wikimedia.org/wiki/Research:Detox> project as
well as ongoing research on the nature and impact of harassment in
Wikipedia discussion spaces
<https://meta.wikimedia.org/wiki/Research:Study_of_harassment_and_its_impact> –
part of a collaboration between Jigsaw, Cornell University, and the
Wikimedia Foundation. The ML model training code, datasets, and the
supporting tooling developed as part of this project are openly available.


Many kind regards,

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

[Wikimedia-l] Research Showcase Wednesday, August 23, 2017 at 11:30 AM (PDT) 18:30 UTC

2017-08-21 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, August 23,
2017 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=Fa0Ztv2iF4w

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#August_2017>.

This month's presentation:

Sneha Narayan (Northwestern University)

*The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for
New Users*

Integrating new users into a community with complex norms presents a
challenge for peer production projects like Wikipedia. We present The
Wikipedia Adventure (TWA): an interactive tutorial that offers a structured
and gamified introduction to Wikipedia. In addition to describing the
design of the system, we present two empirical evaluations. First, we
report on a survey of users, who responded very positively to the tutorial.
Second, we report results from a large-scale invitation-based field
experiment that tests whether using TWA increased newcomers' subsequent
contributions to Wikipedia. We find no effect of either using the tutorial
or of being invited to do so over a period of 180 days. We conclude that
TWA produces a positive socialization experience for those who choose to
use it, but that it does not alter patterns of newcomer activity. We
reflect on the implications of these mixed results for the evaluation of
similar social computing systems.

Andrew Su (Scripps Research Institute)

*The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical
knowledge*

The Gene Wiki project began in 2007 with the goal of creating a
collaboratively-written, community-reviewed, and continuously-updated
review article for every human gene within Wikipedia.  In 2013, shortly
after the creation of the Wikidata project, the project expanded to include
the organization and integration of structured biomedical data.  This talk
will focus on our current and future work, including efforts to encourage
contributions from biomedical domain experts, to build custom applications
that use Wikidata as the back-end knowledge base, and to promote
CC0-licensing among biomedical knowledge resources.  Comments, feedback and
contributions are welcome at https://github.com/SuLab/genewikicentral and
https://www.wikidata.org/wiki/WD:MB.

Kindly,

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

[Wikimedia-l] Research Showcase Wednesday, July 26, 2017 at 11:30 AM (PDT) 18:30 UTC

2017-07-25 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, July 26,
2017 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=yC1jgK8C8aQ

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#July_2017>.

This month's presentation:

Freedom versus Standardization: Structured Data Generation in a Peer
Production Community

By *Andrew Hall*

In addition to encyclopedia articles
and software, peer production communities produce *structured data*, e.g.,
Wikidata and OpenStreetMap’s metadata. Structured data from peer production
communities has become increasingly important due to its use by
computational applications, such as CartoCSS, MapBox, and Wikipedia
infoboxes. However, this structured data is usable by applications only if
it follows *standards.* We did an interview study focused on
OpenStreetMap’s knowledge production processes to investigate how – and how
successfully – this community creates and applies its data standards. Our
study revealed a fundamental tension between the need to produce structured
data in a standardized way and OpenStreetMap’s tradition of contributor
freedom. We extracted six themes that manifested this tension and three
overarching concepts, *correctness, community,* and *code,* which help make
sense of and synthesize the themes. We also offer suggestions for improving
OpenStreetMap’s knowledge production processes, including new data models,
sociotechnical tools, and community practices.


Kindly,

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

Re: [Wikimedia-l] Research Showcase Wednesday June 21, 2017

2017-06-21 Thread Sarah R
Hi Everyone,

Just a reminder: this will begin at 11:30 AM PDT today!

Kind regards,

Sarah R.



-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

“*In a real sense all life is inter-related. All men are caught in an
inescapable network of mutuality, tied in a single garment of destiny.
Whatever affects one directly, affects all indirectly. I can never be what
I ought to be until you are what you ought to be, and you can never be what
you ought to be until I am what I ought to be...This is the inter-related
structure of reality.”**― Martin Luther King Jr.'s Letter from Birmingham
Jail and the Struggle That Changed a Nation
<http://www.goodreads.com/work/quotes/197294>*

[Wikimedia-l] Research Showcase Wednesday June 21, 2017

2017-06-18 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, June 21,
2017 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=i2jpKRwPT-Q

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#June_2017>.

This month's presentations:

Title: Problematizing and Addressing the Article-as-Concept Assumption in
Wikipedia

By *Allen Yilun Lin*

Abstract: Wikipedia-based studies and systems frequently assume that each
article describes a separate concept. However, in this paper, we show that
this article-as-concept assumption is problematic due to editors’ tendency
to split articles into parent articles and sub-articles when articles get
too long for readers (e.g. “United States” and “American literature” in the
English Wikipedia). In this paper, we present evidence that this issue can
have significant impacts on Wikipedia-based studies and systems and
introduce the sub-article matching problem. The goal of the sub-article
matching problem is to automatically connect sub-articles to parent
articles to help Wikipedia-based studies and systems retrieve complete
information about a concept. We then describe the first system to address
the sub-article matching problem. We show that, using a diverse feature set
and standard machine learning techniques, our system can achieve good
performance on most of our ground truth datasets, significantly
outperforming baseline approaches.


Title: Understanding Wikidata Queries


By *Markus Kroetzsch*

Abstract: Wikimedia provides a public service that lets anyone answer
complex questions over the sum of all knowledge stored in Wikidata. These
questions are expressed in the query language SPARQL and range from the
most simple fact retrievals ("What is the birthday of Douglas Adams?") to
complex analytical queries ("Average lifespan of people by occupation").
The talk presents ongoing efforts to analyse the server logs of the
millions of queries that are answered each month. It is an important but
difficult challenge to draw meaningful conclusions from this dataset. One
might hope to learn relevant information about the usage of the service and
Wikidata in general, but at the same time one has to be careful not to be
misled by the data. Indeed, the dataset turned out to be highly
heterogeneous and unpredictable, with strongly varying usage patterns that
make it difficult to draw conclusions about "normal" usage. The talk will
give a status report, present preliminary results, and discuss possible
next steps.
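
As a small illustration of the "simple fact retrieval" end of that range, the Douglas Adams question can be written as a one-triple SPARQL query. The sketch below only builds the query and the request URL; it does not contact the service. It assumes the public endpoint at https://query.wikidata.org/sparql, the item Q42 (Douglas Adams), and the property P569 (date of birth). The function name is made up for this example.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service (assumed here).
ENDPOINT = "https://query.wikidata.org/sparql"

# Q42 is the Wikidata item for Douglas Adams; P569 is "date of birth".
BIRTHDAY_QUERY = """
SELECT ?birthday WHERE {
  wd:Q42 wdt:P569 ?birthday .
}
""".strip()


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results.

    Hypothetical helper for illustration; it performs no network access.
    """
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(build_request_url(BIRTHDAY_QUERY))
```

An analytical query such as "average lifespan by occupation" would add grouping and aggregation on top of the same triple-pattern style, which is part of why the logged queries vary so much in shape and cost.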

-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

[Wikimedia-l] Research Showcase April 19, 2017

2017-04-17 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, April 19,
2017 at 11:30 AM (PDT) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=_Prf0Vb-k1I

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#April_2017>.

This month's presentations:

Using WikiBrain to visualize Wikipedia's neighborhoods

By *Dr. Shilad Sen <https://www.mediawiki.org/wiki/User:Shilad>*

While Wikipedia serves as the world's most widely used reference for
humans, it also represents the most widely used body of knowledge for
algorithms that must reason about the
world. I will provide an overview of WikiBrain, a software project that
serves as a platform for Wikipedia-based algorithms. I will also demo a
brand new system built on WikiBrain that visualizes any dataset as a
topographic map whose neighborhoods correspond to related Wikipedia
articles. I hope to get feedback about which directions for these tools are
most useful to the Wikipedia research community.

-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

Re: [Wikimedia-l] February 15, 2017 Research Showcase

2017-02-15 Thread Sarah R
Just a reminder this will be taking place in one hour!





-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

[Wikimedia-l] February 15, 2017 Research Showcase

2017-02-14 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, February 15,
2017 at 11:30 AM (PST) 18:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=m6smzMppb-I

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#February_2017>.

This month's presentations:

Wikipedia and the Urban-Rural Divide

By *Isaac Johnson*

Wikipedia articles
about places, OpenStreetMap features, and other forms of peer-produced
content have become critical sources of geographic knowledge for humans and
intelligent technologies. We explore the effectiveness of the peer
production model across the rural/urban divide, a divide that has been
shown to be an important factor in many online social systems. We find that
in Wikipedia (as well as OpenStreetMap), peer-produced content about rural
areas is of systematically lower quality, less likely to have been produced
by contributors who focus on the local area, and more likely to have been
generated by automated software agents (i.e. “bots”). We continue to
explore and codify the systemic challenges inherent to characterizing rural
phenomena through peer production as well as discuss potential solutions.


Wikipedia Navigation Vectors

By *Ellery Wulczyn <https://www.mediawiki.org/wiki/User:Ewulczyn_(WMF)>*

In this project, we
learned embeddings for Wikipedia articles and Wikidata
<https://www.wikidata.org/wiki/Wikidata:Main_Page> items by applying
Word2vec <https://en.wikipedia.org/wiki/Word2vec> models to a corpus of
reading sessions. Although Word2vec models were developed to learn word
embeddings from a corpus of sentences, they can be applied to any kind of
sequential data. The learned embeddings have the property that items with
similar neighbors in the training corpus have similar representations (as
measured by the cosine similarity
<https://en.wikipedia.org/wiki/Cosine_similarity>, for example).
Consequently, applying Word2vec to reading sessions results in article
embeddings, where articles that tend to be read in close succession have
similar representations. Since people usually generate sequences of
semantically related articles while reading, these embeddings also capture
semantic similarity between articles.
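
The "similar neighbors give similar representations" property can be seen on a toy scale without training a model. The sketch below is not Word2vec: it swaps in plain co-occurrence counts as a stand-in for learned embeddings, and the reading sessions are invented for the example. It then compares articles with cosine similarity, as the abstract describes.

```python
import math
from collections import defaultdict

# Hypothetical reading sessions: each list is the sequence of articles
# one reader viewed. Invented data, for illustration only.
SESSIONS = [
    ["Paris", "France", "Eiffel Tower"],
    ["France", "Paris", "Louvre"],
    ["Python", "Java", "Compiler"],
    ["Java", "Python", "Interpreter"],
]


def cooccurrence_vectors(sessions, window=2):
    """Count, for each article, which articles appear within `window` steps."""
    vectors = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for i, article in enumerate(session):
            lo, hi = max(0, i - window), min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[article][session[j]] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    vecs = cooccurrence_vectors(SESSIONS)
    for a, b in [("Eiffel Tower", "Louvre"), ("Paris", "Python")]:
        print(a, b, cosine_similarity(vecs[a], vecs[b]))
```

In this toy data "Eiffel Tower" and "Louvre" never occur in the same session, yet they score close to 1 because they share the neighbors "Paris" and "France". "Paris" and "Python" share no neighbors and score 0.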

-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

[Wikimedia-l] Research Showcase, December 21, 2016

2016-12-19 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday,
December 21, 2016 at 11:30 AM (PST) 18:30 (UTC).

YouTube stream: https://www.youtube.com/watch?v=nmrlu5qTgyA

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#December_2016>.

The December 2016 Research Showcase includes:

English Wikipedia Quality Dynamics and the Case of WikiProject Women
Scientists

By *Aaron Halfaker <https://meta.wikimedia.org/wiki/User:Halfak_(WMF)>*

With every productive
edit, Wikipedia is steadily progressing towards higher and higher quality.
In order to track quality improvements, Wikipedians have developed an
article quality assessment rating scale that ranges from "Stub" at the
bottom to "Featured Articles" at the top. While this quality scale has the
promise of giving us insights into the dynamics of quality improvements in
Wikipedia, it is hard to use due to the sporadic nature of manual
re-assessments. By developing a highly accurate prediction model (based on
work by Warncke-Wang et al.), we've developed a method to assess an
article's quality at any point in history. Using this model, we explore
general trends in quality in Wikipedia and compare these trends to those of
an interesting cross-section: Articles tagged by WikiProject Women
Scientists. Results suggest that articles about women scientists were lower
quality than the rest of the wiki until mid-2013, after which a dramatic
shift occurred towards higher quality. This shift may correlate with (and
even be caused by) this WikiProject's initiatives.


Privacy, Anonymity, and Perceived Risk in Open Collaboration: A Study of
Tor Users and Wikipedians

By *Andrea Forte*

In a recent qualitative study to
be published at CSCW 2017, collaborators Rachel Greenstadt, Naz Andalibi,
and I examined privacy practices and concerns among contributors to open
collaboration projects. We collected interview data from people who use the
anonymity network Tor who also contribute to online projects and from
Wikipedia editors who are concerned about their privacy to better
understand how privacy concerns impact participation in open collaboration
projects. We found that risks perceived by contributors to open
collaboration projects include threats of surveillance, violence,
harassment, opportunity loss, reputation loss, and fear for loved ones. We
explain participants’ operational and technical strategies for mitigating
these risks and how these strategies affect their contributions. Finally,
we discuss chilling effects associated with privacy loss, the need for open
collaboration projects to go beyond attracting and educating participants
to consider their privacy, and some of the social and technical approaches
that could be explored to mitigate risk at a project or community level.

-- 
Sarah R. Rodlund
Senior Project Coordinator-Engineering, Wikimedia Foundation
srodl...@wikimedia.org

Re: [Wikimedia-l] Research Showcase, September 21, 2016

2016-09-21 Thread Sarah R
Just a reminder, the Research Showcase will begin in one hour.




-- 
Sarah R. Rodlund
Senior Project Coordinator-Engineering, Wikimedia Foundation
srodl...@wikimedia.org

[Wikimedia-l] Research Showcase, September 21, 2016

2016-09-19 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, September
21, 2016 at 11:30 AM (PDT) 18:30 (UTC).

YouTube stream: https://www.youtube.com/watch?v=fTDkVeqjw80

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#September_2016>.

This month's showcase includes:


Finding News Citations for Wikipedia

By *Besnik Fetahu <http://www.l3s.de/~fetahu/> (Leibniz University of Hannover)*

An important
editing policy in Wikipedia is to provide citations for added statements in
Wikipedia pages, where statements can be arbitrary pieces of text, ranging
from a sentence to a paragraph. In many cases citations are either outdated
or missing altogether. In this work we address the problem of finding and
updating news citations for statements in entity pages. We propose a
two-stage supervised approach for this problem. In the first step, we construct
a classifier to find out whether statements need a news citation or other
kinds of citations (web, book, journal, etc.). In the second step, we
develop a news citation algorithm for Wikipedia statements, which
recommends appropriate citations from a given news collection. Apart from
IR techniques that use the statement to query the news collection, we also
formalize three properties of an appropriate citation, namely: (i) the
citation should entail the Wikipedia statement, (ii) the statement should
be central to the citation, and (iii) the citation should be from an
authoritative source. We perform an extensive evaluation of both steps,
using 20 million articles from a real-world news collection. Our results
are quite promising, and show that we can perform this task with high
precision and at scale.


Designing and Building Online Discussion Systems

By *Amy X. Zhang <http://people.csail.mit.edu/axz/> (MIT)*

Today, conversations are
everywhere on the Internet and come in many different forms. However, there
are still many problems with discussion interfaces today. In my talk, I
will first give an overview of some of the problems with discussion
systems, including difficulty dealing with large scales, which exacerbates
additional problems with navigating deep threads containing lots of
back-and-forth and getting an overall summary of a discussion. Other
problems include dealing with moderation and harassment in discussion
systems and gaining control over filtering, customization, and means of
access. Then I will focus on a few projects I am working on in this space
now. The first is Wikum, a system I developed to allow users to
collaboratively generate a wiki-like summary from threaded discussion. The
second, which I have just begun, is exploring the design space of
presentation and navigation of threaded discussion. I will next discuss
Murmur, a mailing list hybrid system we have built to implement and test
ideas around filtering, customization, and flexibility of access, as well
as combating harassment. Finally, I'll wrap up with what I am working on at
Google Research this summer: developing a taxonomy to describe online forum
discussion and using this information to extract meaningful content useful
for search, summarization of discussions, and characterization of
communities.

Hope to see you there!

Sarah R. Rodlund
Senior Project Coordinator-Engineering, Wikimedia Foundation
srodl...@wikimedia.org

[Wikimedia-l] Upcoming Research Showcase - August 17, 2016

2016-08-16 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, Aug 17,
2016 at 11:30 AM (PDT) 18:30 (UTC).

YouTube stream: http://youtu.be/rsFmqYxtt9w

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#Archive>.

This month's showcase includes:

Computational Fact Checking from Knowledge Networks

By *Giovanni Luca Ciampaglia <https://www.mediawiki.org/wiki/User:Junkie.dolphin>*

Traditional
fact checking by expert journalists cannot keep up with the enormous volume
of information that is now generated online. Fact checking is often a
tedious and repetitive task and even simple automation opportunities may
result in significant improvements to human fact checkers. In this talk I
will describe how we are trying to approximate the complexities of human
fact checking by exploring a knowledge graph under a properly defined
proximity measure. Framed as a network traversal problem, this approach is
feasible with efficient computational techniques. We evaluate this approach
by examining tens of thousands of claims related to history, entertainment,
geography, and biographical information using the public knowledge graph
extracted from Wikipedia by the DBpedia project, showing that the method
does indeed assign higher confidence to true statements than to false ones.
One advantage of this approach is that, together with a numerical
evaluation, it also provides a sequence of statements that can be easily
inspected by a human fact checker.
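
The network-traversal framing can be sketched on a toy graph. The abstract says only that a "properly defined proximity measure" is used, so the measure below is an assumption made for illustration: a path's proximity is 1 / (1 + sum of log(degree) over its intermediate nodes), which penalizes paths through very generic, highly connected entities. The graph, the entity names, and the function names are all invented. The search is ordinary Dijkstra with node costs.

```python
import heapq
import math

# Hypothetical toy knowledge graph: undirected edges between entities.
EDGES = [
    ("Barack Obama", "Honolulu"),
    ("Honolulu", "Hawaii"),
    ("Hawaii", "United States"),
    ("Barack Obama", "United States"),
    ("United States", "Canada"),
    ("United States", "Mexico"),
    ("Canada", "Ottawa"),
]


def build_graph(edges):
    """Build an adjacency map {node: set of neighbors}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def best_path(graph, source, target):
    """Dijkstra search where passing through a node costs log(degree).

    Assumed proximity (illustrative, not taken from the talk):
    1 / (1 + sum of log(degree) over intermediate nodes).
    Returns (proximity, path), or (0.0, []) if no path exists.
    """
    if source not in graph or target not in graph:
        return 0.0, []
    queue = [(0.0, [source])]
    settled = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return 1.0 / (1.0 + cost), path
        if node in settled:
            continue
        settled.add(node)
        for neighbor in sorted(graph[node]):
            if neighbor in settled:
                continue
            # Only intermediate nodes are penalized; reaching the target is free.
            step = 0.0 if neighbor == target else math.log(len(graph[neighbor]))
            heapq.heappush(queue, (cost + step, path + [neighbor]))
    return 0.0, []


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for claim in [("Barack Obama", "Hawaii"), ("Barack Obama", "Ottawa")]:
        print(claim, best_path(graph, *claim))
```

The returned path doubles as the human-inspectable "sequence of statements" mentioned above: each hop is one edge a fact checker can verify independently.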


Deploying and maintaining AI in a socio-technical system. Lessons learned

By *Aaron Halfaker <https://www.mediawiki.org/wiki/User:Halfak_(WMF)>*

We should
exercise great caution when deploying AI into our social spaces. The
algorithms that make counter-vandalism in Wikipedia orders of magnitude
more efficient also have the potential to perpetuate biases and silence
whole classes of contributors. This presentation will describe the system
efficiency characteristics that make AI so attractive for supporting
quality control activities in Wikipedia. Then, Aaron will tell two stories
of how the algorithms brought new, problematic biases to quality control
processes in Wikipedia and how the Revision Scoring team
<https://meta.wikimedia.org/wiki/R:Revision_scoring_as_a_service> learned
about and addressed these issues in ORES
<https://meta.wikimedia.org/wiki/ORES>, a production-level AI service for
Wikimedia Wikis. He'll also make an overdue call to action toward
leveraging human review of AI biases in the practice of AI development.

We look forward to seeing you!

-- 
Sarah R. Rodlund
Project Coordinator-Engineering, Wikimedia Foundation
srodl...@wikimedia.org