Re: [Wikimedia-l] Research Showcase Wednesday June 21, 2017

2017-06-21 Thread Sarah R
Hi Everyone,

Just a reminder, this will begin at 11:30 AM PDT (18:30 UTC) today!

Kind regards,

Sarah R.



-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org

“In a real sense all life is inter-related. All men are caught in an
inescapable network of mutuality, tied in a single garment of destiny.
Whatever affects one directly, affects all indirectly. I can never be what
I ought to be until you are what you ought to be, and you can never be what
you ought to be until I am what I ought to be... This is the inter-related
structure of reality.” ― Martin Luther King Jr.'s Letter from Birmingham
Jail and the Struggle That Changed a Nation
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 


[Wikimedia-l] Research Showcase Wednesday June 21, 2017

2017-06-18 Thread Sarah R
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, June 21,
2017 at 11:30 AM PDT (18:30 UTC).

YouTube stream: https://www.youtube.com/watch?v=i2jpKRwPT-Q

As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here.

This month's presentations:

Title: Problematizing and Addressing the Article-as-Concept Assumption in
Wikipedia

By *Allen Yilun Lin*

Abstract: Wikipedia-based studies and systems frequently assume that each
article describes a separate concept. However, in this paper, we show that
this article-as-concept assumption is problematic due to editors’ tendency
to split articles into parent articles and sub-articles when articles get
too long for readers (e.g. “United States” and “American literature” in the
English Wikipedia). In this paper, we present evidence that this issue can
have significant impacts on Wikipedia-based studies and systems and
introduce the sub-article matching problem. The goal of the sub-article
matching problem is to automatically connect sub-articles to parent
articles to help Wikipedia-based studies and systems retrieve complete
information about a concept. We then describe the first system to address
the sub-article matching problem. We show that, using a diverse feature set
and standard machine learning techniques, our system can achieve good
performance on most of our ground truth datasets, significantly
outperforming baseline approaches.


Title: Understanding Wikidata Queries


By *Markus Kroetzsch*

Abstract: Wikimedia provides a public service that lets anyone answer
complex questions over the sum of all knowledge stored in Wikidata. These
questions are expressed in the query language SPARQL and range from the
most simple fact retrievals ("What is the birthday of Douglas Adams?") to
complex analytical queries ("Average lifespan of people by occupation").
The talk presents ongoing efforts to analyse the server logs of the
millions of queries that are answered each month. It is an important but
difficult challenge to draw meaningful conclusions from this dataset. One
might hope to learn relevant information about the usage of the service and
Wikidata in general, but at the same time one has to be careful not to be
misled by the data. Indeed, the dataset turned out to be highly
heterogeneous and unpredictable, with strongly varying usage patterns that
make it difficult to draw conclusions about "normal" usage. The talk will
give a status report, present preliminary results, and discuss possible
next steps.

-- 
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodl...@wikimedia.org