The next Research Showcase will be live-streamed this Wednesday, March 21,
2018 at 11:30 AM PDT (18:30 UTC).

YouTube stream:  https://www.youtube.com/watch?v=ACevHs0sMMw

As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here

Over the past few years, the Research team at the Wikimedia Foundation and
some of our formal collaborators have focused on research and technologies
that help editors across Wikimedia languages find tasks to contribute to.
While the early effort focused heavily on recommending articles for
creation (horizontal expansion), in 2016 we started a new direction of
research focused on the vertical expansion of Wikipedia articles. The two
talks in the March 2018 Research Showcase will share some of what we have
learned from this research. More specifically, we will talk about the
Wikipedia category network as a strong signal for creating
templates/structures for Wikipedia articles, as well as ongoing research to
learn which content (sections) is missing from Wikipedia across its many
languages. The two corresponding abstracts, with more details, are below.
Join us!

Using Wikipedia categories for research: opportunities, challenges, and
solutions
By *Tiziano Piccardi, EPFL*
The category network in Wikipedia is
used by editors as a way to label articles and organize them in a
hierarchical structure. This manually created and curated network of 1.6
million nodes in English Wikipedia, generated by arranging the categories in
a child-parent relation (e.g., Scientists-People, Cities-Human Settlement),
allows researchers to infer valuable relations between concepts. A clean
structure in this format would be a valuable resource for a variety of
tools and applications, including automatic reasoning tools. Unfortunately,
the Wikipedia category network contains some "noise", since in many cases
the subcategory association does not define an is-a relation (Scientists
is-a People vs. Billionaires is-a Wealth). Motivated by the goal of
recommending sections to add to existing Wikipedia articles, we developed a
method to clean this network and keep only the categories that have a high
chance of being associated with their children by an is-a relation. The
strategy is based on the concept of "pure" categories, and the algorithm
uses the types of the attached articles to determine how homogeneous the
category is. The approach does not rely on any linguistic feature and is
therefore suitable for all Wikipedia languages. In this talk, we will give
a high-level overview of the algorithm and discuss some of the possible
applications of the generated network beyond article section
recommendations.
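
To make the idea of a "pure" category concrete, here is a minimal sketch.
It is not the speakers' implementation: the purity measure (share of the
most common article type), the 0.8 threshold, the function names, and the
toy type labels are all assumptions made for illustration.

```python
from collections import Counter

def purity(article_types):
    """Share of a category's articles that have the most common type."""
    if not article_types:
        return 0.0
    counts = Counter(article_types)
    return counts.most_common(1)[0][1] / len(article_types)

def keep_pure(categories, threshold=0.8):
    """Names of categories whose purity reaches the (assumed) threshold."""
    return [name for name, types in categories.items()
            if purity(types) >= threshold]

# Toy data: each category maps to the types of its attached articles.
categories = {
    "Scientists": ["human", "human", "human", "human"],
    "Wealth": ["human", "concept", "organization", "list"],
}

print(keep_pure(categories))
```

In this toy data, every article in "Scientists" has the same type, so the
category is kept, while "Wealth" mixes several types and is filtered out.
The score uses no text features, only article types, which is what makes
this kind of approach language-independent.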

Beyond Automatic Translation: Aligning Wikipedia sections across multiple
languages
By *Diego Saez-Trumper*
Sections are the building blocks of
Wikipedia articles. For editors, they can be used as an entry point for
creating and expanding articles. For readers, they enhance readability of
Wikipedia content. In this talk, we present ongoing research to align
article sections across Wikipedia languages. We show that the available
technology for automatic translation is not good enough for translating
section titles. We then show a complementary approach to section
alignment, using Wikidata and cross-lingual word embeddings. We will
present some of the use cases of a methodology for aligning sections across
languages, including improved section recommendation, especially in
medium-sized and smaller Wikipedia languages, where the language itself may
not contain enough signal about the structure of articles and signals can
be inferred from other, larger Wikipedia languages.
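
As a rough illustration of the embedding side of section alignment, the
sketch below matches each section title in one language to its nearest
title in another by cosine similarity. It assumes the titles have already
been embedded in a shared cross-lingual vector space; the three-dimensional
vectors are made-up toy values, the function names are hypothetical, and
the Wikidata part of the approach is not covered.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align(source, target):
    """Map each source title to the most similar target title."""
    return {
        s_title: max(target, key=lambda t_title: cosine(s_vec, target[t_title]))
        for s_title, s_vec in source.items()
    }

# Toy vectors standing in for cross-lingual embeddings of section titles.
en = {"History": [0.9, 0.1, 0.0], "Geography": [0.1, 0.9, 0.1]}
it = {"Storia": [0.8, 0.2, 0.1], "Geografia": [0.2, 0.8, 0.0]}

print(align(en, it))
```

With these toy vectors, "History" is paired with "Storia" and "Geography"
with "Geografia". A nearest-neighbor match like this is only a baseline;
real section titles would need trained embeddings and some handling of
titles that have no counterpart in the other language.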

Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
