The next Research Showcase will be live-streamed on Wednesday, February 15, 2017,
at 11:30 AM (PST) / 19:30 UTC.

YouTube stream: https://www.youtube.com/watch?v=m6smzMppb-I

As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here.

This month's presentations:

Wikipedia and the Urban-Rural Divide
By *Isaac Johnson*

Wikipedia articles
about places, OpenStreetMap features, and other forms of peer-produced
content have become critical sources of geographic knowledge for humans and
intelligent technologies. We explore the effectiveness of the peer
production model across the rural/urban divide, a divide that has been
shown to be an important factor in many online social systems. We find that
in Wikipedia (as well as OpenStreetMap), peer-produced content about rural
areas is of systematically lower quality, less likely to have been produced
by contributors who focus on the local area, and more likely to have been
generated by automated software agents (i.e., “bots”). We continue to
explore and codify the systemic challenges inherent in characterizing rural
phenomena through peer production, and we discuss potential solutions.

Wikipedia Navigation Vectors
By *Ellery Wulczyn <https://www.mediawiki.org/wiki/User:Ewulczyn_(WMF)>*

In this project, we
learned embeddings for Wikipedia articles and Wikidata
<https://www.wikidata.org/wiki/Wikidata:Main_Page> items by applying
Word2vec <https://en.wikipedia.org/wiki/Word2vec> models to a corpus of
reading sessions. Although Word2vec models were developed to learn word
embeddings from a corpus of sentences, they can be applied to any kind of
sequential data. The learned embeddings have the property that items with
similar neighbors in the training corpus have similar representations (as
measured, for example, by cosine similarity
<https://en.wikipedia.org/wiki/Cosine_similarity>). Consequently, applying
Word2vec to reading sessions produces article embeddings in which articles
that tend to be read in close succession have similar representations.
Since people usually generate sequences of
semantically related articles while reading, these embeddings also capture
semantic similarity between articles.
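To make the "reading sessions as sentences" idea concrete, here is a minimal, illustrative sketch in Python. It is not the project's code: it assumes the skip-gram variant of Word2vec (the announcement does not say which variant was used), an assumed context window of 2, and hypothetical toy sessions. It shows only the two pieces described above: turning each session into the (center, context) pairs a skip-gram model trains on, and comparing two learned vectors with cosine similarity. The training step itself, which learns vectors from these pairs, would be done by a Word2vec implementation and is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy reading sessions: each session is the ordered list of
# articles one reader viewed, playing the role of a "sentence" for Word2vec.
SESSIONS: List[List[str]] = [
    ["Word2vec", "Word_embedding", "Cosine_similarity"],
    ["OpenStreetMap", "Wikipedia", "Wikidata"],
]


def skipgram_pairs(
    sessions: Sequence[Sequence[str]], window: int = 2
) -> List[Tuple[str, str]]:
    """Return the (center, context) pairs a skip-gram model would train on.

    Each article is paired with every article at most `window` positions
    before or after it in the same session. Sessions are never mixed, so
    only articles read close together in one session become neighbors.
    The default window of 2 is an assumption, not the project's setting.
    """
    pairs: List[Tuple[str, str]] = []
    for session in sessions:
        for i, center in enumerate(session):
            lo = max(0, i - window)
            hi = min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # an article is not its own context
                    pairs.append((center, session[j]))
    return pairs


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors.

    1.0 means the vectors point the same way, 0.0 means they are orthogonal.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

For example, `skipgram_pairs([["A", "B", "C"]], window=1)` yields `("A", "B")`, `("B", "A")`, `("B", "C")` and `("C", "B")`: "A" and "C" never become neighbors because they were not read within one step of each other. After training, cosine similarity is what one would use to rank the articles whose vectors lie closest to a given article's vector.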

