Hi all,

The next Research Showcase, focused on *Data Privacy*, will be
live-streamed on Wednesday, October 18, at 9:30 AM PDT / 16:30 UTC. Find
your local time here <https://zonestamp.toolforge.org/1697646641>.

YouTube stream: https://www.youtube.com/watch?v=ntgRsMaDlsw. As usual, you
can join the conversation in the YouTube chat as soon as the showcase goes
live.

This month's presentations:

Wikipedia Reader Navigation: When Synthetic Data Is Enough
By *Akhil Arora, EPFL*
Every day millions of people read Wikipedia. When navigating the vast
space of available topics using hyperlinks, readers describe trajectories
on the article network. Understanding these navigation patterns is crucial
to better serve readers’ needs and address structural biases and knowledge
gaps. However, systematic studies of navigation on Wikipedia are hindered
by a lack of publicly available data due to the commitment to protect
readers' privacy by not storing or sharing potentially sensitive data. In
this paper, we ask: How well can Wikipedia readers' navigation be
approximated by using publicly available resources, most notably the
Wikipedia clickstream data <https://wikinav.toolforge.org/>? We
systematically quantify the differences between real navigation sequences
and synthetic sequences generated from the clickstream data, in 6 analyses
across 8 Wikipedia language versions. Overall, we find that the differences
between real and synthetic sequences are statistically significant, but
with small effect sizes, often well below 10%. This constitutes
quantitative evidence for the utility of the Wikipedia clickstream data as
a public resource: clickstream data can closely capture reader navigation
on Wikipedia and provides a sufficient approximation for most practical
downstream applications relying on reader data. More broadly, this study
provides an example of how clickstream-like data can generally enable
research on user navigation on online platforms while protecting users’
privacy.

How to tell the world about data you cannot show them: Differential privacy
at the Wikimedia Foundation
By *Hal Triedman, Wikimedia Foundation*
The Wikimedia Foundation (WMF), by virtue of its centrality on the internet,
collects lots of data about platform activities. Some of that data is made
public (e.g. global daily pageviews); other data types are not shared (or
are pseudonymized prior to sharing), largely due to privacy concerns.
Differential privacy is a statistical definition of privacy that has gained
prominence in academia, but is still an emerging technology in industry. In
this talk, I share the story of how we put differential privacy into
production at the WMF, through looking at the case study of geolocated
daily pageview counts.

You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase

Best,
Kinneret
--

Kinneret Gordon

Lead Research Community Officer

Wikimedia Foundation <https://wikimediafoundation.org/>


