🎉🎉🎉
Congrats on this release!  Looking forward to using it in some projects 😀

--

Nate

Hal Triedman <[email protected]> writes:

> Hello world!
>
> My name is Hal Triedman, and I’m a senior privacy engineer at WMF. I work
> to make data that WMF releases about reading, editing, and other on-wiki
> behavior safer, more granular, and more accessible to the world using
> differential privacy
> <https://en.wikipedia.org/wiki/Differential_privacy>.
>
> Today I’m reaching out to share that WMF has released almost 8 years (from
> 1 July 2015 to present) of privatized pageview data
> <https://diff.wikimedia.org/2023/06/21/new-dataset-uncovers-wikipedia-browsing-habits-while-protecting-users/>,
> partitioned by country, project, and page. This data is significantly more
> granular than other datasets we release, and should help researchers to
> disambiguate both long- and short-term trends within languages on a
> country-by-country basis — several
> <https://phabricator.wikimedia.org/T207171> long-standing requests
> <https://phabricator.wikimedia.org/T267283> from Wikimedia communities.
>
> Due to various technical factors, there are three distinct datasets:
>
>    - 1 July 2015 – 8 Feb 2017
>      <https://analytics.wikimedia.org/published/datasets/country_project_page_historical_pre_2017/>
>      / README
>      <https://analytics.wikimedia.org/published/datasets/country_project_page_historical_pre_2017/00_README.html>
>      (publishing threshold [1]: 3,500 pageviews)
>    - 9 Feb 2017 – 5 Feb 2023
>      <https://analytics.wikimedia.org/published/datasets/country_project_page_historical/>
>      / README
>      <https://analytics.wikimedia.org/published/datasets/country_project_page_historical/00_README.html>
>      (publishing threshold: 450 pageviews)
>    - 6 Feb 2023 – present
>      <https://analytics.wikimedia.org/published/datasets/country_project_page/>
>      / README
>      <https://analytics.wikimedia.org/published/datasets/country_project_page/00_README.html>
>      (publishing threshold: 90 pageviews)
>
>
> API access to this data should be coming in the next few months. In the
> interim, I’ve built an example Python notebook
> <https://public-paws.wmcloud.org/67457802/private_pageview_data_access.ipynb>
> illustrating how one might access the data in its current CSV format, as
> well as several different kinds of simple analyses that can be done with it.
>
> I also want to invite the research community to join me for a brief demo of
> this project at the July Research Showcase
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase>. In the
> meantime, please feel free to reach out with any questions on the project talk
> page <https://meta.wikimedia.org/wiki/Talk:Differential_privacy>.
>
> For more information about WMF’s work on differential privacy more
> generally, see the differential privacy homepage on meta
> <https://meta.wikimedia.org/wiki/Differential_privacy>. And in the future,
> look for more announcements of privatized datasets on editor behavior,
> on-wiki search, CentralNotice impressions and clicks, and more.
>
> Best,
>
> Hal
>
> [1] “Publishing threshold” is the minimum value a row in the dataset must
> have in order to be published.
> _______________________________________________
> Wiki-research-l mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

-- 
Nathan TeBlunthuis
Postdoctoral Research Fellow
University of Michigan
School of Information
https://teblunthuis.cc
