🎉🎉🎉 Congrats on this release! Looking forward to using it in some projects 😀
-- Nate

Hal Triedman <[email protected]> writes:

> Hello world!
>
> My name is Hal Triedman, and I’m a senior privacy engineer at WMF. I work
> to make data that WMF releases about reading, editing, and other on-wiki
> behavior safer, more granular, and more accessible to the world using
> differential privacy
> <https://en.wikipedia.org/wiki/Differential_privacy>.
>
> Today I’m reaching out to share that WMF has released almost 8 years (from
> 1 July 2015 to present) of privatized pageview data
> <https://diff.wikimedia.org/2023/06/21/new-dataset-uncovers-wikipedia-browsing-habits-while-protecting-users/>,
> partitioned by country, project, and page. This data is significantly more
> granular than other datasets we release, and should help researchers to
> disambiguate both long- and short-term trends within languages on a
> country-by-country basis - several
> <https://phabricator.wikimedia.org/T207171>
> long-standing requests
> <https://phabricator.wikimedia.org/T267283>
> from Wikimedia communities.
>
> Due to various technical factors, there are three distinct datasets:
>
> - 1 July 2015 – 8 Feb 2017
>   <https://analytics.wikimedia.org/published/datasets/country_project_page_historical_pre_2017/>
>   / README
>   <https://analytics.wikimedia.org/published/datasets/country_project_page_historical_pre_2017/00_README.html>
>   (publishing threshold [1]: 3,500 pageviews)
> - 9 Feb 2017 – 5 Feb 2023
>   <https://analytics.wikimedia.org/published/datasets/country_project_page_historical/>
>   / README
>   <https://analytics.wikimedia.org/published/datasets/country_project_page_historical/00_README.html>
>   (publishing threshold: 450 pageviews)
> - 6 Feb 2023 – present
>   <https://analytics.wikimedia.org/published/datasets/country_project_page/>
>   / README
>   <https://analytics.wikimedia.org/published/datasets/country_project_page/00_README.html>
>   (publishing threshold: 90 pageviews)
>
> API access to this data should be coming in the next few months.
> In the interim, I’ve built an example python notebook
> <https://public-paws.wmcloud.org/67457802/private_pageview_data_access.ipynb>
> illustrating how one might access the data in its current csv format, as
> well as several different kinds of simple analyses that can be done with it.
>
> I also want to invite the research community to join me for a brief demo of
> this project at the July Research Showcase
> <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase>. In the
> meantime, please feel free to reach out with any questions on the project
> talk page <https://meta.wikimedia.org/wiki/Talk:Differential_privacy>.
>
> For more information about WMF’s work on differential privacy more
> generally, see the differential privacy homepage on meta
> <https://meta.wikimedia.org/wiki/Differential_privacy>. And in the future,
> look for more announcements of privatized datasets on editor behavior,
> on-wiki search, centralnotice impressions and clicks, and more.
>
> Best,
>
> Hal
>
> [1] “Publishing threshold” is the minimum value of a row in the dataset in
> order to be published.
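
For anyone working across all three datasets, here is a minimal sketch of looking up which publishing threshold applies to a given day. The date ranges and thresholds are copied from the list in the announcement. The names `DATASETS`, `publishing_threshold`, and `meets_threshold` are made up for illustration and are not part of any WMF tooling. The `>=` comparison is my reading of "minimum value" in footnote [1], so treat it as an assumption and check the READMEs before relying on it.

```python
from datetime import date

# (first day, last day or None for "present", publishing threshold),
# taken from the three dataset bullets in the announcement.
DATASETS = [
    (date(2015, 7, 1), date(2017, 2, 8), 3500),
    (date(2017, 2, 9), date(2023, 2, 5), 450),
    (date(2023, 2, 6), None, 90),
]


def publishing_threshold(day):
    """Return the publishing threshold of the dataset that covers `day`."""
    for start, end, threshold in DATASETS:
        if day >= start and (end is None or day <= end):
            return threshold
    raise ValueError(f"no privatized dataset covers {day}")


def meets_threshold(day, value):
    """True if a row value is at least the threshold for `day`.

    Assumption: "minimum value" in footnote [1] is read as inclusive.
    """
    return value >= publishing_threshold(day)
```

For example, `publishing_threshold(date(2020, 3, 15))` returns `450`, so a page with a few hundred views per country that day would not be expected to show up, while the same page after 6 Feb 2023 might.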
> _______________________________________________
> Wiki-research-l mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

--
Nathan TeBlunthuis
Postdoctoral Research Fellow
University of Michigan School of Information
https://teblunthuis.cc
