mforns added a comment.

@Nuria @Smalyshev

So probably if we round the timestamp and remove sessionId, your proposal for dataset #1 is safe to keep long term (cc @mforns for anything I might be missing)

I think it depends heavily on how drastically we sanitize the potentially identifying fields (user agent and client IP) and the fields that can reveal user activity or characteristics (query, location).
Intuitively, it seems to me that we can keep this data in a private store indefinitely if it is sanitized. But having those 4 sensitive fields in the same data set will make it difficult to publish, even if sanitized. I don't know how frequent WDQS queries are, but I imagine they are several orders of magnitude less frequent than pageviews. Thus the buckets of this data set are likely to be sparse and small, which increases the threat to user privacy.

If we wanted to make this public, I'd go for removing the geographic location field entirely, and probably for daily or monthly resolution instead of hourly (depending on bucket size).
Also, splitting the data set into several thematic data sets that cannot be linked back to each other could help: queries by country, queries by user agent, session queries, etc.
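
To illustrate the idea, here is a rough sketch of one such thematic data set (queries by user agent, at daily resolution). The field names (`timestamp`, `session_id`, `client_ip`, `location`, `user_agent`) and the minimum bucket size of 10 are assumptions made up for the example, not the actual WDQS log schema or an agreed threshold.

```python
from collections import Counter
from datetime import datetime

# Hypothetical field names; the real WDQS log schema may differ.
DROP_FIELDS = {"session_id", "client_ip", "location"}
MIN_BUCKET_SIZE = 10  # arbitrary threshold, for illustration only


def sanitize(record):
    """Drop identifying fields and coarsen the timestamp to daily resolution."""
    clean = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    ts = datetime.fromisoformat(record["timestamp"])
    clean["timestamp"] = ts.date().isoformat()
    return clean


def queries_by_user_agent(records, min_bucket_size=MIN_BUCKET_SIZE):
    """Daily query counts per user agent, with small buckets suppressed."""
    counts = Counter()
    for r in map(sanitize, records):
        counts[(r["timestamp"], r["user_agent"])] += 1
    # Sparse buckets are the ones most likely to single out a user, so drop them.
    return {bucket: n for bucket, n in counts.items() if n >= min_bucket_size}
```

If most buckets fall under the threshold at daily resolution, that would be a sign to move to monthly resolution or to coarsen the user agent further.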

Sorry if I'm too pessimistic, I'm not familiar with the kind of information that WDQS queries can give away about users.


TASK DETAIL
https://phabricator.wikimedia.org/T143819

To: mforns
Cc: mforns, PokestarFan, Nuria, Lydia_Pintscher, mkroetzsch, leila, debt, thiemowmde, Jonas, Smalyshev, AndrewSu, Aklapper, I9606, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Avner, Gehel, FloNight, Xmlizer, JAllemandou, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb