[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2018-07-10 Thread leila
leila added a comment.

In T183020#4409356, @Smalyshev wrote:
@leila - anything else we need to do here before the release?


If you have sign-off from Legal, Security, and your team's manager, you have checked all the practical boxes I listed earlier. You should be good to go. (I do want to call out that we have devised this process based on what makes sense for this dataset and past experiences. It would be good/essential for WMF to have a process in place.)

TASK DETAIL
https://phabricator.wikimedia.org/T183020

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: leila
Cc: EBjune, mkroetzsch, Smalyshev, DarTar, leila, Aklapper, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Avner, Wikidata-bugs, aude, Capt_Swing, Mbch331

Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2018-07-09 Thread EBjune
EBjune added a comment.
@Smalyshev you have my sign-off on this, thanks to you and @leila for persisting in making this important data available to researchers!


[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2018-07-09 Thread Smalyshev
Smalyshev added a comment.
I think we've got all the approvals for this except for the formal nod from @EBjune. @leila - anything else we need to do here before the release?


[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2018-03-27 Thread leila
leila added a comment.
@Smalyshev I would say you need your team manager's sign-off, plus Security's and Legal's. Given that you're deeply familiar with this data and how it's processed, you're perhaps in the best position to have these conversations with the three people/entities.


[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2018-03-27 Thread Smalyshev
Smalyshev added a comment.
@leila I can probably review it, but I am not sure what a "sign off" looks like. Is it just me saying "I'm ok with it", or is something more formal required?


[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2018-03-26 Thread leila
leila added a comment.
@Smalyshev can you check my comment at https://meta.wikimedia.org/wiki/User_talk:Markus_Kr%C3%B6tzsch/Wikidata_queries and let me know if this is something your team is willing to pick up?


[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2018-03-19 Thread leila
leila added a comment.
I did one pass and left a couple of comments at https://meta.wikimedia.org/wiki/User_talk:Markus_Kr%C3%B6tzsch/Wikidata_queries


[Wikidata-bugs] [Maniphest] [Commented On] T183020: Investigate the possibility to release Wikidata queries

2017-12-16 Thread mkroetzsch
mkroetzsch added a comment.
I agree with Stas: regular data releases are desirable, but need further thought. The task is easier for our current case since we already know what is in the data. For a regular process, one has to be very careful to monitor potential future issues. By releasing historic data, we avoid exploits that could be theoretically possible based on detailed knowledge of the methodology.

Regarding external IDs, one could whitelist unproblematic IDs that can be preserved, and obfuscate others. I agree that authority control IDs might identify humans, since they scope over so many things that are tied to particular humans (books, authors, etc.) that one could have a hypothetical situation where the interest in a particular item would already suggest who asked the query. I don't think something similar is even theoretically plausible for other IDs (e.g., for proteins or stars). Even for book IDs, the lack of user traces makes it very hard to exploit this data further (the certainty you get from a single query being asked can hardly be high, and a query that helps you to guess who asked it will often not be interesting in its own right -- most likely you would want to know what else the identified person has asked). Anyway, we could restrict the "numerical strings are ok" rule to whitelisted properties for our current release. The main reason we have the rule at all is cases like BlazeGraph's "radius" service parameter, which has to be a number but is given as a string (I think the gas service might have similar cases).
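
To make the restricted rule concrete, here is a minimal sketch of how it might look. This is an illustration only, not the code used for the release: the function name, the placeholder scheme, and the contents of the whitelist are all assumptions, and it works on already extracted (predicate, string) pairs rather than parsing SPARQL.

```python
import re

# Hypothetical whitelist of predicates whose numeric string values are kept.
# The single entry is illustrative; the real list would have to be reviewed.
NUMERIC_OK = {"wikibase:radius"}

# A plain integer or decimal number, optionally signed.
_NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")


def obfuscate_literal(predicate: str, value: str, seen: dict) -> str:
    """Keep numeric strings on whitelisted predicates; replace everything else.

    `seen` maps original strings to placeholders, so the same string always
    receives the same placeholder within one run.
    """
    if predicate in NUMERIC_OK and _NUMERIC.fullmatch(value):
        return value
    if value not in seen:
        seen[value] = f"string{len(seen) + 1}"
    return seen[value]


if __name__ == "__main__":
    seen = {}
    # Numeric string on a whitelisted service parameter is preserved.
    print(obfuscate_literal("wikibase:radius", "100", seen))
    # Numeric-looking authority control ID (wdt:P214 used as an example) is obfuscated.
    print(obfuscate_literal("wdt:P214", "12345", seen))
```

With this shape, a numeric string survives only where a whitelisted predicate needs it, while a numeric-looking external ID on any other property is treated like any other string.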

There is a general limitation to potential exploits of SPARQL logs for breaching someone's privacy. If you don't control the software that formulated the query, then you can only connect queries to people if you already knew that only this person would ask this query. But then you learn very little by observing the query! On the other hand, if you control the software, then it would usually be easy to gather user data more directly, without needing the detour across some SPARQL logs released months later.  One exception that might be relevant in the future is the use of SPARQL from Lua built-ins or MediaWiki tags on Wikipedia pages, which could in theory expose some page traffic. This is not relevant for our historic logs, and it would be hard to fully exploit due to parser caches and crawler-based hits, but it might become a theoretical issue nonetheless. To avoid it, one could either filter all Wikipedia servers from the logs, or use a separate SPARQL service for such requests (as discussed in Berlin), whose logs would not be released.

Considering our current dataset, it seems that even the obfuscation of strings is more than one would have to do, but in the future one might indeed have to add external URLs if they become more common in queries.