Smalyshev added a comment.

I've reviewed the proposal and it looks good to me. In fact, it aligns with my thinking about how we should make data from SPARQL logs available (see T143819: Data request for logs from SparQL interface at query.wikidata.org), though I think we should eventually find a way to provide such data sets on a regular basis. I'll add more of my thoughts on the subject to T143819, but so far I think the proposal is fine.

One further thing to consider (probably not for the current data set, but when applying this process to further applications) is that data can be not only strings but also numeric values and URIs. In the future, e.g. for the #structured-data-commons project, we could have URIs that identify particular users (e.g. as the author of content on Commons). Also, right now (though probably not in the data set in question) we produce both a string and a URI for each external ID. In most cases I can think of, an external ID is not PII; after all, it's an ID specifically invented to be a public identifier. Still, it could potentially reveal some information, such as that people are looking for a particular book. So if we want to apply the same process to future data sets, we need to be aware of this and think about solutions.
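
To make the point about value types concrete, here is a minimal sketch of masking all three kinds of values (strings, numbers, full URIs) in a query string. It is only an illustration under assumptions: the function name `mask_values`, the placeholder values, and the regex-based approach are hypothetical and are not what the proposal in this task does. A real implementation would work on a parsed query rather than on raw text.

```python
import re

# Hypothetical, simplified token pattern (not a SPARQL parser):
#   uri    - a full IRI in angle brackets, e.g. <http://example.org/x>
#   string - a double-quoted literal, allowing backslash escapes
#   number - a bare integer or decimal that is not part of a name or variable
TOKEN = re.compile(
    r'(?P<uri><[^<>\s]+>)'
    r'|(?P<string>"(?:[^"\\]|\\.)*")'
    r'|(?P<number>(?<![\w:?$])\d+(?:\.\d+)?\b)'
)

# Placeholder chosen per value type; the placeholders themselves are assumptions.
PLACEHOLDERS = {"uri": "<URI>", "string": '"STRING"', "number": "0"}


def mask_values(query: str) -> str:
    """Replace URIs, string literals and numeric literals with placeholders."""
    return TOKEN.sub(lambda m: PLACEHOLDERS[m.lastgroup], query)


query = (
    'SELECT ?b WHERE { ?b wdt:P212 "978-3-16-148410-0" ; '
    'wdt:P50 <http://example.org/user/123> ; wdt:P1104 250 }'
)
print(mask_values(query))
```

Note that prefixed names such as wdt:P212 are left untouched here, so whether URIs written in prefixed form also need masking would have to be decided separately.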


TASK DETAIL
https://phabricator.wikimedia.org/T183020

To: Smalyshev
Cc: mkroetzsch, Smalyshev, DarTar, leila, Aklapper, Lahi, Gq86, GoranSMilovanovic, QZanden, Avner, Wikidata-bugs, aude, Capt_Swing, Mbch331