https://bugzilla.wikimedia.org/show_bug.cgi?id=58196
--- Comment #32 from Liangent <[email protected]> ---
(In reply to Luis Villa (WMF Legal) from comment #31)
> So, let's keep this bug focused on the question of the "anonymized" table;
> i.e., the table that currently has most (but not all) rows, with userID
> removed but last active timestamp available.
>
> The current proposal is to provide this table on labs. It has also been
> suggested that:
>
> 1) we remove the whitelist altogether and provide everything; and/or
> 2) we sample down the timestamp to YYYY-MM so that it is more difficult to
> map "last active" to editors on small wikis.
>
> I still have a few concerns:
>
> 1) I still don't know how we created the original whitelist, or how we plan
> to create the new one. (Note that if we trust our anonymization, having a
> whitelist probably isn't necessary.)

Note that some user_properties rows are not actually preferences, e.g.
watchlisttoken.

> 2) I'm still concerned about the impact on anonymity of small wikis.
> Rounding to months helps, but could still be problematic in at least some
> cases (as Krinkle pointed out).

So ... wikis are already grouped as small, medium and large; then YYYYMMDD in
large, YYYYMM in medium and YYYY in small wikis?

https://noc.wikimedia.org/conf/highlight.php?file=large.dblist
https://noc.wikimedia.org/conf/highlight.php?file=medium.dblist
https://noc.wikimedia.org/conf/highlight.php?file=small.dblist

> I do not have a great solution to these questions. Trying to
> brainstorm/think out loud here: maybe we could provide aggregate totals for
> all editors, for active editors, and for new editors (based on the standard
> analytics definition of active editors/new editors), instead of access to
> individually anonymized rows?
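
To make the tiered-precision suggestion in the reply concrete (YYYYMMDD for large, YYYYMM for medium, YYYY for small wikis), here is a rough sketch. It assumes the "last active" value is a MediaWiki-style 14-digit timestamp string (YYYYMMDDHHMMSS) and that each wiki's tier has already been looked up from the dblist files; the names `PRECISION_BY_TIER` and `truncate_timestamp` are hypothetical and not part of any existing tool.

```python
# Hypothetical sketch only: neither name below exists in MediaWiki or on labs.
# Number of leading digits to keep per wiki size tier.
PRECISION_BY_TIER = {
    "large": 8,   # YYYYMMDD
    "medium": 6,  # YYYYMM
    "small": 4,   # YYYY
}


def truncate_timestamp(ts: str, tier: str) -> str:
    """Cut a 14-digit YYYYMMDDHHMMSS timestamp down to the tier's precision."""
    if len(ts) != 14 or not ts.isdigit():
        raise ValueError(f"expected a 14-digit timestamp, got {ts!r}")
    # An unknown tier raises KeyError rather than silently leaking precision.
    return ts[:PRECISION_BY_TIER[tier]]


if __name__ == "__main__":
    for tier in ("large", "medium", "small"):
        print(tier, truncate_timestamp("20131211093045", tier))
```

Whether year-level precision is coarse enough for the smallest wikis is still the open question raised in comment #31; this only shows the mechanics.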
