https://bugzilla.wikimedia.org/show_bug.cgi?id=58196

--- Comment #32 from Liangent <[email protected]> ---
(In reply to Luis Villa (WMF Legal) from comment #31)
> So, let's keep this bug focused on the question of the "anonymized" table;
> i.e., the table that currently has most (but not all) rows, with userID
> removed but last active timestamp available. 
> 
> The current proposal is to provide this table on labs. It has also been
> suggested that:
> 
> 1) we remove the whitelist altogether and provide everything; and/or
> 2) we sample down the timestamp to YYYY-MM so that it is more difficult to
> map "last active" to editors on small wikis.
> 
> I still have a few concerns:
> 
> 1) I still don't know how we created the original whitelist, or how we plan
> to create the new one. (Note that if we trust our anonymization, having a
> whitelist probably isn't necessary.)

Note that some user_properties rows are not actually preferences, e.g.
watchlisttoken.

> 
> 2) I'm still concerned about the impact on anonymity of small wikis.
> Rounding to months helps, but could still be problematic in at least some
> cases (as Krinkle pointed out).

So ... since wikis are already grouped as small, medium and large, how about
YYYYMMDD for large wikis, YYYYMM for medium wikis and YYYY for small wikis?

https://noc.wikimedia.org/conf/highlight.php?file=large.dblist
https://noc.wikimedia.org/conf/highlight.php?file=medium.dblist
https://noc.wikimedia.org/conf/highlight.php?file=small.dblist
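
A minimal sketch of that idea, assuming the "last active" value is a
14-digit MediaWiki timestamp (YYYYMMDDHHMMSS) and that each wiki's group has
already been looked up from the dblist files above. The names
PRECISION_BY_GROUP and coarsen_timestamp are hypothetical, not existing code:

```python
# Number of leading digits to keep per size group (an assumption that
# mirrors the proposal: YYYYMMDD / YYYYMM / YYYY).
PRECISION_BY_GROUP = {"large": 8, "medium": 6, "small": 4}


def coarsen_timestamp(ts: str, group: str) -> str:
    """Truncate a 14-digit timestamp to the precision allowed for `group`."""
    if len(ts) != 14 or not ts.isdigit():
        raise ValueError("expected a 14-digit YYYYMMDDHHMMSS timestamp")
    if group not in PRECISION_BY_GROUP:
        raise ValueError("unknown wiki size group: " + group)
    return ts[:PRECISION_BY_GROUP[group]]


# The same timestamp, coarsened for each group.
for group in ("large", "medium", "small"):
    print(group, coarsen_timestamp("20131209153045", group))
```

Truncating the string rather than parsing a date keeps the sketch simple; it
does nothing about wikis so small that even a year identifies an editor.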

> I do not have a great solution to these questions. Trying to
> brainstorm/think out loud here: maybe we could provide aggregate totals for
> all editors, for active editors, and for new editors (based on the standard
> analytics definition of active editors/new editors), instead of access to
> individually anonymized rows?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
