On 11/06/13 10:41, Anthony wrote:
> One thing I'd also appreciate is that if indeed Wikipedia access logs are
> not even collected in the first place (except for 1/1000 samples), that
> this be stated officially, rather than relying on a two-year-old comment by
> a single, now-former employee.

In October 2012, I introduced an unsampled log of API requests,
including IP addresses. This was in response to a server overload
caused by the API which was very difficult to isolate due to the lack
of meaningful logs. The retention time is currently 30 days.

This means that, among other things, search autocomplete is logged.
The logs are collected at the backend, which means that Squid cache
hits will not be logged. So autocomplete requests for common terms and
prefixes will appear rarely.

This is not a secret -- the changes that made it happen were public at
the time:

https://gerrit.wikimedia.org/r/#/c/24274/
https://gerrit.wikimedia.org/r/#/c/26434/

I'm sure that the other teams (e.g. fundraising, mobile and analytics)
can give you details of what access logs they collect and store.

In general, access logs haven't been stored due to cost, rather than
for any privacy reason. Lots of smaller services (e.g.
blog.wikimedia.org) store access logs.

-- Tim Starling

_______________________________________________
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l