On Tue, Jan 25, 2011 at 12:12:45PM +0100, Karsten Loesing wrote:
> Hi everyone,
>
> we're pondering to publish the information which distribution pool a
> bridge is assigned to. The distribution pool defines whether we're giving
> out bridges via HTTP, via email, or not at all (reserved pool). The plan
> is to remove all sensitive information from bridge pool assignments before
> making them available on https://metrics.torproject.org/data.html.
>
> For the long version see task 2372 and comments:
>
> https://trac.torproject.org/projects/tor/ticket/2372
>
> For the summary version read on:
>
> We want to make sanitized bridge pool assignments available, so that we
> can answer questions like these:
>
> - What's the correlation between which pool the bridge is in and whether
> that bridge sees a lot of use from a given country?
>
> - Is bridge uptime affected by the pool assignment, because operators of
> bridges in the reserved pool decide that their bridge is not useful?
>
> Here's a proposed data format for bridge pool assignments:
>
>   bridge-pool-assignment 2011-01-10 01:41:14
>   b 127.0.0.1:443 abcdef0123456789abcdef0123456789abcdef01
>   b 127.0.0.1:443 0123456789abcdef0123456789abcdef01234567
>   s IP ring 1 (port-443 subring)
>   s IP ring 1 (stable subring)
>   s IP ring 1
>
> The timestamp in the bridge-pool-assignment line is the time when the
> assignment is written to disk (twice an hour). Lines starting with b
> contain IP address, port, and fingerprint of a bridge. For sanitizing
> purposes, we replace bridge IP addresses with 127.0.0.1 and bridge
> identities with their SHA-1 hashes. That's the same approach that we take
> for sanitizing bridge descriptors. Lines starting with s contain the
> rings or subrings that a bridge is allocated to. If a bridge is not
> assigned to any pool, it doesn't have an s line.
>
> While this information is useful for analysis, we need to be aware that
> these lists can be misused by a censor to learn what fraction of bridges
> is contained in which pool and what percentage of bridges of a given pool
> they can block. So far, they can only tell how many bridges there are in
> total and what fraction of these bridges they know. We'll have to decide
> if the questions we expect to answer using these data are worth it.

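To make the quoted format concrete, here is a rough Python sketch of how such a file could be read and how a single bridge entry could be sanitized. It is only an illustration under a few assumptions, not the code that produces the real files: the names Bridge, parse_assignment and sanitize_bridge are made up for this example, s lines are assumed to belong to the b line directly before them, and the SHA-1 hash is assumed to be taken over the binary (hex-decoded) fingerprint.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Bridge:
    address: str
    port: int
    fingerprint: str
    # Rings or subrings the bridge is allocated to; empty if unassigned.
    rings: list = field(default_factory=list)


def parse_assignment(text):
    """Parse one assignment document in the proposed format.

    Returns (timestamp string, list of Bridge). Assumption: each "s" line
    refers to the most recent "b" line above it.
    """
    timestamp = None
    bridges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "bridge-pool-assignment":
            timestamp = rest
        elif keyword == "b":
            endpoint, fingerprint = rest.split()
            address, _, port = endpoint.rpartition(":")
            bridges.append(Bridge(address, int(port), fingerprint.lower()))
        elif keyword == "s":
            if not bridges:
                raise ValueError("s line without a preceding b line")
            bridges[-1].rings.append(rest)
        else:
            raise ValueError("unknown line type: %r" % keyword)
    return timestamp, bridges


def sanitize_bridge(address, fingerprint_hex):
    """Replace the address with 127.0.0.1 and the identity with its SHA-1.

    Assumption: the hash input is the hex-decoded fingerprint; the original
    address is deliberately discarded.
    """
    digest = hashlib.sha1(bytes.fromhex(fingerprint_hex)).hexdigest()
    return "127.0.0.1", digest


# The example document from the quoted proposal.
SAMPLE = """
bridge-pool-assignment 2011-01-10 01:41:14
b 127.0.0.1:443 abcdef0123456789abcdef0123456789abcdef01
b 127.0.0.1:443 0123456789abcdef0123456789abcdef01234567
s IP ring 1 (port-443 subring)
s IP ring 1 (stable subring)
s IP ring 1
"""
```

Under these assumptions, parse_assignment(SAMPLE) yields two bridges: the first has no s line and so is not assigned to any pool, while the second is in "IP ring 1" and two of its subrings.
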
Here's a sample bridge pool assignment from September 2010 that is sanitized as described above (all IP addresses are set to 127.0.0.1, and the fingerprints it contains are SHA-1 hashes of the original fingerprints):

http://freehaven.net/~karsten/volatile/bridge-pool-assignment-sample

The sample is there so that everyone gets a better idea of what is meant by a bridge pool assignment.

Does anyone object to publishing tarballs of these sanitized bridge pool assignments on the metrics website, so that we (and anyone else) can analyze them?

Best,
Karsten