On Tue, Dec 28, 2010 at 11:29 PM, Roger Dingledine <[email protected]> wrote:
[snip]
> What I'm really looking forward to is learning what modifications to Tor
> might slow down the attack. For example, what happens if we move to a 1024
> byte cell by default, or if we randomly add some extra cells periodically,
> or if we ask the entry node to add padding cells so the responses we get
> are multiples of 10KB? It would seem that there is a tradeoff between
> bandwidth overhead (wasted bytes) and protection against this attack,
> but I hope there are smart points in the tradeoff space. Alas, we're
> still not really to that point yet -- we don't know how well it actually
> works in practice against vanilla Tor, so it doesn't make sense to ask
> how well it would work in practice against a modified Tor design.

We can do some useful "back of the envelope" calculations so that we
can say _something_ useful about the rounding. I spent a few minutes
contemplating this, and I thought I'd make the data I used available
to anyone interested in studying this.

http://myrandomnode.dyndns.org:8080/~gmaxwell/wp_article_sizes.txt.gz
contains the uncompressed sizes of the wikitext for the 3.5 million
English Wikipedia articles (as of Wikimedia's 2010-10 dump).

Here is how we can use the data to reason about this attack:

Assume that the attacker knows the target is browsing Wikipedia, that
they know the exact size of the pages loaded, and that they want to
know which articles the person is reading. Based on this data we can
compute the entropy of the page sizes to discover how much the
attacker will learn from each page load. We can then study how much
quantizing the size reduces that entropy.

Of course, attackers have a number of additional avenues to increase
the usefulness of the data they obtain: they may have some assumptions
about the prior probabilities (other than "user is browsing
Wikipedia"), and they may also reason about the interlinkedness of
articles, e.g. a second page load is very likely a page linked from
the first load. You might conservatively estimate that each and every
request adds its full total to the attacker's aggregate knowledge.

There are a number of limits to this line of study. Wikipedia articles
are served in HTML form (not wikitext) and in the gzip encoding. I can
wave my arms and say that I don't expect the conversion to HTML and
the HTTPS transport to change the entropy much, and that I expect gzip
to decrease it (because smaller sizes have intrinsically less
entropy). Normally articles contain inline images, and the loading
sizes of these objects probably increase the entropy enormously.

These probably aren't important compared to the fact that Wikipedia is
not the whole internet. :) Still, it's a starting point.
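
As a rough sketch of the computation described above (not the exact
procedure used for the numbers in this mail): the code below uses the
plain empirical (plug-in) entropy rather than the James-Stein shrinkage
estimate, and it assumes that quantization means rounding each size up
to a whole number of cells. The function names and the toy sizes are
made up for illustration, and no particular format is assumed for the
real data file.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Empirical (plug-in) Shannon entropy, in bits, of discrete values."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over the observed values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def quantize(size, cell):
    """Number of fixed-size cells needed to carry `size` bytes (rounds up)."""
    return -(-size // cell)


def entropy_by_cell_size(sizes, max_log2=20):
    """Map log2(cell size) -> entropy of the quantized sizes."""
    sizes = list(sizes)
    return {
        k: entropy_bits(quantize(s, 1 << k) for s in sizes)
        for k in range(max_log2 + 1)
    }


if __name__ == "__main__":
    # Toy article sizes in bytes, purely illustrative.
    toy_sizes = [0, 100, 200, 300, 600, 700, 1500, 3000]
    for k, h in entropy_by_cell_size(toy_sizes, max_log2=12).items():
        print(f"{k:2d}  {h:.4f}")
```

Run against the real list of article sizes, `entropy_by_cell_size`
should give numbers close to, but slightly below, a shrinkage-based
estimate.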

Here is some data, using the James-Stein shrinkage estimate of entropy
(which gives slightly larger results than the empirical entropy):

log2(Cell size)   Entropy in bits
 0                13.48422
 1                12.48014
 2                11.47869
 3                10.47837
 4                9.478465
 5                8.478762
 6                7.480331
 7                6.48253
 8                5.493885
 9                4.507543
10                3.526705
11                2.551070
12                1.599523
13                0.8287433
14                0.3627942
15                0.1329697
16                0.03448373
17                0.004374095
18                0.0002002991
19                1.336822e-05
20                6.684109e-06

(There is a single page of size zero; otherwise 20 would have 0
entropy. Over a real transport the size would never be zero, so a unit
of 2^20 would be sufficient to reduce the leakage to zero for this
data.)

So for this data, changing the transmission unit from 512 (9) to 1024
(10) would only decrease the information learned by an unbiased
attacker from one request by about one bit. (Unsurprisingly, the
entropy of the page sizes is not concentrated in the least significant
bits.)

If you make any assumption that the attacker accumulates data from
request to request (e.g. due to page linkage) then I think that a
change from 512 to 1024 does not effectively thwart this attack
against this data set. If the attacker does not have that ability then
the current transmission unit already provides a substantial, and
probably sufficient, reduction in information leaked.
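
The "about one bit" figure can be checked directly against the table.
A minimal sketch, with a few entropy values copied from the table and
taking 512 = 2^9 and 1024 = 2^10 (the dictionary and helper name are
just for illustration):

```python
# Entropy in bits, copied from the table, indexed by log2(cell size).
entropy = {8: 5.493885, 9: 4.507543, 10: 3.526705, 11: 2.551070}


def bits_saved(k_from, k_to):
    """Reduction in per-request entropy when the cell grows from 2^k_from to 2^k_to."""
    return entropy[k_from] - entropy[k_to]


# Going from 512-byte to 1024-byte cells.
print(round(bits_saved(9, 10), 3))  # 0.981
```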

