A while ago there was a thread that devolved into "why does Tor still ship ancient privoxy?" and "why are you shipping polipo with the Tor Browser Bundle instead of current privoxy?" For those interested, the thread is here, http://archives.seul.org/or/talk/Jul-2009/msg00063.html.
Scott had a good argument for why we should update the bundles to the latest privoxy, and I agree, we should. But then I started thinking about why we needed a proxy at all. Almost all browsers support socks5 direct, isn't that faster than a middleman proxy? This got me thinking about why we put polipo in the TBB, but not the other packages. The TBB "feels faster" when using Tor than using the installed Tor, Vidalia, and Privoxy. However, I couldn't find any actual testing of performance of polipo vs. privoxy vs. socks5 direct. So I did it myself, in a loose manner. The raw data from Tamper Data as xml, proxy config files, and results in a spreadsheet are all contained in http://freehaven.net/~phobos/polipo-v-privoxy.tar.gz{.asc). And yes, the ruby script is a quick and dirty hack. I tested a few scenarios: 1) native polipo and privoxy without using Tor. 2) polipo and privoxy forwarding to Tor localhost:9050. 3) firefox socks5 direct to Tor via localhost:9050. The summary of results: 1) Native polipo is 54.5% faster on average than native privoxy. This could be due to polipo's caching, http 1.1 pipelining, and it can serve bits as fast as they come in from the network. Privoxy needs to load the whole page, scan it, and then send it to the client. Even if privoxy filtering is disabled, it still works the same way. 2) Polipo caching shines with Tor usage. Common images are cached, and served from the memory cache in single-digit millisecond ranges. Privoxy needs to wait for Tor to wholly deliver the bits. Caching is faster, this we know already. However, from a user perspective, it's just faster to load pages. 3) socks5 in Firefox 3.5.2 did better than I expected. 4) I tried testing a click to a second page to see how much polipo caching helps people reading different pages on the same site. It helps, but not as much as I expected. Caveats: Testing under tor is highly variable. I used the same circuits for both the polipo and privoxy tests to minimize variability. However, I can't control node load and congestion. Out of 23 get requests for the Torproject.org/index.html.en, 17 are for the country flags. Perhaps we should load these last at the bottom of the page, or do something else to speed up the torproject page load. As I was doing this, I kept thinking of other ways to do it better; 1) time requests and bits between tor, the http proxy, and the browser. How long does each request take to get from the browser, to the proxy, to tor and back across each layer? how much latency does each piece of software add to the request and delivery? 2) automate testing and let it run on a normal tor client over weeks. This will average out tor network variability and show "typical" user experience. 3) Pick a sampling of the top 100 websites by visits worldwide and measure their performance with the three methods, fully instrumented as in #1. 4) Do user experience measurements. Pay/ask/bribe people to sit in front of a computer, video record their browsing and feedback, and ask for a rating of each configuration (socks5, polipo, privoxy, and a placebo). 5) re-run #2 and run gcov to watch the code paths used in each piece of software, and figure out what can be optimized for performance. 6) test various "private browsing modes" through tor to see which browser is faster; firefox, safari, chromium, torfox, or torora. 7) how can we better tune polipo caching dynamically based on system ram config? Does having 1GB of cache provide significant benefits over the default? I'm sure there are lots of things wrong with my measurements, minimal analysis, and results. Constructive criticism is welcome. -- Andrew Lewman The Tor Project pgp 0x31B0974B Website: https://torproject.org/ Blog: https://blog.torproject.org/ Identi.ca: torproject