[tor-commits] [tor-design-2012/master] Add local copy of the "top changes in tor" blogposts so we can work

nickm Fri, 09 Nov 2012 08:04:29 -0800

commit 0aa14b70a3a94ff06c1a0de53e55f2618c00d57b
Author: Nick Mathewson <[email protected]>
Date:   Fri Nov 9 11:03:52 2012 -0500


    Add local copy of the "top changes in tor" blogposts so we can work
    offline when integrating their content.
---
 blog/blogpost-1.txt |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 blog/blogpost-2.txt |   34 ++++++++++++++++++++++++++++++++++
 blog/blogpost-3.txt |   23 +++++++++++++++++++++++
 3 files changed, 106 insertions(+), 0 deletions(-)

diff --git a/blog/blogpost-1.txt b/blog/blogpost-1.txt
new file mode 100644
index 0000000..222c4a8
--- /dev/null
+++ b/blog/blogpost-1.txt
@@ -0,0 +1,49 @@
+The main academic reference for Tor is "Tor: The Second-Generation Onion 
Router" by Dingledine, Mathewson, and Syverson. But that paper was published 
back in 2004, and Tor has evolved since then. So Steven Murdoch and Nick 
Mathewson are currently preparing an updated version of the Tor design paper, 
to include new design changes and research results concerning Tor over the last 
8 years.
+In this series of posts, we (Steven and Nick) will try to summarize the most 
interesting or significant changes to Tor's design since the publication of the 
original paper. We're only going to cover the stuff we think is most 
interesting, and we'll aim to do so in an interesting way.
+We think this will be a three part series. In this first part, we'll cover the 
evolution of Tor's directory system, and some performance improvements in Tor's 
circuit creation and cell scheduling logic.
+1. Node discovery and the directory protocol
+Since the earliest versions of Tor, we knew that node discovery would require 
a better implementation than we had. There are a few key issues that any 
anonymity network's node discovery system needs to solve.
+Every client needs to be selecting nodes from the same probability 
distribution, or an adversary could be able to exploit differences in client 
knowledge. In the worst case, if an adversary can cause one client to know 
about an exit node that no other client uses, the adversary can know that all 
traffic leaving that exit is coming from the targeted client. But even in more 
mild cases, where (say) every client knows every node with P=90%, an adversary 
can use this information to attack users.
+While there has been some work on quantifying these so-called "epistemic 
attacks," we're proceeding with the conservative assumption that clients using 
separate sets of nodes are likely to be partitionable from one another, and so 
the set of nodes used by all clients needs to be uniform.
+The earliest versions of Tor solved this problem with a "Directory" object -- 
each server generated a signed "router descriptor", and uploaded it to one of a 
small set (three) of "directory authorities". Each of these authorities 
generated a signed concatenated list of router descriptors, and served that 
list to clients over HTTP.
+This system had several notable problems:
+Clients needed to download the same descriptors over and over, whether they 
needed them or not.
+Each client would believe whichever directory authority it had spoken to most 
recently: rather than providing distributed trust, each authority was fully 
trusted in its own right, and any one misbehaving authority could completely 
compromise all the clients that talked to it.
+To the extent that directory authorities disagreed, they created partitions in 
client knowledge, which could in the worst case allow an adversary to partition 
clients based on which authority's directory each client had downloaded most 
recently.
+The load on the authorities promised to grow excessive, as every client 
contacted every authority.
+The contents of the directory were sent unencrypted, which made them trivially 
easy to fingerprint on the wire.
+Early changes in the Version 1 Directory System
+Our earliest changes focused on improving scalability rather than improving 
the trust model. We began by having each authority publish two documents, 
rather than one: a directory that contained all the router descriptors, and a 
"network status" document that was a list of which descriptors were up and 
which were down (Tor 0.0.8pre1, in Jul 2004). Clients could download the latter 
more frequently to avoid using down servers, and refresh the former only 
periodically.
+We also added a caching feature in Tor 0.0.8pre1, where nodes could act as 
directory caches. With this feature, once a client had a directory, it no 
longer needed to contact the directory authorities every time it wanted to 
update it, but rather could contact a cache instead.
+The Version 2 Directory System (deprecated)
+In Tor 0.1.1.8-alpha (Oct 2005), we took our first shot at solving the trust 
model. Now, we had each authority sign a more complete network status 
statement, including a list of all the nodes that it believed should be in the 
network, a digest of each node's public key, and a digest of each node's router 
descriptor. Clients would download one of these documents signed by each 
authority, and then compute, based on all the documents they had, which 
collection of nodes represented the consensus view of all the authorities.
+To get router descriptors, clients would then contact one or more caches, and 
ask for the descriptors they believed represented the consensus view of all the 
authorities.
+This approach meant that a single rogue directory authority could no longer 
completely control any client's view of the network. Clients became more 
fragmented, however, since instead of falling into one of N possible groups 
based on which authority they contacted most recently, they fell into one of M^N 
groups where N was the number of authorities and M was the number of recently 
valid opinions from each authority.
+Around this time we also had authorities begin assigning flags to nodes, so 
that in addition to recording "up" or "down" for each node, authorities could 
also declare whether nodes were fast, stable, likely to be good guard nodes, 
and so forth.
+All of the improvements so far were oriented toward saving bandwidth at the 
server side: we figured that clients had plenty of bandwidth, and we wanted to 
avoid overloading the authorities and caches. But if we wanted to add more 
directory authorities (a majority of 5 is still an uncomfortably small number), 
bootstrapping clients would have to fetch one more network status for every new 
authority. By early 2008, each status document listed 2500 relay summaries and 
came in around 175KB compressed, meaning you needed 875KB of status docs when 
starting up, and then another megabyte of descriptors after that. And we 
couldn't add more authorities without making the problem even worse.
+Version 3: Consensus and Directory Voting
+To solve the problems with the v2 directory protocol, Tor 0.2.0.3-alpha (Jul 
2007) introduced a directory voting system, where the authorities themselves 
would exchange vote documents periodically (currently once per hour), compute a 
consensus document based on everyone's votes, and all sign the consensus.
+Now clients only need to download a single signed consensus document 
periodically, and check that it is signed by a sufficiently large fraction of 
the authorities that the client knows about. This gives clients a uniform view 
of the network, makes it harder still for a small group of corrupt authorities 
to attack a client, and limits the number of documents they need to download.
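+The client-side acceptance rule described above can be sketched roughly as follows. This is an illustration only, not Tor's code: the function name is hypothetical, the "more than half" threshold is an assumption standing in for "a sufficiently large fraction", and the cryptographic verification of each signature is left out.

```python
def consensus_is_acceptable(signers, known_authorities, min_fraction=0.5):
    """Return True if enough known authorities signed the consensus.

    `signers` is the set of authority identities whose signatures have
    already been verified (verification itself is not shown here).
    `min_fraction` is an assumed threshold, not a value from the text.
    """
    known = set(known_authorities)
    # Signatures from keys the client does not recognise are ignored.
    valid = known & set(signers)
    return len(valid) > min_fraction * len(known)


authorities = {"auth1", "auth2", "auth3", "auth4", "auth5"}
print(consensus_is_acceptable({"auth1", "auth2", "auth3"}, authorities))
print(consensus_is_acceptable({"auth1", "auth2", "stranger"}, authorities))
```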
+The voting algorithm is ad hoc, and is by no means the state of the art in 
byzantine fault tolerance. Our approach to failures to reach a consensus (which 
have been mercifully infrequent) is to find what's wrong and fix it manually.
+Saving bytes with microdescriptors
+Now that the consensus algorithm finally matched what we had in mind when we 
wrote the first Tor paper, it was time to address the protocol's verbosity.
+Out of all the data in a typical 1500-byte server descriptor, a client really 
only needs to know what ports it supports exiting to, its current RSA1024 onion 
key, and other information that is fully redundant with its listing in the 
consensus network status document.
+One crucial observation is that signatures on the router descriptors 
themselves don't actually help clients: if there were enough hostile 
authorities to successfully convince the clients to use a descriptor that a 
router didn't actually sign, they could as easily convince the clients to use a 
descriptor signed with a phony identity key.
+This observation let us move (in Tor 0.2.3.1-alpha, May 2011) to a design 
where the authorities, as part of their voting process, create an abbreviated 
version of each descriptor they recommend. Currently, these contain only a 
short summary of the router's exit policy, and the router's current onion key. 
Clients now download these abbreviated "microdescriptors", which cuts the 
information downloaded for each node by about 75%. Further, because the data here 
change relatively infrequently, it cuts down the frequency with which clients 
fetch new information about each router at all.
+Tunneling directory connections over Tor
+In 0.1.2.5-alpha (Jan 2007), we added support by default for clients to 
download all directory documents over HTTP over Tor, rather than by contacting 
directories and caches over unencrypted HTTP. This change helps clients resist 
fingerprinting.
+Because clients aren't using Tor for anonymity on directory connections, they 
build single-hop circuits. We use HTTP over a one hop Tor circuit, rather than 
plain old HTTPS, so that clients can use the same TLS connection both for a 
directory fetch and for other Tor traffic.
+2. Security improvements for hidden services
+Decentralized hidden-service directory system
+A partly-centralized directory infrastructure makes sense for Tor nodes, since 
every client is supposed to be able to know about every node, but it doesn't 
make a great deal of sense for hidden services.
+To become more censorship-resistant, we started (in Tor 0.2.0.10-alpha, Nov 
2007) to instead use the Tor network itself to cache and serve hidden service 
descriptors. Now, instead of publishing their hidden service descriptors 
anonymously to a small fixed set of hidden service authorities, hidden services 
publish to a set of nodes whose identity keys are closest to a hash of the 
service's identity, the current date, and a replica number.
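+To make the placement rule concrete, here is a minimal sketch under stated assumptions. The hash function (SHA-1), the input encoding, the reading of "closest" as "next on a ring of identity digests", and the count of three nodes are all assumptions made for illustration; the function names are hypothetical and the real computation is defined in Tor's rendezvous specification.

```python
import hashlib


def descriptor_id(service_id: bytes, date: str, replica: int) -> bytes:
    """Hash the service identity, the current date and a replica number."""
    h = hashlib.sha1()
    h.update(service_id)
    h.update(date.encode("ascii"))
    h.update(bytes([replica]))
    return h.digest()


def responsible_nodes(desc_id: bytes, node_ids, count=3):
    """Pick `count` nodes whose identity digests follow desc_id on the ring."""
    ring = sorted(node_ids)
    after = [n for n in ring if n > desc_id]
    before = [n for n in ring if n <= desc_id]
    # Walk forward from desc_id, wrapping around to the start of the ring.
    return (after + before)[:count]


nodes = [bytes([i]) * 20 for i in (10, 50, 200, 250)]
for replica in (0, 1):
    did = descriptor_id(b"example-service", "2012-11-09", replica)
    print(replica, [n[:1].hex() for n in responsible_nodes(did, nodes)])
```

+Because the date is part of the hash input, the responsible nodes change over time, and the replica number spreads each descriptor over several independent positions.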
+Improved authorization model for hidden services
+We also added improved support for authentication to hidden services. 
Optionally, to use a hidden service, a client must know a shared key, and use 
this key to decrypt the part of a hidden service descriptor containing the 
introduction points. It later must use information in that encrypted part to 
authenticate to any introduction point it uses, and later to the hidden service 
itself. One of the main uses of authentication here is to hide presence -- only 
authenticated users can learn whether the hidden service is online.
+3. Faster first-hop circuit establishment with CREATE_FAST
+At each stage of extending a circuit to the next hop, the client carries out a 
Diffie-Hellman (DH) key agreement protocol with that next hop. This step 
provides confidentiality (and forward secrecy) of the relay cell payload as it 
is passed on by intermediate hops. Originally Tor also carried out DH with the 
first hop, even though there was already a DH exchange as part of the TLS 
handshake. DH is quite computationally expensive for both ends, so Tor 
0.1.1.10-alpha (Dec 2005) onwards skipped the DH exchange on the first hop by 
sending a CREATE_FAST (as opposed to a standard CREATE) cell, which generates 
key material simply by hashing random numbers sent by the client and server.
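+A minimal sketch of deriving key material by hashing two random values, in the spirit of CREATE_FAST. The use of SHA-1 with a one-byte counter, the 20-byte random values, and the function name are assumptions for illustration; how the output is split into individual keys is not shown.

```python
import hashlib
import os


def kdf_fast(client_random: bytes, server_random: bytes, n_bytes: int) -> bytes:
    """Stretch two random strings into n_bytes of key material.

    Illustrative construction: hash (X | Y | counter) repeatedly and
    concatenate the digests until enough bytes are available.
    """
    seed = client_random + server_random
    out = b""
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha1(seed + bytes([counter])).digest()
        counter += 1
    return out[:n_bytes]


x = os.urandom(20)  # value sent by the client
y = os.urandom(20)  # value sent back by the server
print(len(kdf_fast(x, y, 72)))
```

+Note that both random values cross the link in the clear inside TLS, so the secrecy of the result depends entirely on the TLS connection to the first hop, which is the trade-off the paragraph above describes.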
+4. Cell queueing and scheduling
+The original Tor design left the fine-grained handling of incoming cells 
unspecified: every circuit's cells were to be decrypted and delivered in order, 
but nodes were free to choose which circuits to handle in any order they 
pleased.
+Early versions of Tor also punted on the question: they handled cells in the 
order they were received on incoming OR connections, encrypting/decrypting them 
and handing them off immediately to the next node on the circuit, or to the 
appropriate exit or entry connection. This approach, however, frequently 
created huge output buffers where quiet circuits couldn't get a cell in 
edgewise.
+Instead, Tor currently places incoming cells on a per-circuit queue associated 
with each circuit. Rather than filling all output buffers to capacity, Tor 
instead fills them up with cells on a near just-in-time basis.
+When we first implemented these cell queues in 0.2.0.1-alpha (Jun 2007), we 
chose which cells to deliver by rotating the circuits in a round-robin 
approach. In Tor 0.2.2.7-alpha (Jan 2010), we began to instead favor the 
circuits on each connection that had been quiet recently, so that a circuit 
with small, infrequent amounts of cells will get better latency than a circuit 
being used for a bulk transfer. (Specifically, when we are about to put a cell 
on an outgoing connection, we choose the circuit which has sent the lowest 
total exponentially-decaying number of cells so far. Currently, each cell has a 
30-second half-life.)
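+The selection rule in the parenthetical above can be sketched as follows. Only the 30-second half-life comes from the text; the class and function names are hypothetical, and a real scheduler would consider only circuits that actually have cells queued.

```python
HALF_LIFE = 30.0  # seconds, as stated in the text


class CircuitActivity:
    """Tracks an exponentially decaying count of cells sent on one circuit."""

    def __init__(self, name):
        self.name = name
        self.count = 0.0
        self.last_update = 0.0

    def decayed(self, now):
        # Each recorded cell loses half its weight every HALF_LIFE seconds.
        elapsed = now - self.last_update
        return self.count * 0.5 ** (elapsed / HALF_LIFE)

    def record_cell(self, now):
        self.count = self.decayed(now) + 1.0
        self.last_update = now


def pick_circuit(circuits, now):
    """Choose the circuit with the lowest decayed cell count."""
    return min(circuits, key=lambda c: c.decayed(now))


bulk = CircuitActivity("bulk")
quiet = CircuitActivity("quiet")
for t in range(10):
    bulk.record_cell(float(t))  # one cell per second
quiet.record_cell(0.0)          # a single cell, ten seconds ago
print(pick_circuit([bulk, quiet], now=10.0).name)
```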
+In Part 2 we will look at changes to how Tor selects paths and the new 
anti-censorship measures.
diff --git a/blog/blogpost-2.txt b/blog/blogpost-2.txt
new file mode 100644
index 0000000..eb9c416
--- /dev/null
+++ b/blog/blogpost-2.txt
@@ -0,0 +1,34 @@
+This is part 2 of Nick Mathewson and Steven Murdoch's series on what has 
changed in Tor's design since the original design paper in 2004. Part one is 
back over here.
+In this installment, we cover changes in how we pick and use nodes in our 
circuits, and general anticensorship features.
+5. Guard nodes
+We assume, based on a fairly large body of research, that if an attacker 
controls or monitors the first hop and last hop of a circuit, then the attacker 
can de-anonymize the user by correlating timing and volume information. Many of 
the security improvements to path selection discussed in this post concentrate 
on reducing the probability that an attacker can be in this position, but no 
reasonably efficient proposal can eliminate the possibility.
+Therefore, each time a user creates a circuit, there is a small chance that 
the circuit will be compromised. However, most users create a large number of 
Tor circuits, so with the original path selection algorithm, these small 
chances would build up into a potentially large chance that at least one of 
their circuits will be compromised.
+To help improve this situation, in Tor 0.1.1.2-alpha, the guard node feature 
was implemented (initially called "helper nodes", invented by Wright, Adler, 
Levine, and Shields and proposed for use in Tor by Øverlier and Syverson). In 
Tor 0.1.1.11-alpha it was enabled by default. Now, the Tor client picks a few 
Tor nodes as its "guards", and uses one of them as the first hop for all 
circuits (as long as those nodes remain operational).
+This doesn't affect the probability that the first circuit is compromised, but 
it does mean that if the guard nodes chosen by a user are not 
attacker-controlled, all their future circuits will be safe. On the other hand, 
users who choose attacker-controlled guards will have about M/N of their 
circuits compromised, where M is the amount of attacker-controlled network 
resource and N is the total network resource. Without guard nodes every circuit 
has an (M/N)^2 probability of being compromised.
+Essentially, the guard node approach recognises that some circuits are going 
to be compromised, but it's better to increase your probability of having no 
compromised circuits at the expense of also increasing the proportion of your 
circuits that will be compromised if any of them are. This is because 
compromising a fraction of a user's circuits -- sometimes even just one -- can be 
enough to compromise a user's anonymity. For users who have good guard nodes, 
the situation is much better, and for users with bad guard nodes the situation 
is not much worse than before.
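+A small calculation makes the trade-off visible. It rests on simplifying assumptions that are not in the text: circuits are treated as independent, the client has a single guard, and a user with a bad guard is assumed to hit a bad exit eventually. The function names are hypothetical.

```python
def p_any_compromised_without_guards(attacker_share, circuits):
    """Chance that at least one of `circuits` has both ends compromised."""
    per_circuit = attacker_share ** 2  # (M/N)^2 for each circuit
    return 1 - (1 - per_circuit) ** circuits


def p_bad_guard(attacker_share):
    """Chance that a single randomly chosen guard is attacker-controlled."""
    return attacker_share  # M/N


share = 0.10  # assumed attacker share of network resources
for n in (1, 100, 1000):
    print(n, p_any_compromised_without_guards(share, n), p_bad_guard(share))
```

+Without guards the chance of at least one compromised circuit climbs toward certainty as the number of circuits grows, whereas with guards it stays bounded by the chance of having picked a bad guard in the first place.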
+6. Bridges, censorship resistance, and pluggable transports
+While Tor was originally designed as an anonymous communication system, more 
and more users need to circumvent censorship rather than to just preserve their 
privacy. The two goals are closely linked -- to prevent a censor from blocking 
access to certain websites, it is necessary to hide where a user is connecting 
to. Also, many censored Internet users live in repressive regimes which might 
punish people who access banned websites, so here anonymity is also of critical 
importance.
+However, anonymity is not enough. Censors can't block access to certain 
websites browsed over Tor, but it was easy for censors to block access to the 
whole of the Tor network in the original design. This is because there were a 
handful of directory authorities which users needed to connect to before they 
could discover the addresses of Tor nodes, and indeed some censors blocked the 
directory authorities. Even if users could discover the current list of Tor 
nodes, censors also blocked the IP addresses of all Tor nodes.
+Therefore, the Tor censorship resistance design introduced bridges -- special 
Tor nodes which were not published in the directory, and could be used as entry 
points to the network (both for downloading the directory and also for building 
circuits). Users need to find out about these somehow, so the bridge authority 
collects the IP addresses and gives them out via email, on the web, and via 
personal contacts, so as to make it difficult for the censor to enumerate them 
all.
+That's not enough to provide censorship resistance though. Preventing the 
censor from knowing all the IP addresses they need to block to block access to 
the Tor network will be enough to defeat some censors. But others have the 
capability to block not only by IP address but also by content (deep packet 
inspection). Some censors have tried to do this already and Tor has, in 
response, gradually changed its TLS handshake to better imitate web browsers.
+Impersonating web browsers is difficult, and even if Tor perfectly 
impersonated one, some censors could just block encrypted web browsing (like 
Iran did, for some time). So it would be better if Tor could impersonate 
multiple protocols. Even better would be if other people could contribute to 
this goal, rather than the Tor developers being a bottleneck. This is the 
motivation of the pluggable transports design which allows Tor to manage an 
external program which transforms Tor traffic into some hard-to-fingerprint 
obfuscation.
+7. Changes and complexities in our path selection algorithms
+The original Tor paper never specified how clients should pick which nodes to 
use when constructing a circuit through the network. This question has proven 
unexpectedly complex.
+Weighting node selection by bandwidth
+The simplest possible approach to path construction, which we used in the 
earliest versions of Tor, is simply to pick uniformly at random from all 
advertised nodes that could be used for a given position in the path. But this 
approach creates terrible bandwidth bottlenecks: a server that would allow 10x 
as many bytes per second as another would still get the same number of circuits 
constructed through it.
+Therefore, Tor 0.0.8rc1 started to have clients weight their choice of nodes 
by servers' advertised bandwidths, so that a server with 10x as much bandwidth 
would get 10x as many circuits, and therefore (probabilistically) 10x as much 
of the traffic.
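+Bandwidth-weighted selection can be sketched in a few lines. The node names and bandwidth figures are made up, and the function name is hypothetical; the sketch shows only the basic weighting, not Tor's position-dependent rules.

```python
import random


def pick_node(bandwidths, rng=random):
    """Pick one node with probability proportional to its bandwidth."""
    names = list(bandwidths)
    weights = [bandwidths[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


advertised = {"fast": 10, "slow": 1}  # made-up relative bandwidths
rng = random.Random(0)
picks = [pick_node(advertised, rng) for _ in range(10000)]
print(picks.count("fast"), picks.count("slow"))
```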
+(In the original paper, we imagined that we might take Morphmix's approach, 
and divide nodes into "bandwidth classes", such that clients would choose only 
from among nodes having at least the same approximate bandwidth as the clients. 
This may be a good design for peer-to-peer anonymity networks, but it doesn't 
seem to work for the Tor network: the most useful high-capacity nodes have more 
capacity than nearly any typical client.)
+Later, it turned out that weighting by bandwidth alone was also suboptimal, because of 
nonuniformity in path selection rules. Consider that if node A is suitable for 
use at any point in a circuit, but node B is suitable only as the middle node, 
then node A will be considered for use three times as often as B. If the two 
nodes have equal bandwidth, node A will be chosen three times as often, leading 
to it being overloaded in comparison with B. So eventually, in Tor 
0.2.2.10-alpha, we moved to a more sophisticated approach, where nodes are 
chosen proportionally to their bandwidth, as weighted by an algorithm to 
optimize load-balancing between nodes of different capabilities.
+Bandwidth authorities
+Of course, once you choose nodes with unequal probability, you open the 
possibility of an attacker trying to see a disproportionate number of circuits 
-- not by running an extra-high number of nodes -- but by claiming to have a 
very large bandwidth.
+For a while, we tried to limit the impact of this attack by limiting the 
maximum bandwidth that a client would believe, so that a single rogue node 
couldn't just claim to have infinite bandwidth.
+In 0.2.1.17-rc, clients switched from using bandwidth values advertised by 
nodes themselves to using values published in the network status consensus 
document. A subset of the authorities measure and vote on nodes' observed 
bandwidth, to prevent misbehaving nodes from claiming (intentionally or 
accidentally) to have too much capacity.
+Avoiding duplicate families in a single circuit
+As mentioned above, if the first and last node in a circuit are controlled by 
an adversary, they can use traffic correlation attacks to notice that the 
traffic entering the network at the first hop matches traffic leaving the 
circuit at the last hop, and thereby trace a client's activity with high 
probability. Research on preventing this attack has not yet come up with any 
affordable, effective defense suitable for use in a low-latency anonymity 
network. Therefore, the most promising mitigation strategies seem to involve 
lowering the attacker's chances of controlling both ends of a circuit.
+To this end, clients do not use any two nodes in a circuit whose IP addresses 
are in the same /16 -- when we designed the network, it was marginally more 
difficult to acquire a large number of disparate addresses than it was to get a 
large number of concentrated addresses. (Roger and Nick may have been 
influenced by their undergraduacy at MIT, where their dormitory occupied the 
entirety of 18.244.0.0/16.) This approach is imperfect, but possibly better 
than nothing.
+To allow honest node operators to run more than one server without 
inadvertently giving themselves the chance to see more traffic than they 
should, we also allow nodes to declare themselves to be members of the same 
"family", such that a client won't use two nodes in the same family in the same 
circuit. (Clients only believe mutual family declarations, so that an adversary 
can't capture routes by having his nodes claim unilaterally to be in a family 
with every node the adversary doesn't control.)
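+The two exclusion rules above can be sketched as follows. This is an illustration with hypothetical function names and data structures; it handles IPv4 addresses only and is not taken from Tor's source.

```python
import ipaddress


def same_slash16(ip_a, ip_b):
    """True if both IPv4 addresses fall in the same /16 network."""
    net_a = ipaddress.ip_network(f"{ip_a}/16", strict=False)
    return ipaddress.ip_address(ip_b) in net_a


def in_same_family(a, b, declared):
    """Only mutual declarations count: each node must list the other."""
    return b in declared.get(a, set()) and a in declared.get(b, set())


def may_share_circuit(a, b, addresses, declared):
    """True if nodes a and b may appear together in one circuit."""
    if same_slash16(addresses[a], addresses[b]):
        return False
    return not in_same_family(a, b, declared)


addresses = {"n1": "18.244.0.1", "n2": "18.244.200.9", "n3": "203.0.113.5"}
declared = {"n1": {"n3"}, "n3": set()}  # n1 claims n3, but n3 does not agree
print(may_share_circuit("n1", "n2", addresses, declared))
print(may_share_circuit("n1", "n3", addresses, declared))
```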
+8. Stream isolation
+Building a circuit is fairly expensive (in terms of computation and bandwidth) 
for the network, and the setup takes time, so the Tor client tries to re-use 
existing circuits if possible, by sending multiple TCP streams down them. 
Streams which share a circuit are linkable, because the exit node can tell that 
they have the same circuit ID. If the user sends some information on one stream 
which gives their identity away, the other streams on the same circuit will be 
de-anonymized.
+To reduce the risk of this occurring, Tor will not re-use a circuit which the 
client first used more than 10 minutes ago. Users can also use their Tor 
controller to send the "NEWNYM" signal, preventing any old circuits being used 
for new streams. As long as users don't mix anonymous and non-anonymous tasks at 
the same time, this form of circuit re-use is probably a good tradeoff.
+However, Manils et al. discovered that some Tor users simultaneously ran 
BitTorrent over the same Tor client as they did web browsing. Running 
BitTorrent over Tor is a bad idea because the network can't handle the load, 
and because BitTorrent packets include the user's real IP address in the 
payload, so it isn't anonymous. But running BitTorrent while doing anonymous 
web browsing is an especially bad idea. An exit node can find the user's IP 
address in the BitTorrent payload and then trivially de-anonymize all streams 
sharing the circuit.
+Running BitTorrent over Tor is still strongly discouraged, but this paper did 
illustrate some potential problems with circuit reuse so proposal 171 was 
written, and implemented in Tor 0.2.3.3-alpha, to help isolate streams which 
shouldn't share the same circuit. By default streams which were initiated by 
different clients, which came from SOCKS connections with different 
authentication credentials, or which came to a different SOCKS port on the Tor 
client, are separated. In this way, a user can isolate applications by either 
setting up multiple SOCKS ports on Tor and using one per application, or by 
setting up a single SOCKS port but using different username/passwords for each 
application. Tor can also be configured to isolate streams based on destination 
address and/or port.
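+One way to picture these rules is as an "isolation key" computed per stream: two streams may share a circuit only if their keys match. The sketch below is a simplified model with hypothetical names, not the data structure Tor uses.

```python
from collections import namedtuple

Stream = namedtuple(
    "Stream", "client_addr socks_port socks_auth dest_addr dest_port"
)


def isolation_key(stream, by_dest_addr=False, by_dest_port=False):
    """Build the tuple of properties that must match to share a circuit."""
    return (
        stream.client_addr,  # default: isolate by originating client
        stream.socks_port,   # default: isolate by SOCKS port used
        stream.socks_auth,   # default: isolate by SOCKS credentials
        stream.dest_addr if by_dest_addr else None,  # optional
        stream.dest_port if by_dest_port else None,  # optional
    )


def may_share(a, b, **options):
    """True if streams a and b are allowed on the same circuit."""
    return isolation_key(a, **options) == isolation_key(b, **options)


browser = Stream("127.0.0.1", 9050, ("browser", "x"), "example.com", 443)
torrent = Stream("127.0.0.1", 9050, ("torrent", "y"), "tracker.example", 6881)
print(may_share(browser, torrent))
```

+Here the two applications use the same SOCKS port but different credentials, so their streams never share a circuit.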
diff --git a/blog/blogpost-3.txt b/blog/blogpost-3.txt
new file mode 100644
index 0000000..b76a840
--- /dev/null
+++ b/blog/blogpost-3.txt
@@ -0,0 +1,23 @@
+In this third and final installment of Nick Mathewson and Steven Murdoch's 
blog series (previously part 1 and part 2) we discuss how Tor has made its 
traffic harder to fingerprint, as well as usability and security improvements 
to how users interact with Tor.
+9. Link protocol TLS, renegotiation
+Tor's original (version 1) TLS handshake was fairly straightforward. The 
client said that it supported a sensible set of cryptographic algorithms and 
parameters (ciphersuites, in TLS terminology) and the server selected one. If 
one side wanted to prove to the other that it was a Tor node, it would send a 
two-element certificate chain signed by the key published in the Tor directory.
+This approach met all the security properties envisaged at the time the 2004 
design paper was written, but Tor's increasing use in censorship resistance 
changed the requirements -- Tor's protocol signature also had to look like 
that of HTTPS web traffic, to prevent censors using deep-packet-inspection to 
detect and block Tor.
+It turned out that Tor's original design looked very different from HTTPS. 
Firstly, web browsers offer a wide range of ciphersuites which Tor cannot use, 
such as those using RC4 (due to the narrow security margins) and RSA key 
exchange (due to lack of forward secrecy). Secondly, in HTTPS web traffic, the 
client seldom offers a certificate, and the server usually offers a one-element 
certificate chain, whereas in Tor node-to-node communication both sides offer a 
two-element certificate chain.
+Therefore proposal 124, later superseded by proposal 130, tried to resolve the 
situation and the resulting version 2 connection protocol was implemented in 
Tor 0.2.0.20-rc. Here, the client presents a large selection of ciphersuites 
(including some it doesn't actually support), selected to appear similar to 
that of a web browser. The server then chooses one which is suitable for use in 
Tor, but if the server chooses one which is not adequately secure, the client 
will pull down the connection.
+To make the certificate part of the handshake look closer to HTTPS, the client 
sends no certificate, and the server sends a one-element dummy certificate 
chain. The certificate offered by the server is designed to not contain 
distinctive strings which could be used for blocking (version 1 certificates 
used "Tor" or "TOR" as the organization name). Once the handshake is complete, 
Tor then restarts the handshake (via TLS renegotiation), but now encrypted 
under the keys established in the first handshake, and sends the two-element 
certificate chains as before.
+This improves the situation for anti-blocking considerably, although more 
could still be done. In particular, the fact that renegotiation is occurring is 
not hidden from an observer because the type of TLS messages (known as records) 
is not encrypted in TLS, and renegotiation records are of a different type from 
data records. Therefore version 3 of the connection protocol, described in 
proposal 176 and implemented in Tor 0.2.3.6-alpha, moves the second stage of 
the handshake into data records, binding the inner to the outer handshake 
through sharing some key material.
+10. Rise and fall of .exit
+In Tor 0.0.9rc5, Tor had the .exit feature added. Here, if the user requested 
domain.nickname.exit then Tor would make a connection to domain using the Tor 
node called nickname as the last hop (if possible). This was a convenient 
feature for exploring how the Internet looked from different locations, but it 
also raised some security concerns.
+In particular, a malicious website could embed an image with a .exit hostname, 
forcing the Tor client to select an attacker-controlled exit node. Then, if the 
user also chooses an attacker-controlled entry node, the circuit could be 
de-anonymized. This strategy increases the probability of a successful attack 
from about (M/N)^2 to M/N (where M is the amount of attacker-controlled network 
resource and N is the total network resource).
+Therefore, in Tor 0.2.2.1-alpha, .exit notation was disabled by default. In 
Tor 0.2.3.17-beta an exception was made, allowing .exit notation when it is 
specified in the configuration file or by a controller. These sources are 
assumed to be safe, and by combining the .exit notation with the MapAddress 
option it is possible for the client to always contact some domain names via a 
particular exit node. This is useful when a service is running on the same 
machine as a Tor node, as then the user can choose for circuits to never leave 
the Tor network.
+11. Controller protocol
+Tor has always had a minimalist user interface -- it can be configured on the 
command line or a configuration file and sends output to a log file. This is 
fine for advanced users, but most users will prefer a GUI. Building a GUI into 
Tor would be difficult, and would force certain choices (e.g. GUI toolkit) to 
be made which might not suit all users and all platforms. Therefore the 
approach taken by Tor in 0.0.9pre5 is to build an interface for other programs 
-- the control protocol -- to communicate with the Tor daemon, extracting 
information to display on the GUI and changing the Tor configuration based on 
user actions.
+The control protocol has also proven useful to researchers experimenting with 
Tor. Initially the functionality exposed in the control protocol was simply 
that exposed by the configuration file and log files. Providing status 
information in a specified and machine-readable format made the task of 
monitoring and controlling Tor easier. Later, functionality was added to the 
control protocol which should not be exposed to ordinary Tor users but is 
useful to researchers, such as allowing controllers to arbitrarily control the 
path selection process (added in 0.1.0.1-rc).
+In 0.1.1.1-alpha the protocol was changed to version 1, which used ASCII 
rather than binary commands to make it easier to write and debug controllers as 
well as allow advanced users to telnet into the control port and manually type 
commands.
+12. Torbutton
+The 2004 design paper stated that Tor explicitly did not make any attempt to 
scrub application data which might contain identifying information. By adopting 
the near universal SOCKS protocol, almost any application could send its 
traffic over Tor, but there was no guarantee it would be safe to do so. This is 
in contrast to the predecessors to Tor from the Onion Routing project which 
required an "application proxy" to be written for each protocol carried by Tor. 
These proxies greatly increased the cost for supporting each additional 
application.
+Still, there was clear need for a place to perform the protocol scrubbing, and 
so Tor recommended that Privoxy take the place of an application proxy for 
HTTP. However, the disadvantages of this approach gradually became clear: in 
particular, Privoxy could not inspect or modify HTTPS traffic, and so malicious 
websites could send their tracking code over HTTPS and avoid scrubbing.
+Therefore, more and more of the scrubbing was performed by a Firefox add-on, 
Torbutton, which also could turn Tor on and off -- hence the name. Torbutton 
had full access to content regardless of whether it was HTTP or HTTPS and could 
also disable features of Firefox which were bad for privacy. A proxy was still 
needed though, because Firefox's SOCKS support handled high-latency connections 
badly, so the lighter-weight Polipo was adopted instead.
+13. Tor Browser Bundle
+Now to use Tor, most users would need to download and install Tor, Firefox, 
Torbutton and Polipo, probably along with a GUI controller such as Vidalia. 
This was inconvenient, especially for customers of Internet cafes who could not 
install software on the computer they were using. So the Tor Browser Bundle was 
created which included all this software, pre-configured to be run from a USB 
drive.
+This was far easier to use than the previous way to install Tor, and 
eventually became the default. It had the added advantage that we could modify 
the browser to include patches which made Polipo unnecessary and to fix some 
privacy problems which could not be solved from within a Firefox add-on. It was 
also safer for users because now Torbutton could not be disabled, meaning that 
users had different web browsers for anonymous and non-anonymous browsing 
and were less likely to muddle up the two.

_______________________________________________
tor-commits mailing list
[email protected]
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-commits
