[tor-dev] Composing multiple pluggable transports

Ximin Luo Wed, 18 Jun 2014 08:04:31 -0700

Hi Steven, Nikita, I was told that you two are interested in the idea of 
composing multiple PTs together. Here are our ideas on it. We have a GSoC 
student, Quinn, also at Illinois, working on turning this into reality.


## Concepts

On the most abstract level, pt-spec.txt defines an "input interface" to some 
generic component. It consists of the following:

- dest addr, of the Bridge
- headers/metadata, such as fingerprint[1] or other PT-level settings
- data, the actual application-layer stream, such as OR protocol

The concrete form of this is the SOCKS protocol, which allows tor to make a 
request with the above interface. (Actually, SOCKS does not fully support 
metadata, which means we've had to extend it ourselves. HTTP might have worked 
better.)

pt-spec.txt does not specify the "output interface". This means it's impossible 
to chain general PTs, because there's nothing defined to chain. To work around 
this, we observe that in practice, many PTs follow the "output interface" 
below; we'll call these "direct PTs":

- data, sent directly to the (TCP) address given at the input

Direct PTs include obfsproxy, scramblesuit, fteproxy.

Indirect PTs are all other PTs: those that do something other than make a 
straight TCP connection *to the endpoint Bridge address*. These include 
flashproxy and meek.

## Design

Our combiner will chain up a sequence of direct PTs, then the last PT can be 
any PT (either direct or indirect). So for example it could potentially support 
obfs3|fte|fte|fte|flashproxy and obfs3|fte but not flashproxy|obfs3|obfs3|meek. 
Not every chain makes sense from a security viewpoint, of course.
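
The chaining rule above can be sketched as a small check. This is only an illustration: the set of direct transports and the function names are assumptions for the example, not part of the combiner, and the check says nothing about whether a chain is sensible from a security viewpoint.

```python
# Assumption: names treated as "direct" PTs, based on the lists above
# (obfs3 and fte are the names used in the example chains).
DIRECT_PTS = {"obfsproxy", "obfs3", "scramblesuit", "fte", "fteproxy"}


def parse_chain(spec):
    """Split a chain such as 'obfs3|fte|flashproxy' into transport names."""
    names = [name.strip() for name in spec.split("|")]
    if any(not name for name in names):
        raise ValueError("empty transport name in chain: %r" % spec)
    return names


def is_supported_chain(names):
    """Every PT except the last must be direct; the last may be any PT."""
    return all(name in DIRECT_PTS for name in names[:-1])


if __name__ == "__main__":
    for spec in ("obfs3|fte|fte|fte|flashproxy",
                 "obfs3|fte",
                 "flashproxy|obfs3|obfs3|meek"):
        print(spec, is_supported_chain(parse_chain(spec)))
```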

Because the output interface (TCP) does not exactly match the input interface 
(SOCKS), we have a component called a "shim", which has an input interface of 
TCP and an output interface of SOCKS. This is placed between each pair of PTs 
in the chain. In simple terms, it works like this:

  pt0 (out) -TCP-> (in) shim1 (out) -SOCKS-to-next-shim-> pt1 -> to shim2

The extra info present in SOCKS but absent from TCP (dest addr and metadata) 
will be supplied by the combiner, as described in the next section.

Also, in practice, these shims run within the same process as the combiner; 
there is no need to start a new process for them.

## Algorithm

Let the ith PT listen on port pPT[i], set up at program start. Later, when 
tor wants to connect to a PT-chain, we intercept this connection, extracting 
the following information (the SOCKS in-interface):

- dest addr, of the Bridge
- headers: generic headers, plus PT-specific headers for each PT in the chain[2]
- data, OR protocol

Then, the combiner starts a new shim, one for each component in the chain, each 
listening on pS[i].

Each shim[i] is set up so that when it receives a connection on pS[i], it tells 
PT[i] (i.e. the PT client whose SOCKS listener is on pPT[i]) to connect to 
pS[i+1], with the metadata set to the generic headers plus specific headers for 
PT[i], and the data set to whatever it receives from the connection.

(In practice the shims are set up in reverse order, because shim[i] needs to 
know what pS[i+1] is, and we want to take advantage of the OS's ability to 
"listen on any port".)
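
To make "generic headers plus specific headers for PT[i]" concrete, here is a minimal sketch of how the combiner might split the metadata. It assumes the tentative x-chain-<i>-<key> naming floated in footnote [2], which is not a defined format; the function name is hypothetical, and the "onwards" variant is not handled.

```python
def headers_for_position(headers, i):
    """Return the generic headers plus those addressed to chain position i.

    Assumed (not yet defined) key format: 'x-chain-<i>-<key>' marks a header
    for the PT at position i; any key without the prefix is generic.
    """
    prefix = "x-chain-"
    generic = {}
    specific = {}
    for key, value in headers.items():
        if not key.startswith(prefix):
            generic[key] = value
            continue
        index, sep, name = key[len(prefix):].partition("-")
        if sep and name and index.isdecimal() and int(index) == i:
            specific[name] = value
        # Other x-chain- keys belong to another position, or use a form
        # this sketch does not support, so they are not passed to PT[i].
    merged = dict(generic)
    merged.update(specific)  # a PT-specific value overrides a generic one
    return merged


if __name__ == "__main__":
    example = {"secret": "abc", "x-chain-0-password": "p0", "x-chain-1-key": "k1"}
    print(headers_for_position(example, 0))
    print(headers_for_position(example, 1))
```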

Special cases:
- The last (i.e. (n-1)th) shim tells its PT to connect to the original dest 
addr of the Bridge; there is no pS[last+1].
- The first shim does not need to exist: the combiner would only be sending 
data to itself, so it can do this in-process.
After all the shims are set up, the combiner starts forwarding data from tor 
over onto PT[0]. Then the magic is complete.
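
The reverse-order setup can be sketched as follows. This is a rough illustration under stated assumptions, not the combiner's code: plan_shims is a hypothetical name, the shims are reduced to bare listening sockets on 127.0.0.1, and the SOCKS request to each PT and the data forwarding are left out.

```python
import socket


def plan_shims(n, bridge_addr):
    """Bind shim listeners for positions 1..n-1, working backwards.

    Returns (listeners, targets). listeners[i] is the listening socket for
    shim[i] (None for i == 0, which the combiner handles in-process), and
    targets[i] is the address PT[i] will be asked, over SOCKS, to connect to.
    """
    listeners = [None] * n
    targets = [None] * n
    next_hop = bridge_addr  # the last PT is pointed at the real Bridge
    for i in range(n - 1, -1, -1):
        targets[i] = next_hop
        if i == 0:
            break  # no listener needed: the combiner acts as shim[0]
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        sock.listen(5)
        listeners[i] = sock
        next_hop = sock.getsockname()  # this is pS[i], needed by position i-1
    return listeners, targets


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address standing in for the Bridge.
    listeners, targets = plan_shims(3, ("192.0.2.1", 443))
    for i, target in enumerate(targets):
        print("PT[%d] is told to connect to %s:%d" % ((i,) + tuple(target)))
    for sock in listeners:
        if sock is not None:
            sock.close()
```

Working backwards means each pS[i+1] is already known by the time position i is configured, so no port has to be chosen in advance.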

ASCII diagram, minus annotations about metadata:

                         +----------+
        socks            |    PT    |
[ tor ]  to    >-------> | combiner |
        bridge       +-< |          |
                     |   +----------+
          in-process |
                     v      +-------+
                    socks   | direct| tcp
                     to >-> | PT[0] |  to  >-+
                    pS[1]   |       | pS[1]  |
                            +-------+        |
     +-------------------<-------------------+
     |                      +-------+
     |              socks   | direct| tcp
     +->[ shim[1] ]  to >-> | PT[1] |  to  >-+
                    pS[2]   |       | pS[2]  |
                            +-------+        |
     +-------------------<-------------------+
     |
     |
                        [etc]
                                             |
                                             |
     +-------------------<-------------------+
     |                      +-------+
     |              socks   | direct| tcp
     +->[ shim[y] ]  to >-> | PT[y] |  to  >-+
                    pS[z]   |       | pS[z]  |
                            +-------+        |
     +-------------------<-------------------+
     |                      +-------+
     |              socks   | any   |
     +->[ shim[z] ]  to >-> | PT[z] | whatever it wants -->
                    bridge  |       |
                            +-------+

Note that, if for whatever reason your chain has the same PT in multiple 
positions, the chain will re-use the same PT process. Everything should still 
work, because we have separate *shims* for each *position* that the PT appears 
within the chain.

X

[1] the fingerprint is not currently given to PTs by tor, but it should be, 
for reasons argued elsewhere
[2] we haven't exactly defined a format for this, but perhaps something like 
k=v for generic headers to all children (as currently), and x-chain-0-k=v for 
PT-specific headers, and maybe even something like x-chain-0onwards-k=v.

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git
