David Goulet <dgou...@torproject.org> writes:

> On 16 May (14:20:05), George Kadianakis wrote:
>
> Hello!
>
>> 4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
>>
>> In this section we give an overview of how circuit construction looks like
>> to a network or guard-level adversary. We use this knowledge to make the
>> right padding machines that can make intro and rend circuits look like
>> these general circuits.
>>
>> In particular, most general Tor circuits used to surf the web or download
>> directory information, start with the following 6-cell relay cell
>> sequence (cells surrounded in [brackets] are outgoing, the others are
>> incoming):
>>
>> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
>>
>> When this is done, the client has established a 3-hop circuit and also
>> opened a stream to the other end. Usually after this comes a series of
>> DATA cell that either fetches pages, establishes an SSL connection or
>> fetches directory information:
>>
>> [DATA] -> [DATA] -> DATA -> DATA
>>
>> The above stream of 10 relay cells defines the grand majority of general
>> circuits that come out of Tor browser during our testing, and it's what we
>> are gonna use to make introduction and rednezvous circuits blend in.
>
> Considering "either fetches pages,..." is in the description, I'm confused how
> only 2 data cells is the grand majority?
>
> A simple "wget torproject.org" gives me an index.html of 16KB meaning at least
> 32 DATA cells. Even a directory fetch can't only be 2 data cells... ?
>

Perhaps I should have made it clearer, but the pattern:

    [DATA] -> [DATA] -> DATA -> DATA -> ...

comes from the SSL handshake that happens in most general circuits. In
particular, the first two [DATA] cells are the ClientHello etc. SSL records
sent by the client, and the subsequent DATA cells are the ServerHello etc.
records sent by the server. So the four DATA cells describe only the start
of the stream, not the whole transfer.

>> 5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
>>
>> These two machines are meant to hide client-side introduction circuits.
>> The origin-side machine sits on the client and sends padding towards the
>> introduction circuit, whereas the relay-side machine sits on the
>> middle-hop (second hop of the circuit) and sends padding towards the
>> client. The padding from the origin-side machine terminates at the
>> middle-hop and does not get forwarded to the actual introduction point.
>>
>> Both of these machines only get activated for introduction circuits, and
>> only after an INTRODUCE1 cell has been sent out.
>>
>> This means that before the machine gets activated our cell flow looks
>> like this:
>>
>> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] ->
>> EXTENDED2 -> [INTRODUCE1]
>>
>> Comparing the above with section [CIRCCONSTRUCTION], we see that the above
>> cell sequence matches the one from general circuits up to the first 7
>> cells.
>>
>> However, in normal introduction circuits this is followed by an
>> INTRODUCE_ACK and then the circuit gets teared down, which does not match
>> the sequence from [CIRCCONSTRUCTION].
>>
>> Hence when our machine is used, after sending an [INTRODUCE1] cell, we
>> also send a [PADDING_NEGOTIATE] cell, which gets answered by a
>> PADDING_NEGOTIATED cell and an INTRODUCE_ACKED cell. This makes us match
>> the [CIRCCONSTRUCTION] sequence up to the first 10 cells.
>>
>> After that, we continue sending padding from the relay-side machine so as
>> to fake a directory download, or an SSL connection setup.
>> We also want to continue sending padding so that the connection stays up
>> longer to destroy the "Duration of Activity" fingerprint.
>
> I've looked at the implementation quickly and these DROP cells aren't
> accounted for in our circuit flow control which means that there will be a
> difference between a "real" DATA circuit and a circuit being sent PADDING in
> order to look like the former. And that will be the flow control cell(s)
> (SENDME) coming back from the end point that is receiving the data.
>
> In other words, one circuit (the padded one) will have only a long stream of
> cells going in one direction and the second circuit (with legit data) will
> have that long stream but now and then a cell coming back down the circuit.
>
> I believe this is quite the distinguisher between any circuit seeing much
> padding and one that doesn't? :S
>

I think you are right, but I don't think these padded intro circuits will
stay open long enough to need a SENDME cell from the client to the relay.
In particular, the client will receive only about 15 cells before the intro
circuit gets torn down.

>>
>> To calculate the padding overhead, we see that the origin-side machine
>> just sends a single [PADDING_NEGOATIATE] cell, wheras the origin-side
>> machine
>
> Typo here "PADDING_NEGOATIATE".
>

Yep. Will fix soon.

>> sends a PADDING_NEGOTIATED cell and between 7 to 10 DROP cells. This means
>> that the average overhead of this machine is 11 padding cells.
>>
>> In terms of WTF-PAD terminology, these machines have three states (START,
>> OBF, END). They move from the START to OBF state when the first
>> non-padding cell is received on the circuit, and they stay in the OBF
>> state until all the padding gets depleted. The OBF state is controlled by
>> a histogram which specifies the parameters described in the paragraphs
>> above. After all the padding finishes, it moves to END state.
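
To make the state transitions and the overhead figure above a bit more
concrete, here is a rough Python sketch. This is not the Tor implementation:
the PaddingMachine class, its method names, and the uniform choice of 7 to 10
DROP cells are all assumptions for illustration (the real machine draws from a
histogram whose shape is not spelled out here).

```python
import random
from enum import Enum


class State(Enum):
    START = "START"
    OBF = "OBF"
    END = "END"


class PaddingMachine:
    """Hypothetical relay-side machine: START -> OBF -> END."""

    def __init__(self, rng=None, min_drops=7, max_drops=10):
        self.rng = rng if rng is not None else random.Random()
        self.state = State.START
        self.min_drops = min_drops
        self.max_drops = max_drops
        self.remaining = 0  # DROP cells still to be sent in OBF
        self.sent = 0       # DROP cells sent so far

    def on_nonpadding_received(self):
        # The first non-padding cell moves START -> OBF and fixes the budget.
        # Assumption: budget is uniform in [min_drops, max_drops].
        if self.state is State.START:
            self.remaining = self.rng.randint(self.min_drops, self.max_drops)
            self.state = State.OBF

    def send_padding(self):
        # Send one DROP cell while in OBF; go to END once padding is depleted.
        if self.state is not State.OBF:
            return False
        self.remaining -= 1
        self.sent += 1
        if self.remaining == 0:
            self.state = State.END
        return True


def expected_overhead(min_drops=7, max_drops=10):
    # One [PADDING_NEGOTIATE] + one PADDING_NEGOTIATED + mean DROP count,
    # assuming the DROP count is uniform over the given range.
    mean_drops = (min_drops + max_drops) / 2
    return 1 + 1 + mean_drops
```

Under the uniform assumption, expected_overhead() comes out at 10.5 cells,
which is in line with the "11 padding cells" average quoted above.
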
>>
>> We also set a special WTF-PAD flag which keeps the circuit open even after
>> the introduction is performed. In particular, with this feature the
>> circuit will stay alive for the same durations as normal web circuits
>> before they expire (usually 10 minutes).
>
> I would make sure that the implentation here flags the circuit "Unusable"
> after an introduction since if a client just repicks it to introduce again
> (let say a second SOCKS connection with a different user/pass), then the intro
> point will immediately tear it down rendering this "keep open" feature a bit
> pointless :(.
>

I think this is already the case, because we repurpose these "keep-alive"
circuits to a separate circuit purpose (CIRCUIT_PURPOSE_C_PADDING), and
hence the client should not re-use them as intro circuits. I should check
again, though.

Thanks for the feedback! :)

Will send a fresh version of the proposal back to the ML soon!
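
PS: The "matches up to the first 7 cells" and "up to the first 10 cells"
claims from [INTRO_CIRC_HIDING] can be checked mechanically. The sketch below
is only an illustration and assumes the adversary sees nothing but the
direction of each cell (the [brackets] = outgoing convention from the
proposal). The helper names are made up for this example.

```python
def directions(cells):
    """Map each cell to 'out' or 'in' using the [brackets] = outgoing convention."""
    return ["out" if c.startswith("[") and c.endswith("]") else "in" for c in cells]


def matching_prefix_len(a, b):
    """Count leading positions where both sequences have the same direction."""
    n = 0
    for x, y in zip(directions(a), directions(b)):
        if x != y:
            break
        n += 1
    return n


# General circuit sequence from [CIRCCONSTRUCTION].
GENERAL = ["[EXTEND2]", "EXTENDED2", "[EXTEND2]", "EXTENDED2",
           "[BEGIN]", "CONNECTED", "[DATA]", "[DATA]", "DATA", "DATA"]

# Intro circuit up to the point where the machine gets activated.
INTRO_SETUP = ["[EXTEND2]", "EXTENDED2"] * 3 + ["[INTRODUCE1]"]

# Without padding: the ack arrives right after [INTRODUCE1].
INTRO_UNPADDED = INTRO_SETUP + ["INTRODUCE_ACK"]

# With the machine: [PADDING_NEGOTIATE] goes out before the two replies.
INTRO_PADDED = INTRO_SETUP + ["[PADDING_NEGOTIATE]",
                              "PADDING_NEGOTIATED", "INTRODUCE_ACK"]

print(matching_prefix_len(GENERAL, INTRO_UNPADDED))
print(matching_prefix_len(GENERAL, INTRO_PADDED))
```

The unpadded sequence diverges at the 8th cell (an incoming ack where a general
circuit has an outgoing [DATA]), whereas the padded one follows the general
direction pattern for all 10 cells.
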