Re: [tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD

2019-05-27 Thread George Kadianakis
David Goulet writes:

> On 16 May (14:20:05), George Kadianakis wrote:
>
> Hello!
>
>> 4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
>> 
>> In this section we give an overview of what circuit construction looks like
>> to a network or guard-level adversary. We use this knowledge to make the
>> right padding machines that can make intro and rend circuits look like these
>> general circuits.
>>
>> In particular, most general Tor circuits used to surf the web or download
>> directory information start with the following 6-cell relay cell sequence
>> (cells surrounded in [brackets] are outgoing, the others are incoming):
>>
>>   [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
>>
>> When this is done, the client has established a 3-hop circuit and also
>> opened a stream to the other end. Usually after this comes a series of DATA
>> cells that either fetch pages, establish an SSL connection or fetch
>> directory information:
>>
>>   [DATA] -> [DATA] -> DATA -> DATA
>>
>> The above stream of 10 relay cells defines the grand majority of general
>> circuits that come out of Tor browser during our testing, and it's what we
>> are gonna use to make introduction and rendezvous circuits blend in.
>
> Considering "either fetches pages,..." is in the description, I'm confused how
> only 2 data cells is the grand majority?
>
> A simple "wget torproject.org" gives me an index.html of 16KB meaning at least
> 32 DATA cells. Even a directory fetch can't only be 2 data cells... ?
>

Perhaps I should have made this clearer, but the pattern:

[DATA] -> [DATA] -> DATA -> DATA -> ...

comes from the SSL handshake that happens in most general circuits. In
particular, the first two [DATA] cells are the ClientHello etc. SSL
records that get sent by the client, and the subsequent DATA cells
are the ServerHello etc. of the server.

>> 5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
>> 
>> These two machines are meant to hide client-side introduction circuits. The
>> origin-side machine sits on the client and sends padding towards the
>> introduction circuit, whereas the relay-side machine sits on the middle-hop
>> (second hop of the circuit) and sends padding towards the client. The
>> padding from the origin-side machine terminates at the middle-hop and does
>> not get forwarded to the actual introduction point.
>>
>> Both of these machines only get activated for introduction circuits, and
>> only after an INTRODUCE1 cell has been sent out.
>>
>> This means that before the machine gets activated our cell flow looks like
>> this:
>>
>>   [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] ->
>>   EXTENDED2 -> [INTRODUCE1]
>>
>> Comparing the above with section [CIRCCONSTRUCTION], we see that the above
>> cell sequence matches the one from general circuits up to the first 7 cells.
>>
>> However, in normal introduction circuits this is followed by an
>> INTRODUCE_ACK and then the circuit gets torn down, which does not match
>> the sequence from [CIRCCONSTRUCTION].
>>
>> Hence when our machine is used, after sending an [INTRODUCE1] cell, we also
>> send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED
>> cell and an INTRODUCE_ACK cell. This makes us match the [CIRCCONSTRUCTION]
>> sequence up to the first 10 cells.
>>
>> After that, we continue sending padding from the relay-side machine so as to
>> fake a directory download, or an SSL connection setup. We also want to
>> continue sending padding so that the connection stays up longer to destroy
>> the "Duration of Activity" fingerprint.
>
> I've looked at the implementation quickly and these DROP cells aren't
> accounted for in our circuit flow control which means that there will be a
> difference between a "real" DATA circuit and a circuit being sent PADDING in
> order to look like the former. And that will be the flow control cell(s)
> (SENDME) coming back from the end point that is receiving the data.
>
> In other words, one circuit (the padded one) will have only a long stream of
> cells going in one direction and the second circuit (with legit data) will
> have that long stream but now and then a cell coming back down the circuit.
>
> I believe this is quite the distinguisher between any circuit seeing much
> padding and one that doesn't? :S
>

I think you are right, but I don't think that these padded intro circuits
will stay open for long enough to need a SENDME cell from the client to
the relay. In particular, the client will receive about 15 cells before
the intro circuit gets torn down.
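
A rough sanity check of that claim, as a sketch only: it assumes the
circuit-level SENDME increment of 100 cells described in tor-spec, and the
function name is made up for illustration. It counts how many circuit-level
SENDMEs a receiver would have sent back after a given number of
flow-controlled cells:

```python
# Assumption: circuit-level SENDME increment as described in tor-spec.
CIRCUIT_SENDME_INCREMENT = 100


def sendmes_triggered(cells_received, increment=CIRCUIT_SENDME_INCREMENT):
    """Circuit-level SENDMEs sent after `cells_received` counted cells.

    Hypothetical helper: one SENDME is counted for every full
    `increment` of received cells.
    """
    if cells_received < 0:
        raise ValueError("cells_received must be non-negative")
    return cells_received // increment


# About 15 cells arrive before the intro circuit is torn down.
short_intro_circuit = sendmes_triggered(15)
# A longer transfer on a real DATA circuit, for contrast.
longer_data_circuit = sendmes_triggered(250)
print(short_intro_circuit, longer_data_circuit)
```

Under that assumption even a real DATA circuit carrying only 15 cells would
not have produced a SENDME yet, so the missing SENDME only becomes visible
on circuits that carry on the order of 100 cells or more.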

>> 
>> To calculate the padding overhead, we see that the origin-side machine just
>> sends a single [PADDING_NEGOATIATE] cell, whereas the relay-side machine
>
> Typo here 

Re: [tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD

2019-05-21 Thread David Goulet
On 16 May (14:20:05), George Kadianakis wrote:

Hello!

> 4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
> 
> In this section we give an overview of what circuit construction looks like
> to a network or guard-level adversary. We use this knowledge to make the
> right padding machines that can make intro and rend circuits look like these
> general circuits.
>
> In particular, most general Tor circuits used to surf the web or download
> directory information start with the following 6-cell relay cell sequence
> (cells surrounded in [brackets] are outgoing, the others are incoming):
>
>   [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
>
> When this is done, the client has established a 3-hop circuit and also
> opened a stream to the other end. Usually after this comes a series of DATA
> cells that either fetch pages, establish an SSL connection or fetch
> directory information:
>
>   [DATA] -> [DATA] -> DATA -> DATA
>
> The above stream of 10 relay cells defines the grand majority of general
> circuits that come out of Tor browser during our testing, and it's what we
> are gonna use to make introduction and rendezvous circuits blend in.

Considering "either fetches pages,..." is in the description, I'm confused how
only 2 data cells is the grand majority?

A simple "wget torproject.org" gives me an index.html of 16KB meaning at least
32 DATA cells. Even a directory fetch can't only be 2 data cells... ?

Is the idea that "there will always be a minimum of 2 data cells both ways",
and thus you want to match that for HS client circuits and then send a bunch
of padding to match whatever comes next on a general circuit, so that "at
least we'll have 10 cells like any other circuit"?
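
For reference, a back-of-the-envelope version of the 16KB estimate above.
This is only a sketch: it assumes the 498-byte maximum data payload of a
RELAY_DATA cell from tor-spec, ignores TLS and HTTP overhead, and the helper
name is made up for illustration.

```python
import math

# Assumption: maximum data bytes carried by one RELAY_DATA cell (tor-spec).
RELAY_DATA_PAYLOAD = 498


def data_cells_needed(num_bytes, payload=RELAY_DATA_PAYLOAD):
    """Minimum number of DATA cells needed to carry `num_bytes` of payload."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return math.ceil(num_bytes / payload)


# A 16KB index.html, counting only the page body.
index_html_cells = data_cells_needed(16 * 1024)
print(index_html_cells)
```

Under those assumptions the page body alone needs 33 incoming DATA cells,
which is consistent with the "at least 32" figure.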

> 5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
> 
> These two machines are meant to hide client-side introduction circuits. The
> origin-side machine sits on the client and sends padding towards the
> introduction circuit, whereas the relay-side machine sits on the middle-hop
> (second hop of the circuit) and sends padding towards the client. The
> padding from the origin-side machine terminates at the middle-hop and does
> not get forwarded to the actual introduction point.
>
> Both of these machines only get activated for introduction circuits, and
> only after an INTRODUCE1 cell has been sent out.
>
> This means that before the machine gets activated our cell flow looks like
> this:
>
>   [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] ->
>   EXTENDED2 -> [INTRODUCE1]
>
> Comparing the above with section [CIRCCONSTRUCTION], we see that the above
> cell sequence matches the one from general circuits up to the first 7 cells.
>
> However, in normal introduction circuits this is followed by an
> INTRODUCE_ACK and then the circuit gets torn down, which does not match
> the sequence from [CIRCCONSTRUCTION].
>
> Hence when our machine is used, after sending an [INTRODUCE1] cell, we also
> send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED
> cell and an INTRODUCE_ACK cell. This makes us match the [CIRCCONSTRUCTION]
> sequence up to the first 10 cells.
>
> After that, we continue sending padding from the relay-side machine so as to
> fake a directory download, or an SSL connection setup. We also want to
> continue sending padding so that the connection stays up longer to destroy
> the "Duration of Activity" fingerprint.
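
The "first 7 cells" and "first 10 cells" claims in the quoted section can be
checked with a small sketch. It assumes the adversary only sees the direction
of each cell (not its type), which is how I read the proposal; the names
below are made up for illustration.

```python
OUT, IN = "out", "in"

# General circuit sequence from [CIRCCONSTRUCTION].
GENERAL = [
    ("EXTEND2", OUT), ("EXTENDED2", IN),
    ("EXTEND2", OUT), ("EXTENDED2", IN),
    ("BEGIN", OUT), ("CONNECTED", IN),
    ("DATA", OUT), ("DATA", OUT), ("DATA", IN), ("DATA", IN),
]

# Client-side introduction circuit up to and including INTRODUCE1.
INTRO_SETUP = [
    ("EXTEND2", OUT), ("EXTENDED2", IN),
    ("EXTEND2", OUT), ("EXTENDED2", IN),
    ("EXTEND2", OUT), ("EXTENDED2", IN),
    ("INTRODUCE1", OUT),
]

# Without padding: the ack arrives and the circuit is torn down.
INTRO_PLAIN = INTRO_SETUP + [("INTRODUCE_ACK", IN)]

# With the padding machines negotiated right after INTRODUCE1.
INTRO_PADDED = INTRO_SETUP + [
    ("PADDING_NEGOTIATE", OUT),
    ("PADDING_NEGOTIATED", IN),
    ("INTRODUCE_ACK", IN),
]


def matching_prefix(a, b):
    """Number of leading cells whose directions agree in both sequences."""
    count = 0
    for (_, dir_a), (_, dir_b) in zip(a, b):
        if dir_a != dir_b:
            break
        count += 1
    return count


print(matching_prefix(INTRO_PLAIN, GENERAL))
print(matching_prefix(INTRO_PADDED, GENERAL))
```

Under that assumption the unpadded circuit diverges at the 8th cell, and the
padded one agrees with the general sequence for all 10 cells.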

I've looked at the implementation quickly and these DROP cells aren't
accounted for in our circuit flow control which means that there will be a
difference between a "real" DATA circuit and a circuit being sent PADDING in
order to look like the former. And that will be the flow control cell(s)
(SENDME) coming back from the end point that is receiving the data.

In other words, one circuit (the padded one) will have only a long stream of
cells going in one direction and the second circuit (with legit data) will
have that long stream but now and then a cell coming back down the circuit.

I believe this is quite the distinguisher between any circuit seeing much
padding and one that doesn't? :S

> 
> To calculate the padding overhead, we see that the origin-side machine just
> sends a single [PADDING_NEGOATIATE] cell, whereas the relay-side machine

Typo here "PADDING_NEGOATIATE".

> sends a PADDING_NEGOTIATED cell and between 7 and 10 DROP cells. This means
> that the average overhead of this machine is 11 padding cells.
> 
>In terms of WTF-PAD terminology, these machines have three states (START,
>OBF, END). They move from the START to OBF state when the first
>non-padding cell is received on the circuit, and they stay in the OBF
>state until all the padding gets depleted. The OBF state is controlled by
>a histogram which specifies 
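
To make the overhead figure and the three states quoted above concrete, here
is a toy sketch. It is not the tor implementation: it assumes the number of
DROP cells is drawn uniformly from 7 to 10 (the proposal text quoted here
does not say), it counts both negotiation cells as overhead, and it ignores
the histogram-driven timing entirely. All names are made up.

```python
from enum import Enum


class State(Enum):
    START = "START"
    OBF = "OBF"
    END = "END"


class PaddingMachine:
    """Toy START -> OBF -> END machine with a fixed DROP cell budget."""

    def __init__(self, drop_budget):
        self.state = State.START
        self.remaining = drop_budget
        self.sent = 0

    def on_nonpadding_received(self):
        # START -> OBF when the first non-padding cell is received.
        if self.state is State.START:
            self.state = State.OBF

    def send_padding(self):
        """Send one DROP cell while in OBF; return True if one was sent."""
        if self.state is not State.OBF:
            return False
        if self.remaining == 0:
            # OBF -> END once all the padding is depleted.
            self.state = State.END
            return False
        self.remaining -= 1
        self.sent += 1
        return True


def total_padding_cells(drop_budget):
    """PADDING_NEGOTIATE + PADDING_NEGOTIATED + DROP cells for one circuit."""
    machine = PaddingMachine(drop_budget)
    machine.on_nonpadding_received()
    while machine.send_padding():
        pass
    return 2 + machine.sent


budgets = range(7, 11)  # 7, 8, 9 or 10 DROP cells
average_overhead = sum(total_padding_cells(b) for b in budgets) / len(budgets)
print(average_overhead)
```

With those assumptions the mean comes out at 10.5 cells, which rounds to the
11 padding cells quoted above.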

Re: [tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD

2019-05-20 Thread teor

> On 21 May 2019, at 00:35, George Kadianakis wrote:
>
> Tom Ritter writes:
>
>>> On Thu, 16 May 2019 at 11:20, George Kadianakis wrote:
>>> 3) Duration of Activity ("DoA")
>>>
>>>   The USENIX paper uses the period of time during which circuits send and
>>>   receive cells to distinguish circuit types. For example, client-side
>>>   introduction circuits are really short lived, whereas service-side
>>>   introduction circuits are very long lived. OTOH, rendezvous circuits have
>>>   the same median lifetime as general Tor circuits which is 10 minutes.
>>>
>>>   We use WTF-PAD to destroy this feature of client-side introduction
>>>   circuits by setting a special WTF-PAD option, which keeps the circuits
>>>   open for 10 minutes completely mimicking the DoA of general Tor circuits.
>> 
>> 10 minutes exactly; or a median of 10 minutes?  Wouldn't 10 minutes
>> exactly be a near-perfect distinguisher? And if it's a median of 10
>> minutes, do we know if it follows a normal distribution/what is the
>> shape of the distribution to mimic?
>> 
> 
> Oops, you are right, Tom.
> 
> It's not 10 minutes exactly. The right thing to say is that it's a median
> of 10 minutes, although I'm not entirely sure of the exact distribution.
> 
> These circuits basically now follow the MaxCircuitDirtiness
> configuration like general circuits, and it gets orchestrated by
> circuit_expire_old_circuits_clientside(). Not sure if it's in a spec
> somewhere.
> 
> I will update the spec soon with the fix. Thanks!

If I understand correctly, Tor's circuits close about 10 minutes after
the last time they handled traffic.

So that's a *minimum* of 10 minutes. And probably a *median* of
slightly more than 10 minutes, if the user is web browsing.
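
A small sketch of why that gives a minimum rather than an exact value. It
models only the rule as I stated it above (close about 10 minutes after the
last traffic), which may not be exactly what
circuit_expire_old_circuits_clientside() does; the names are made up.

```python
# Assumption: a circuit closes this many seconds after its last traffic.
IDLE_TIMEOUT = 600


def circuit_lifetime(activity_times, idle_timeout=IDLE_TIMEOUT):
    """Seconds from circuit creation until it closes.

    `activity_times` are offsets (in seconds since creation) at which
    the circuit handled traffic.
    """
    if not activity_times:
        raise ValueError("need at least one activity timestamp")
    if min(activity_times) < 0:
        raise ValueError("activity times must be non-negative")
    return max(activity_times) + idle_timeout


# Traffic only at creation time: exactly the minimum.
single_use = circuit_lifetime([0])
# A short browsing session: the lifetime grows with the last activity.
browsing = circuit_lifetime([0, 45, 130])
print(single_use, browsing)
```

Every lifetime under this model is at least 600 seconds, so a padded circuit
that closes at exactly 600 seconds sits at the very bottom of the
distribution.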

T
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD

2019-05-20 Thread George Kadianakis
Tom Ritter writes:

> On Thu, 16 May 2019 at 11:20, George Kadianakis wrote:
>> 3) Duration of Activity ("DoA")
>>
>>   The USENIX paper uses the period of time during which circuits send and
>>   receive cells to distinguish circuit types. For example, client-side
>>   introduction circuits are really short lived, whereas service-side
>>   introduction circuits are very long lived. OTOH, rendezvous circuits have
>>   the same median lifetime as general Tor circuits which is 10 minutes.
>>
>>   We use WTF-PAD to destroy this feature of client-side introduction
>>   circuits by setting a special WTF-PAD option, which keeps the circuits
>>   open for 10 minutes completely mimicking the DoA of general Tor circuits.
>
> 10 minutes exactly; or a median of 10 minutes?  Wouldn't 10 minutes
> exactly be a near-perfect distinguisher? And if it's a median of 10
> minutes, do we know if it follows a normal distribution/what is the
> shape of the distribution to mimic?
>

Oops, you are right, Tom.

It's not 10 minutes exactly. The right thing to say is that it's a median
of 10 minutes, although I'm not entirely sure of the exact distribution.

These circuits basically now follow the MaxCircuitDirtiness
configuration like general circuits, and it gets orchestrated by
circuit_expire_old_circuits_clientside(). Not sure if it's in a spec
somewhere.

I will update the spec soon with the fix. Thanks!


Re: [tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD

2019-05-16 Thread Tom Ritter
On Thu, 16 May 2019 at 11:20, George Kadianakis wrote:
> 3) Duration of Activity ("DoA")
>
>   The USENIX paper uses the period of time during which circuits send and
>   receive cells to distinguish circuit types. For example, client-side
>   introduction circuits are really short lived, whereas service-side
>   introduction circuits are very long lived. OTOH, rendezvous circuits have
>   the same median lifetime as general Tor circuits which is 10 minutes.
>
>   We use WTF-PAD to destroy this feature of client-side introduction
>   circuits by setting a special WTF-PAD option, which keeps the circuits
>   open for 10 minutes completely mimicking the DoA of general Tor circuits.

10 minutes exactly; or a median of 10 minutes?  Wouldn't 10 minutes
exactly be a near-perfect distinguisher? And if it's a median of 10
minutes, do we know if it follows a normal distribution/what is the
shape of the distribution to mimic?

-tom