Re: Web Traffic Analysis

Gabriel Belingueres Wed, 8 Sep 1999 04:06:53 -0700

Here is a draft of the protocol, but there is still a lot of work
to do.
The numbers and lengths are provisional too.

Gabriel.

Ben Laurie wrote:
> 
> Gabriel Belingueres wrote:
> >
> > Hi,
> >
> > Talking in the sci.crypt newsgroup, I did have an
> > idea about how to do the Web more secure against traffic analysis. The
> > idea come from a paper I been reading ("Analysis of the SSL 3.0
> > protocol" by B. Schneier and D. Wagner). They describe how an attacker
> > can guess the pages you have been accessed by looking the lengths of the
> > SSL messages exchanged in the HTTPS's requests and replys.
> > The idea I was thinking is to add a tiny protocol between HTTP and SSL,
> > to break the 1-to-1 mapping between HTTP and SSL messages. The mapping
> > now would be in a random way.
> 
> ?? How?
> 
> > Could anybody give me your impressions about that idea?
> > Should I continue further designing the protocol, or you think that
> > nobody cares about web traffic analysis?
> 
> It is interesting, but I don't see how you propose to defeat it.
> 
> Cheers,
> 
> Ben.
> 
> --
> http://www.apache-ssl.org/ben.html
> 
> "My grandfather once told me that there are two kinds of people: those
> who work and those who take the credit. He told me to try to be in the
> first group; there was less competition there."
>      - Indira Gandhi
> ______________________________________________________________________
> OpenSSL Project                                 http://www.openssl.org
> Development Mailing List                       [EMAIL PROTECTED]
> Automated List Manager                           [EMAIL PROTECTED]

-- 
Gabriel Belingueres

Providing protection against traffic analysis on the Web
========================================================

The basic problem is that an attacker can guess the page the user is viewing because 
the interaction with the server and the lengths of the messages are known to him/her.

The idea behind this protocol is to break the one-to-one mapping between HTTP messages 
and SSL connections.

As noted in [CHENG], the "pattern" of messages in the HTTP protocol is as follows:

1) First, the client requests a web page (GET /index.html...).
2) The server answers the client with the HTML file, using the same socket.
3) The client processes the HTML file and issues the corresponding requests for the 
objects that the HTML file references. The client establishes ONE SSL connection per 
HTTP GET request.
4) The server answers the requests concurrently.

The attacker relies on both this "interaction pattern" and the message lengths to 
carry out the attack.

Given that, one of the basic requirements of the protocol is to present the SAME 
interaction pattern as HTTPS normally does, while also "masking" the lengths of the 
transmitted messages.

What the protocol basically does is receive the HTTP messages (from the upper layer) 
and break each one into a pseudo-random number of fragments. It then opens a 
pseudo-random number of SSL connections and sends each fragment through a 
pseudo-randomly chosen connection.
Of course, the pseudo-random numbers generated by the server have to be regenerated by 
(or communicated to) the client, so that it can read from the appropriate SSL 
connection and then reorder and reassemble the fragments to obtain the transmitted 
files. Once that is done, the protocol on the client passes the files to the upper 
layer (HTTP) to be shown in the browser.
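
The fragment, scatter and reassemble cycle described above can be sketched as 
follows. This is only an illustration under stated assumptions, not part of the 
draft: the function names are hypothetical, each fragment carries an explicit sequence 
number (the "communicated to the client" option), and random.Random stands in for 
whatever cryptographically strong generator a real implementation would need.

```python
import random

def fragment(message, rng, max_fragments=8):
    """Split message into a pseudo-random number of non-empty fragments."""
    if not message:
        return [b""]
    count = rng.randint(1, min(max_fragments, len(message)))
    # Pick count - 1 distinct cut points strictly inside the message.
    cuts = sorted(rng.sample(range(1, len(message)), count - 1))
    bounds = [0, *cuts, len(message)]
    return [message[a:b] for a, b in zip(bounds, bounds[1:])]

def scatter(fragments, n_connections, rng):
    """Assign each fragment to a pseudo-randomly chosen connection.

    Returns one list per connection holding (sequence_number, fragment).
    """
    connections = [[] for _ in range(n_connections)]
    for seq, frag in enumerate(fragments):
        connections[rng.randrange(n_connections)].append((seq, frag))
    return connections

def reassemble(connections):
    """Collect the fragments from every connection and restore their order."""
    tagged = [item for conn in connections for item in conn]
    return b"".join(frag for _, frag in sorted(tagged))

# Usage: a fixed seed only makes the demonstration repeatable.
rng = random.Random(1999)
page = b"<html>" + b"x" * 500 + b"</html>"
n_connections = rng.randint(1, 6)  # pseudo-random number of connections
conns = scatter(fragment(page, rng), n_connections, rng)
restored = reassemble(conns)
```

Carrying sequence numbers is the simpler of the two options; regenerating the same 
pseudo-random choices on the client would avoid the extra field but requires both 
sides to share the generator state.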

The layering of the protocol would be something like this:

+------------------+
|       HTTPS      |
+------------------+
|   This protocol  |
+------------------+
|       SSL        |
+------------------+
|     TCP/IP       |
+------------------+


How it works
============

Below is an interaction diagram of a typical HTTPS connection using the protocol 
described here. The example shows an HTML page that contains 9 images.


HTTPS client                                         HTTPS server

1)
----------------- SSL's ClientHello + [Code] ----------------->
<-------------------- SSL's ServerHello -----------------------
    ...Completion of the SSL Handshake, resulting in a...
<---------------- SSL connection established ----------------->

2)
-------- GET /index.html HTTP 1.1 Host: www.ibm.com ---------->
<------------------------- HTML file --------------------------
<------------------------- HTML file --------------------------
                              ...
<--------------- HTML file + [ACK] + [Padding] ----------------

3)
c1 ---- GET /file9.gif ; GET /file5.gif ; GET /file1.gif ----->
c2 --------------------- GET /file6.gif ; GET /file2.gif ----->
c3 ------------------------ DECOY ---------------------------->
                              ...
cN --------------------- GET /file8.gif ; GET /file4.gif ----->

4)
c1 <------ file1.gif/h; file9.gif/h;  file8.gif/f1; ... -------
c2 <------ file2.gif/h; file1.gif/f1; file9.gif/f1; ... -------
c3 <------ file3.gif/h; file2.gif/f1; file1.gif/f2; ... -------
                              ...
cN <------ file8.gif/h; file7.gif/f1; file6.gif/f2; ... -------


Step 1
------

The user clicks on a link with the URL https://www.ibm.com/index.html.
The browser has to set up an SSL connection in order to send the HTTP GET request, so 
it initiates the SSL Handshake, as always.
But this time, at the end of the SSL ClientHello message, the browser writes a 
"code" that tells the server it wants the request to be served using this protocol.

Writing extra data after the compression methods list in the ClientHello message is 
legal [TLS], and it is included in the SSL's handshake hashes, so any modification of 
ClientHello is detectable by the communicating parties at the end of the handshake.

This code in the ClientHello message is ignored by SSL 3.0 and TLS 1.0 as we know them 
now. In this way, this protocol is interoperable with existing web servers and web 
browsers. The only party that can send the code is the client, so if the code is sent 
to a server that does not know it, the SSL layer of the web server will ignore it and 
continue normally.

Although the client sends the code in the clear, that doesn't mean that the HTTP 
interaction will be served using this protocol, because the server's confirmation 
will be sent in a secure way, as described in Step 2.

The SSL or TLS protocol has to be modified to recognize the extra data as "data for 
the upper layer protocol". Technically, this is not a "layer invasion" since the data 
is not interpreted in any way by SSL or TLS.

However, the extra data in the ClientHello message is intended for "forward 
compatibility" with future versions of TLS, not for providing extra data to upper 
layers. I propose adding an "Escape" code for this purpose. This Escape code has to be 
standardized in the TLS RFC in order to be useful in other contexts. In the same 
way, the data that follows the Escape code must be standardized too.

It is preferable that the SSL or TLS version number (3.0 and 3.1) does not change. If 
it did, the new number would appear in the ServerHello message, which travels in the 
clear, and so reveal that the server is prepared to accept this protocol. We don't 
want to reveal this information.

I propose that the extra data defined by SSL come first, followed by the Escape code 
and the data for the upper layer protocol. In other words, the ClientHello message 
would carry the following data (see [TLS] for the message structure):

client_version + random + session_id + cipher_suites + compression_methods + 
"tls_data" + ESC + "upper_layer_data"

Web server implementations that do not provide any added functionality through an 
intermediate protocol like this one MUST NOT configure their SSL implementation to 
forward the extra data after the Escape code to the upper layer.
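
As an illustration of the proposed ClientHello layout, here is a minimal sketch of 
building and splitting the trailing extra data. Everything specific in it is an 
assumption: the draft does not assign a value to the Escape code, so the two-byte ESC 
below is hypothetical, as are the function names and the rule that "tls_data" may not 
itself contain the Escape code.

```python
ESC = b"\xff\x1b"  # hypothetical Escape code; the draft does not define one

def build_extra_data(tls_data, upper_layer_data):
    """Return tls_data + ESC + upper_layer_data for the end of ClientHello."""
    if ESC in tls_data:
        # Assumption: the Escape code must be unambiguous within the extra data.
        raise ValueError("tls_data must not contain the Escape code")
    return tls_data + ESC + upper_layer_data

def split_extra_data(extra):
    """Split the extra data into (tls_data, upper_layer_data).

    upper_layer_data is None when no Escape code is present, which is what a
    client that does not use an intermediate protocol would send.
    """
    tls_data, esc, upper = extra.partition(ESC)
    return tls_data, (upper if esc else None)

# Usage: the SSL layer keeps tls_data and hands the rest upward uninterpreted.
extra = build_extra_data(b"", b"CODE")
tls_part, upper_part = split_extra_data(extra)
```

A real encoding would more likely use explicit length fields, as the rest of the TLS 
handshake does, so that neither part restricts the bytes the other may contain.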

When the SSL handshake is done, the SSL connection has been established.

Step 2
------

Now the web browser issues a GET request for an HTML file in the usual way.
When the web server receives the request, it returns the HTML file as always. But 
after the EOF, this protocol appends an ACK message (which one?), meaning that the 
web server supports this protocol and will send the data requested by the client in a 
special way, as described later in Step 4.
On the client side, when the reply from the server arrives, the client checks whether 
anything extra came after the EOF. If so, the client knows that the server will send 
the page's data in a way that is difficult for a traffic analysis attack to guess (I 
hope!).
The ACK message is transmitted in a secure way. This means that the SSL CipherSuite 
agreed between the client and server must encrypt, i.e. the CipherSuite agreed MUST 
NOT be TLS_RSA_WITH_NULL_MD5 or TLS_RSA_WITH_NULL_SHA.
If nothing comes after the EOF of the HTML file before the SSL connection closes, 
then the server doesn't support this protocol (or doesn't want to provide us the 
service).

The client then decides how many SSL connections it has to open (cN in the figure). 
The client chooses this random number following a uniform probability distribution 
(UPD). The client could have a Lower Limit (1? or more?) and an Upper Limit for this 
number. The Upper Limit has to be chosen so that it does not look suspicious to the 
attacker (which number?).

After the ACK comes a random padding (how long?) whose length follows a UPD, to 
"mask" which web page has been accessed (in addition to the masking SSL provides). It 
seems to be the only choice, given that we have to respect the interaction pattern of 
the HTTPS GET reply.
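
The two uniform choices in this step (the connection count cN and the padding length) 
could be drawn as in the sketch below. The limits are placeholders, since the draft 
leaves them as open questions, and the function names are hypothetical. Python's 
secrets module is used so that the values come from the operating system's 
cryptographic random source.

```python
import secrets

# Assumed values only: the draft has not settled any of these limits.
LOWER_LIMIT = 2     # minimum number of SSL connections
UPPER_LIMIT = 8     # maximum number of SSL connections
MAX_PADDING = 128   # maximum padding length in bytes

def uniform_int(low, high):
    """Return an integer drawn uniformly from the closed range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return low + secrets.randbelow(high - low + 1)

def choose_connection_count():
    """Pick cN, the number of SSL connections the client will open."""
    return uniform_int(LOWER_LIMIT, UPPER_LIMIT)

def random_padding(max_len=MAX_PADDING):
    """Return random padding whose length is uniform in [0, max_len]."""
    return secrets.token_bytes(uniform_int(0, max_len))

# Usage
cn = choose_connection_count()
padding = random_padding()
```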

Step 3
------

Having decided on the cN connections, the client sends its requests for the HTML 
page's embedded objects.

The client side of the protocol waits for the web browser to hand over all the GETs it 
wants to send. Once that is done, the protocol selects at random some "dummy" 
connections (DECOY) (how many is good?). This selection is done with a random number 
in [1, cN] using a UPD.

The protocol concatenates the GETs onto the non-dummy connections, following a 
"circular" strategy.
The dummy connections are treated separately. They carry a random sequence of values 
whose length is drawn from the range [70,140] using a UPD (numbers need tests).
In ALL cases, a random padding (using a UPD) is added at the end of the data on each 
connection.
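
One possible reading of this step is sketched below. It is a sketch under 
assumptions, not the draft's definition: the function names are hypothetical, the 
number of decoys is limited to cN - 1 so that at least one connection remains for the 
real GETs, "circular" is taken to mean round-robin, and random.Random stands in for a 
cryptographically strong generator. The body and the padding are kept separate 
because the draft has not yet defined any framing.

```python
import random

def random_bytes(rng, n):
    """Return n pseudo-random bytes from rng."""
    return bytes(rng.getrandbits(8) for _ in range(n))

def plan_requests(gets, cn, rng, max_padding=128):
    """Return one (kind, body, padding) tuple per connection.

    kind is "GETS" (body is a list of request strings) or "DECOY"
    (body is a run of random bytes).
    """
    if cn < 2:
        raise ValueError("need at least one real and one decoy connection")
    # Assumption: at most cN - 1 decoys, so the GETs always have somewhere to go.
    decoys = set(rng.sample(range(cn), rng.randint(1, cn - 1)))
    real = [i for i in range(cn) if i not in decoys]
    assigned = {i: [] for i in real}
    for k, get in enumerate(gets):
        assigned[real[k % len(real)]].append(get)  # circular assignment
    plan = []
    for i in range(cn):
        padding = random_bytes(rng, rng.randint(0, max_padding))
        if i in decoys:
            body = random_bytes(rng, rng.randint(70, 140))
            plan.append(("DECOY", body, padding))
        else:
            plan.append(("GETS", assigned[i], padding))
    return plan

# Usage: nine image requests spread over four connections.
rng = random.Random(7)
gets = [f"GET /file{i}.gif" for i in range(1, 10)]
plan = plan_requests(gets, 4, rng)
```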

Step 4
------

The server receives the GETs from the cN connections opened by the client. While it is 
receiving data from the connections, the protocol can parse each message from the 
client and extract the GETs to pass to the upper layer.

Then, the web server hands the protocol the header of each of the objects requested by 
the client, and the headers are sent following a "circular" strategy.

Each data object is broken into cN fragments of random length. If the number of 
fragments is lower than cN, dummy fragments of random length are generated until 
there are cN fragments.

Those fragments are sent "circularly", starting on the next connection (the one after 
the connection that carried the last header).

If the number of connections (cN) is bigger than the total number of fragments, then 
a "dummy fragment" is sent on the rest of the connections.

In both cases, a random padding is added at the end of the last fragment on each 
connection, as in Step 3.

This is repeated until all of the object's data has been sent.
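
The server side of this step might look like the sketch below, again under 
assumptions: the function names and the (name, part, payload) tuples are 
hypothetical, headers are represented by an empty payload, "circular" is read as 
round-robin over the cN connections, and the final per-connection padding is omitted 
because it works as in Step 3.

```python
import random

def split_object(data, cn, rng):
    """Break data into exactly cn pieces, topping up with dummy fragments."""
    n_real = max(1, min(cn, len(data)))  # no more real pieces than bytes
    cuts = sorted(rng.sample(range(1, len(data)), n_real - 1)) if n_real > 1 else []
    bounds = [0, *cuts, len(data)]
    pieces = [("data", data[a:b]) for a, b in zip(bounds, bounds[1:])]
    while len(pieces) < cn:
        filler = bytes(rng.getrandbits(8) for _ in range(rng.randint(1, 1024)))
        pieces.append(("dummy", filler))
    return pieces

def schedule(objects, cn, rng):
    """Queue headers and fragments on cn connections in circular order.

    objects is a list of (name, data); each queue entry is
    (name, part, payload) with part "h", "f1", "f2", ... or "dummy".
    """
    queues = [[] for _ in range(cn)]
    for k, (name, _) in enumerate(objects):
        queues[k % cn].append((name, "h", b""))  # headers go out first
    nxt = len(objects) % cn  # the connection after the last header
    for name, data in objects:
        real_index = 0
        for kind, payload in split_object(data, cn, rng):
            if kind == "data":
                real_index += 1
                part = f"f{real_index}"
            else:
                part = "dummy"
            queues[nxt].append((name, part, payload))
            nxt = (nxt + 1) % cn
    return queues

# Usage
rng = random.Random(3)
objects = [("file1.gif", b"A" * 50), ("file2.gif", b"BC")]
queues = schedule(objects, 4, rng)
```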

Structures
==========

The data structures are not defined yet. First I want to know whether the overall 
protocol and the security it provides are good enough.

Security considerations
=======================

First of all, this protocol relies completely on the security provided by SSL. 
Although it recommends some changes to SSL, this protocol could still work without 
those changes.

Turning on the web browser cache helps mitigate web traffic analysis, as noted in 
[CHENG], in addition to saving network bandwidth and download time.
I recommend turning on the cache when somebody wants to protect their web surfing 
against traffic analysis, but the HTML page and its embedded objects still have to be 
downloaded at least the first time. Furthermore, after a few days the cache will 
expire and the pages will be erased, so a patient attacker will carry out a 
successful attack sooner or later.

Providing TLS with random padding for the CipherSuites that use a stream cipher is 
required, as stated in [WAGNER] and [CHENG].
(Somebody once told me that you can use block ciphers whenever you want, but that 
isn't the answer to the problem.)
Since this protocol is independent of both the (upper) application protocol and the 
(lower) secure transport protocol, the secure transport protocol MUST guarantee that 
the data delivered is padded in the same way, regardless of which CipherSuite was 
chosen.

The fact that a web server from a given company supports this protocol may be 
difficult to keep secret, because of marketing (the company wants to differentiate 
itself from its competitors by offering a value-added service) or because of 
disloyal employees.
Only the fact that a given request is being served by this protocol is kept secret 
(because it travels encrypted).

The CipherSuites that only authenticate messages but do not encrypt them, such as 
TLS_RSA_WITH_NULL_MD5 or TLS_RSA_WITH_NULL_SHA, MUST NOT be allowed by a web server 
providing this service. If they were allowed, an attacker could set one of those 
CipherSuites as the preferred choice and so defeat this whole protocol.
The SSL protocol must be configured not to accept this kind of CipherSuite.
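
A server could enforce this rule with a check like the one sketched below. It is an 
illustration only: the function names are hypothetical, and the test is deliberately 
conservative, rejecting any suite whose name contains a NULL component rather than 
parsing the name in detail.

```python
def encrypts(suite_name):
    """Return True if the suite name has no NULL component.

    Conservative check: any NULL component (cipher, key exchange or MAC)
    causes the suite to be rejected.
    """
    return "NULL" not in suite_name.upper().split("_")

def filter_offered(suites):
    """Keep only the offered suites that are acceptable for this protocol."""
    return [s for s in suites if encrypts(s)]

# Usage: the two authentication-only suites are dropped from the offer.
offered = [
    "TLS_RSA_WITH_NULL_MD5",
    "TLS_RSA_WITH_NULL_SHA",
    "TLS_RSA_WITH_RC4_128_MD5",
    "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
]
acceptable = filter_offered(offered)
```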

The Code that asks the server to use this protocol will probably be sent in the 
clear, but that only reveals that the client wants to use it. The ACK that confirms 
the server will actually use it travels encrypted (see Step 2).

The minimum length GET request is something like this:
GET /A HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.02 [en] (WinNT; U)
Host: a.ar:1999
Accept: */*
Accept-Language: en
Accept-Charset: *
This is about 140 bytes, so I think that a random padding of [0,128] bytes is 
enough.
The DECOY (dummy message) should be roughly between [140,1024] bytes long (140 
minimum because of the minimum GET request).
The dummy fragment (in Step 4) should be roughly between [1,1024] bytes long.
(These numbers are not definitive; they still need empirical tests.)
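
The size estimate can be checked with a few lines of code. The sketch below measures 
the example request both as bare header text and as it would go on the wire with CRLF 
line endings and the terminating blank line, then shows the range of lengths an 
observer would see once [0,128] bytes of padding are added. The function names are 
hypothetical.

```python
MIN_GET_LINES = [
    "GET /A HTTP/1.0",
    "Connection: Keep-Alive",
    "User-Agent: Mozilla/4.02 [en] (WinNT; U)",
    "Host: a.ar:1999",
    "Accept: */*",
    "Accept-Language: en",
    "Accept-Charset: *",
]

def text_length(lines):
    """Length of the header text alone, without line terminators."""
    return sum(len(line) for line in lines)

def wire_length(lines):
    """Length on the wire: CRLF after each line plus the final blank line."""
    request = "".join(line + "\r\n" for line in lines) + "\r\n"
    return len(request.encode("ascii"))

def padded_length_range(lines, max_padding=128):
    """Smallest and largest total length after adding [0, max_padding] bytes."""
    base = wire_length(lines)
    return base, base + max_padding

# Usage
print(text_length(MIN_GET_LINES), wire_length(MIN_GET_LINES))
print(padded_length_range(MIN_GET_LINES))
```

The header text alone comes to roughly 140 bytes, and a little more once the line 
terminators are counted, which suggests that the lower bound for the DECOY length 
should be measured on the wire format.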

Advantages
==========

1) Interoperable with existing web servers and browsers.

2) Using this protocol does not require modifying either the specification or the 
implementation of the HTTPS protocol. The API on which HTTPS is based is the same as 
that of SSL (I think).

3) Because the data of the upper layer is not interpreted in any way, the protocol 
can be used with any other application protocol, such as SSMTP, SPOP3, etc., on the 
sole condition that the application protocol makes at least one SSL protected client 
request and one SSL protected server reply.

4) The protocol is "stateless". This means that it does not have to save information 
about any prior or future HTTPS request and reply.

Disadvantages
=============

1) The protocol consumes extra memory, because it has to hold the fragments it has 
received until they can be put in order.

2) An SSL implementation that supports this protocol has to be changed to hand this 
protocol the extra data carried by the SSL ClientHello message.

3) It doesn't provide anonymity for the parties, just the impossibility of inferring 
which file was accessed.

4) It is another layer between HTTPS and the Sockets API, which adds some overhead.

References
==========

[CHENG] Cheng and Avnur, "Traffic Analysis of SSL encrypted browsing".
[TLS] Dierks and Allen, RFC 2246, "The TLS Protocol Version 1.0".
[WAGNER] Wagner and Schneier, "Analysis of the SSL 3.0 protocol".

Author
======

Gabriel Belingueres
[EMAIL PROTECTED]
