Re: Tor seems to have a huge security risk--please prove me wrong!

2010-08-28 Thread Paul Syverson
On Sat, Aug 28, 2010 at 02:51:35PM -0400, Roger Dingledine wrote:
> On Sat, Aug 28, 2010 at 11:20:41AM -0400, Paul Syverson wrote:
> > What you describe is known in the literature as website fingerprinting
> > attacks,
> [snip]
> > Roughly, while Tor is not invulnerable to such an attack, it fares
> > pretty well, much better than other systems that this and earlier
> > papers examined, mostly because the uniform-size cells that Tor moves
> > all data in add lots of noise.
> 
> Maybe. Or maybe not. This is an open research area that continues to
> worry me.
> 
> I keep talking to professors and grad students who have started a paper
> showing that website fingerprinting works on Tor, and after a while they
> stop working on the paper because they can't get good results either way
> (they can't show that it works well, and they also can't show that it
> doesn't work well).
> 
> The real question I want to see answered is not "does it work" -- I bet
> it can work in some narrow situations even if it doesn't work well in
> the general case. Rather, I want to know how to make it work less well.
> But we need to have a better handle on how well it works before we can
> answer that harder question.

OK, I'm confused. Sorry for being terse initially, but I just wanted to
get out that website fingerprinting is a known problem, not a new
surprise. But it sounds like you think you are contrasting with what I
said rather than extending the same points. I said Tor is not
invulnerable to the attack, only that the published research (I wasn't
talking about the abandoned projects) shows it's a lot less vulnerable
than other deployed systems examined in that research, like JonDonym
or various VPNs. Yes, of course that's subject to the experiments
conducted and assumptions made so far. I also said that it's worthy of
continued examination and analysis, even if it is not the demonstrated
problem for Tor that end-to-end correlation is. Since it's a pretty
open research area, we cannot say some significant attack isn't around
the corner; that's always the case. All we know so far is that the few
published results show that a small fraction of websites seem to be
uniquely identifiable via existing techniques. What am I missing?

> 
> For those who want more background, you can read more at item #1 on
> https://www.torproject.org/research.html.en#Ideas
> (I hoped to transition
> https://www.torproject.org/volunteer.html.en#Research over to that new
> page, but haven't gotten around to finishing)

Yes. Exploring defensive techniques would be good. Unlike correlation,
fingerprinting seems more likely to be amenable to traffic shaping,
and the study of shaping for countering correlation (as some of us
recently published at PETS ;>) may be an OK place to build on.
Personally I still think trust is going to play a bigger role as an
effective counter than general shaping, but one place we seem to be in
sync is that it all needs more study.

aloha,
Paul
***
To unsubscribe, send an e-mail to majord...@torproject.org with
unsubscribe or-talk in the body. http://archives.seul.org/or/talk/


Re: Tor seems to have a huge security risk--please prove me wrong!

2010-08-28 Thread Mike Perry
Thus spake Roger Dingledine (a...@mit.edu):

> On Sat, Aug 28, 2010 at 11:20:41AM -0400, Paul Syverson wrote:
>
> I keep talking to professors and grad students who have started a paper
> showing that website fingerprinting works on Tor, and after a while they
> stop working on the paper because they can't get good results either way
> (they can't show that it works well, and they also can't show that it
> doesn't work well).
> 
> The real question I want to see answered is not "does it work" -- I bet
> it can work in some narrow situations even if it doesn't work well in
> the general case. Rather, I want to know how to make it work less well.
> But we need to have a better handle on how well it works before we can
> answer that harder question.

Yes. This is the approach we need to solve this problem. However, one
of the problems with getting it out of most academics is the bias
against easy reproducibility. In order for any of this research to be
usable by us, it must be immediately and easily verifiable and
reproducible in the face of both changing attacks, and changing
network protocols (such as UDP-Tor and SPDY). This means source code
and experimental logs and data.

Most computer science academia is inherently biased against providing
this data for various reasons, and while this works for large industry
with the budget and time to reproduce experiments without assistance,
it will not work for us. I believe it is the main reason we see an
adoption lag of 5-10 years for typical research all over
computer-related academia. My guess is Tor does not have this much time
to fix these problems, hence we must demand better science from
researchers who claim to be solving Tor-related problems (or proving
attacks on Tor networks).

I've gone into a little more detail on this subject and the
shortcomings of timing attacks in general in my comments on Michal
Zalewski's blog about regular, non-Tor HTTPS timing attacks:
http://lcamtuf.blogspot.com/2010/06/https-is-not-very-good-privacy-tool.html#comment-form


-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs




Re: Tor seems to have a huge security risk--please prove me wrong!

2010-08-28 Thread Roger Dingledine
On Sat, Aug 28, 2010 at 11:20:41AM -0400, Paul Syverson wrote:
> What you describe is known in the literature as website fingerprinting
> attacks,
[snip]
> Roughly, while Tor is not invulnerable to such an attack, it fares
> pretty well, much better than other systems that this and earlier
> papers examined, mostly because the uniform-size cells that Tor moves
> all data in add lots of noise.

Maybe. Or maybe not. This is an open research area that continues to
worry me.

I keep talking to professors and grad students who have started a paper
showing that website fingerprinting works on Tor, and after a while they
stop working on the paper because they can't get good results either way
(they can't show that it works well, and they also can't show that it
doesn't work well).

The real question I want to see answered is not "does it work" -- I bet
it can work in some narrow situations even if it doesn't work well in
the general case. Rather, I want to know how to make it work less well.
But we need to have a better handle on how well it works before we can
answer that harder question.

For those who want more background, you can read more at item #1 on
https://www.torproject.org/research.html.en#Ideas
(I hoped to transition
https://www.torproject.org/volunteer.html.en#Research over to that new
page, but haven't gotten around to finishing)

or see my 25c3 talk from 2008:
http://events.ccc.de/congress/2008/Fahrplan/events/2977.en.html
http://media.torproject.org/video/25c3-2977-en-security_and_anonymity_vulnerabilities_in_tor.mp4

--Roger



Re: Tor seems to have a huge security risk--please prove me wrong!

2010-08-28 Thread Paul Syverson
Hi Hikki,

What you describe is known in the literature as website fingerprinting
attacks, and there have been several research papers published about
them. Consult freehaven.net/anonbib or type "website fingerprinting"
in your favorite search engine. I think the most recent paper on this
is "Website fingerprinting: attacking popular privacy enhancing
technologies with the multinomial naïve-bayes classifier" by Herrmann
et al. at the 2009 ACM CCSW (Cloud Computing Security Workshop). It
will cite much of the relevant previous literature.

Roughly, while Tor is not invulnerable to such an attack, it fares
pretty well, much better than other systems that this and earlier
papers examined, mostly because the uniform-size cells that Tor moves
all data in add lots of noise.
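As a rough illustration of why uniform cells help, here is a toy sketch (the per-cell payload figure and the example sizes are assumed for illustration; this is not Tor's actual wire format): packing payloads into fixed-size cells collapses distinct byte counts into the same observable cell count, so an observer sees a much coarser size signal.

```python
import math

# Assumed usable payload per 512-byte cell; a rough figure for illustration only.
CELL_PAYLOAD = 498

def observable_cells(nbytes: int) -> int:
    """Number of fixed-size cells an on-path observer sees for a payload."""
    return max(1, math.ceil(nbytes / CELL_PAYLOAD))

# Two resources with different true sizes...
sizes = [10_200, 10_450]
# ...look identical once packed into uniform cells:
print([observable_cells(s) for s in sizes])  # both map to 21 cells
```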

The ability to identify destinations without seeing the destination
end of a connection (even with pretty low probability of typical
success) remains worthy of continued examination and analysis.
But end-to-end correlation remains the most significant
fact-of-life for all practical low-latency systems, including Tor.

aloha,
Paul

On Sat, Aug 28, 2010 at 06:51:13AM -0400, hi...@safe-mail.net wrote:
> There are a lot of discussions going on over at the Onion Forum, a Tor hidden 
> service board, regarding a possible attack on the Tor's anonymity and safety. 
> It's called "classifier attacks" and seems to be a high probability attack 
> that may in a way unmask the encryption used by Tor, and in addition to that 
> reveal the source as in the user using Tor as the first part of the chain.
> [snip]

Tor seems to have a huge security risk--please prove me wrong!

2010-08-28 Thread hikki
There are a lot of discussions going on over at the Onion Forum, a Tor hidden 
service board, regarding a possible attack on Tor's anonymity and safety. 
It's called "classifier attacks" and seems to be a high-probability attack that 
may in a way unmask the encryption used by Tor and, in addition, reveal 
the source, as in the user running Tor as the first part of the chain.

This subject seems to be either very unknown or very well silenced, so 
I'm very interested in what the users of this mailing list have 
to say about this.

--

http://l6nvqsqivhrunqvs.onion/index.php?do=topic&id=12078

Here are two concerning posts:

-- QUOTE START --

It's really not that hard to understand the attack. I don't see why everyone is 
having such a hard time getting it.

You encrypt X with a key and the output is Y. There are 2^256 possible Y 
values with a 256-bit initialization vector. This means each time you encrypt 
X, even with the same key, the resulting Y is a different bit string. The bit 
string of X becomes impossible to get unless you have the key and Y. So the 
encrypted information itself cannot be fingerprinted, because there are 2^256 
possible ciphertexts for a given plaintext/key.

However, the SIZE that X will be after encryption can be determined. X always 
produces a Y of the same size when encrypted with a given key length; even 
though there are 2^256 possible ciphertexts, there is ONE possible size for Y.

This by itself isn't that bad for small data. Cat and Dog produce the same 
output size for the same key. Once you start getting into really big things, 
like motion pictures, then it starts to be a lot more damaging, because 
there are not a whole lot of things that are 329,384,394,231 bits, and by 
looking at the Y value you can tell how many bits the X value was if you know 
the algorithm used. Classifier attacks work better with SIZE.

However, complexity is another issue. If there is a website with 25 small 
images on it, then the adversary can see the size of all these different 
encrypted images you are loading. Each image can be seen by the adversary as a 
different object, and the size of these objects can be determined. Also, if you 
follow links on a page that you visit, the adversary can see the same data for 
each of these pages and become more and more certain of what you are doing. 
Classifier attacks work better with COMPLEXITY.

If you encrypt LARGE data, or COMPLEX SETS of data, it does not matter if you 
use AES-256... the bit string of X cannot be derived from Y without the key, 
but enough characteristics of X stay in Y that the adversary can with high 
probability say what Y would PROBABLY decrypt into if they had the key. This 
does require the adversary to have SEEN the value of X at some point prior to 
it being encrypted, but this is not really that hard now, is it? Tor is used to 
PROTECT YOU in case there IS an insider in your group... but an insider in your 
group can fingerprint X regardless of whether it is CP, a drug forum or a secret 
military document.

Understand?

-- QUOTE END --
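To make the quoted "classifier" idea concrete, here is a hypothetical sketch (the page names, object sizes, and similarity measure are all invented for illustration; real fingerprinting attacks use far richer features and classifiers): fingerprint each known page by the multiset of its object sizes, then match an observed trace against those fingerprints.

```python
from collections import Counter

# Hypothetical fingerprints: each page is summarized by the multiset of the
# sizes of the encrypted objects it causes the client to fetch.
fingerprints = {
    "site-a": Counter([10240, 3071, 3071, 512]),
    "site-b": Counter([20480, 1024]),
}

def classify(observed_sizes):
    """Return the known page whose size multiset best matches the trace."""
    obs = Counter(observed_sizes)
    def overlap(fp):
        inter = sum((obs & fp).values())   # multiset intersection
        union = sum((obs | fp).values())   # multiset union
        return inter / union               # Jaccard similarity on multisets
    return max(fingerprints, key=lambda name: overlap(fingerprints[name]))

print(classify([10240, 3071, 512, 3071]))  # matches "site-a"
```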

-- QUOTE START --

Oh yeah, it can be done with layers too, so it's not just the entry node / 
infrastructure to worry about, although that is the biggest worry since you are 
next in the chain.

X -> Y
Y -> Z
Z -> U

U can be used to determine the size of Z, Z can be used to determine the size 
of Y, Y can be used to determine the size of X.

Layer-encrypted data can still be classified; it's just that the relay node isn't 
looking for the fingerprint of X, it is looking for the fingerprint of Y, which 
it can get with Z.

-- QUOTE END --
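The size relationship the quote describes can be sketched with a toy model (the fixed per-layer overhead is an assumption for illustration; real onion routing repacks data into uniform-size cells, which is exactly what blunts this inference): if each layer of encryption adds a constant number of bytes, every layer's size is a deterministic function of the plaintext size, so an observer can walk the sizes backward.

```python
# Assumed fixed bytes added per encryption layer (e.g. an IV or MAC).
LAYER_OVERHEAD = 16

def layered_size(plaintext_len: int, layers: int) -> int:
    """Size after wrapping a plaintext in `layers` layers of encryption."""
    return plaintext_len + layers * LAYER_OVERHEAD

x = 1000                  # size of X
u = layered_size(x, 3)    # size of U after three layers (X -> Y -> Z -> U)

# An observer of U can invert the relation one layer at a time:
z = u - LAYER_OVERHEAD
y = z - LAYER_OVERHEAD
assert y - LAYER_OVERHEAD == x
print(u, z, y)  # 1048 1032 1016
```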