Re: RFC: Established connections hash function

2007-03-29 Thread Evgeniy Polyakov
On Wed, Mar 28, 2007 at 04:52:55PM +0200, Andi Kleen ([EMAIL PROTECTED]) wrote:
  3) We dont want to be 'totally secure'. We only want to raise the level, 
  and eventually see if we have to spend more time on this next year(s). 
  AFAIK we had two different reports from people being hit by the flaw of 
  previous hash. Not really a critical issue.
 
 Yes, but you probably want a complexity of at least 10^5-10^6 to be any
 useful. I don't think you will get that early in boot from random
 unless you use hardware support.

What we have (had) right now is broken situation, and it must be fixed
no matter what solution is used. Using more secure hash (and breaking
Jenkins hash is a bit harder than XOR one, I say it not only from
theoretical point of view looking into its operations) is a fix.
It is possible that there is even better fix - there is always something
better than one has right now, but right now problem must be fixed.
And David (and Eric, and Nikolaos) fixed that problem in place.
It can be solved (this time it will be called 'improved') further.

  
  4) We could add a hard limit on the length of one chain. Even if the bad 
  guys discover a flaw, it wont hurt too much.

Hard limit should not be used, since it is exactly what attacker wants -
attacker can get all slots in th hash table and server will not respond
to other clients at all, although it could, but much slower.

 Or just use the trie?  It has other advantages too :)

As an interested party I should not comment, but I can not resist - 
yes, it is cool and can be done better :)

 -Andi

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-28 Thread Andi Kleen

 
  But I think it can be mostly ignored.
 
 With all due respect, it cannot. An attacker with a small-sized botnet 
 (which is ~250 hosts) can create chains that contain well in excess of 3000 
 items. 

Most likely they can also easily generate enough latency data to crack any 
simple
hash function then.

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-28 Thread Evgeniy Polyakov
On Wed, Mar 28, 2007 at 11:29:43AM +0200, Andi Kleen ([EMAIL PROTECTED]) wrote:
   But I think it can be mostly ignored.
  
  With all due respect, it cannot. An attacker with a small-sized botnet 
  (which is ~250 hosts) can create chains that contain well in excess of 3000 
  items. 
 
 Most likely they can also easily generate enough latency data to crack any 
 simple
 hash function then.

Jenkins hash is far from being simple to crack, although with some
knowledge it can be done faster.

SHA or something else is essentially the same, except it has different
set of rounds - we only hashes 3 32 bit values, so Jenkins result is
really good.

For the hash tables it is a good solution, but we can move further.
I created multidimensional trie with that problem in mind, but it looks
like right now it is not absolutely required solution - I will continue
to work on it to check if we can be faster than properly sized hash
table with additional trie allocation overhead, but likely I should not
force people include it as is, only for information I think.

 -Andi

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-28 Thread Andi Kleen
Evgeniy Polyakov [EMAIL PROTECTED] writes:
 
 Jenkins hash is far from being simple to crack, although with some
 knowledge it can be done faster.

TCP tends to be initialized early before there is anything
good in the entropy pool.

static void init_std_data(struct entropy_store *r)
{
struct timeval tv;
unsigned long flags;

spin_lock_irqsave(r-lock, flags);
r-entropy_count = 0;
spin_unlock_irqrestore(r-lock, flags);

do_gettimeofday(tv);
add_entropy_words(r, (__u32 *)tv, sizeof(tv)/4);
add_entropy_words(r, (__u32 *)utsname(),
  sizeof(*(utsname()))/4);
}

utsname is useless here because it runs before user space has 
a chance to set it. The only truly variable thing is the 
boot time, which can be guessed with the ns part being brute forced.

To make it secure you would need to do regular rehash like
the routing cache which would pick up true randomness on the first
rehash.

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function II

2007-03-28 Thread Andi Kleen
Andi Kleen [EMAIL PROTECTED] writes:

 The only truly variable thing is the 
 boot time, which can be guessed 

Actually you don't need to guess it. It's in any TCP timestamp.

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-28 Thread Eric Dumazet
On 28 Mar 2007 16:14:17 +0200
Andi Kleen [EMAIL PROTECTED] wrote:
 TCP tends to be initialized early before there is anything
 good in the entropy pool.
 
 static void init_std_data(struct entropy_store *r)
 {
 struct timeval tv;
 unsigned long flags;
 
 spin_lock_irqsave(r-lock, flags);
 r-entropy_count = 0;
 spin_unlock_irqrestore(r-lock, flags);
 
 do_gettimeofday(tv);
 add_entropy_words(r, (__u32 *)tv, sizeof(tv)/4);
 add_entropy_words(r, (__u32 *)utsname(),
   sizeof(*(utsname()))/4);
 }
 
 utsname is useless here because it runs before user space has 
 a chance to set it. The only truly variable thing is the 
 boot time, which can be guessed with the ns part being brute forced.
 
 To make it secure you would need to do regular rehash like
 the routing cache which would pick up true randomness on the first
 rehash.

Good point, but :

1) We can now use struct timespec to get more bits in init_std_data()

2) tcp ehash salt is initialized at first socket creation, not boot time. Maybe 
we have more available entropy at this point.

3) We dont want to be 'totally secure'. We only want to raise the level, and 
eventually see if we have to spend more time on this next year(s). AFAIK we had 
two different reports from people being hit by the flaw of previous hash. Not 
really a critical issue.

4) We could add a hard limit on the length of one chain. Even if the bad guys 
discover a flaw, it wont hurt too much.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-28 Thread Andi Kleen
On Wed, Mar 28, 2007 at 03:50:47PM +0200, Eric Dumazet wrote:
 1) We can now use struct timespec to get more bits in init_std_data()

That would be a good change, but i don't think it would help that much.
If you know the hardware (e.g. webhost farms tend to have quite
predictive kit) and the kernel binary and the boot offset from
the timestamp you can probably guess the range of ns pretty closely
(let's say down to a few ms). With that it's a small range to search. 

 2) tcp ehash salt is initialized at first socket creation, not boot time. 
 Maybe we have more available entropy at this point.

Sockets are created early too. It would be a little better, but probably
not much. 

The only true random seed is disk, keyboard/mouse and previous state. previous
state is typically a relatively late init script, probably after the first
socket.  Servers tend to have no disk/mouse activity.

Disk may work if you manage to put it after the root mount, but you
lose on diskless systems. e.g. if the nfsd is built in that wouldn't
work though because it would create sockets before that.

Getting entropy from network interrupts would avoid the the diskless
issue, but people are paranoid about that.


 3) We dont want to be 'totally secure'. We only want to raise the level, and 
 eventually see if we have to spend more time on this next year(s). AFAIK we 
 had two different reports from people being hit by the flaw of previous hash. 
 Not really a critical issue.

Yes, but you probably want a complexity of at least 10^5-10^6 to be any
useful. I don't think you will get that early in boot from random
unless you use hardware support.

 
 4) We could add a hard limit on the length of one chain. Even if the bad guys 
 discover a flaw, it wont hurt too much.

Or just use the trie?  It has other advantages too :)

-Andi

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-28 Thread David Miller
From: Andi Kleen [EMAIL PROTECTED]
Date: 28 Mar 2007 16:14:17 +0200

 Evgeniy Polyakov [EMAIL PROTECTED] writes:
  
  Jenkins hash is far from being simple to crack, although with some
  knowledge it can be done faster.
 
 TCP tends to be initialized early before there is anything
 good in the entropy pool.

Andi, you're being an idiot.

You are spewing endless and uninformed bullshit about this secure hash
topic, and it must stop now!

You obviously didn't even read my patch, because if you did you would
have seen that I don't initialize the random seed until MUCH MUCH
later _EXACTLY_ to deal with this issue.

In my patch the random seed is initialized when the first TCP or DCCP
socket is created, at which point we'll have sufficient entropy.

So please stop talking such nonsense about this topic.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-28 Thread Andi Kleen
 In my patch the random seed is initialized when the first TCP or DCCP
 socket is created, at which point we'll have sufficient entropy.

See my discussion of this case in a later mail to Evgeniy 

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-27 Thread Andi Kleen
Nikolaos D. Bougalis [EMAIL PROTECTED] writes:
 
 I will be more than happy to provide a patch for this, but I figured I
 would solicit some input first.

To truly defend against this you would likely need a cryptographic
hash, which would be likely too slow. 

If it's a real problem the better fix would be to switch to some
kind of balanced tree (like Evgeniy is proposing) . 

But I think it can be mostly ignored. 

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-27 Thread Nikolaos D. Bougalis

Andi Kleen ([EMAIL PROTECTED]) wrote:


To truly defend against this you would likely need a cryptographic
hash, which would be likely too slow.


   I do not think that a cryptographically secure hash is necessary for 
this. Using a better hash function (i.e. one which does a good job of 
throroughly mixing the input bits, as jenkins does), is sufficient when 
combined with a secret per-boot salt, something which is easily demonstrable 
by test runs.




If it's a real problem the better fix would be to switch to some
kind of balanced tree (like Evgeniy is proposing) .


   I don't have any special attachment to the Jenkins hash, or to a hash 
table even. So, _yes_, by all means if a better solution is available let us 
use it and I'll be the first to cheer. But I don't know if Evgeniy's work is 
ready for prime time, and I consider plugging in an improved hash function 
to be an acceptable solution in the meantime.




But I think it can be mostly ignored.


   With all due respect, it cannot. An attacker with a small-sized botnet 
(which is ~250 hosts) can create chains that contain well in excess of 3000 
items. A big botnet (and there exist botnets with well over 5000 machines in 
them) can bring a system to its knees. You may not have gotten bitten by 
this, but others, including myself and Eric Dumazet have. And I believe that 
David Miller agrees, although I do not want to put words in his mouth -- or 
his keyboard, as the case may be.


   What this boils down to is, yes, we can keep patching our own kernels to 
use tagged jenkins hashing if this affects us, waiting for something better 
to come along. But isn't it more reasonable to add this into the kernel, at 
least as a non-default compile time option, and allow administrators to 
decide whether this is something they want to use?


   -n


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-27 Thread David Miller
From: Nikolaos D. Bougalis [EMAIL PROTECTED]
Date: Tue, 27 Mar 2007 22:01:38 -0700

 What this boils down to is, yes, we can keep patching our own kernels to 
 use tagged jenkins hashing if this affects us, waiting for something better 
 to come along.

You don't have to patch, I put jenkins into the tree and it will
hit 2.6.22, we don't need to even discuss this any more and you
can safely ignore Andi.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-24 Thread linux
 Result with jenkins:
 1 23880
 2 12108
 3 4040
 4 1019
 5 200
 6 30
 7 8
 8 1
 
 Xor:
 1 65536

Precisely.  This means that the Xor hash SUCKS, because its output is 
conspicuously
non-random.

What you expect is a Poisson distribution, where the chance that a chain will 
contain
k elements is
P(lambda,k) = e^-lambda * lambda^k / k!
lambda is the average loading per chain.  In your example, it's 1 (65536 
inputs, 65536 outputs).
(^ is exponentiation, ! is factorial)

So the distribution I expect to get is:
 0 24109.347656
 1 24109.347656
 2 12054.673828
 3 4018.224609
 4 1004.556152
 5 200.911224
 6 33.485203
 7 4.783601
 8 0.597950
 9 0.066439
10 0.006644

Whick looks a HELL of a lot like what you observed.
(The jenkins result above has 24250 chains with no entries.)


Now, you can sometimes use properties of the inputs to get a distribution
that is more uniform than random, by letting the distribution of the input
show through the hash funciton.  Which the xor hash does.  But this
depends on making assumptions about the input distribution, which means
that you're assuming that they're not being chosen maliciously.

If an attacker is choosing maliciously, which is a required assumption
in today's Internet, the best you can do is random.

Now, the core Jenkins hash mix function basically takes three inputs.
What jhash_3words does with it is:
a += K
b += K
c += seed
__jhash_mix(a, b, c)
return c;

Now, the ipv4 hash operation fundamentally involves 96 bits of input,
which is a good match for jhash.  If you want to add a salt, perhaps the
simplest thing would be to just replace those constants K with a 96-bit
salt and be done with it:

a = (lport16) + rport + salt[0];
Xb = laddr + salt[1];
c = raddr + salt[2];
__jhash_mix(a,b,c)
return c;

Regarding control by attackers, let's consider the four inputs and see
how much information an attacker can insert into each one:

remote port: An attacker has complete control over this.  16 bits.
remote address: Depends on the size of the bit-net.  Can vary from 0 bits
(one machine) to 20 bits for a large bot-net.
local address: Limited to the number of addresses the local machine has.
Typically 0 bits, rarely more than 2 bits.
May be much larger (8 bits or more) for stateful firewalls and
other sorts of proxies.
local port: Limited to the number of open server ports.  Typically 3-6
bits, but may be lower on heavily firewalled machines.

Certainly combining any two of these in a predictable way without some
non-linear salting makes an attacker's job easier.  While folding the
local and remote addresses before hashing is usually safe because the
local address is usually unique, there are situations in which there are
a large number of possible local addresses.

For example, it allows an attacker with a /24 to attack, say, a stateful
firewall guarding a /24.  If I have my machine at address a.b.c.d connect
to remote machine x.y.z.~d, then they always fold to a^x.b^y.c^z.255,
and I can, by making the local and remote paorts identical for all the
attacks, force 254-entry hash chains without knowing anything else about
the hash function, salt, or whatever.


An interesting question is whether it's better to mix salt into the
bits an attacker has most or least control over.  It's not immediately
obvious to me; does anyone else have insight?  Just mixing in 96 bits
over everything does seem to render the question moot.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-24 Thread Evgeniy Polyakov
On Sat, Mar 24, 2007 at 08:26:58AM -0400, [EMAIL PROTECTED] ([EMAIL PROTECTED]) 
wrote:
  Result with jenkins:
  1 23880
  2 12108
  3 4040
  4 1019
  5 200
  6 30
  7 8
  8 1
  
  Xor:
  1 65536
 
 Precisely.  This means that the Xor hash SUCKS, because its output is 
 conspicuously
 non-random.

XOR hash was specially designed for network environment, so it has the
ever best result (in this scenario, but it can be too easily controlled by 
attacker). 
It is not general purpose hash.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 09:00:08 +0100

 I dont consider this new hash as bug fix at all, ie your patch might enter 
 2.6.22 normal dev cycle.

Ok, I checked the patch into net-2.6.22
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 22 Mar 2007 23:03:04 +0100

 David Miller a écrit :
  From: Nikolaos D. Bougalis [EMAIL PROTECTED]
  Date: Thu, 22 Mar 2007 12:44:09 -0700
  
  People _have_ had problems. _I_ have had problems. And when
  someone with a few thousand drones under his control hoses your
  servers because he can do math and he leaves you with 2-item
  long chains, _you_ will have problems.
  
  No need to further argue this point, the people that matter
  (ie. me :-) understand it, don't worry..
 
 Yes, I recall having one big server hit two years ago by an attack on tcp 
 hash 
 function. David sent me the patch to use jhash. It's performing well :)
 
 Welcome to the club :)

Ok, how about we put something like the following into 2.6.21?

I'm not looking for the hash perfectionist analysis, please bug the
heck off if that's what your reply is going to be about, don't bother
hitting the reply button I will ignore you.

I want to hear instead if this makes attackability markedly _HARDER_
than what we have now and I am sure beyond a shadow of a doubt that it
does.

The secret is inialized when the first ehash-using socket is created,
that's not perfect (bug off!) but it's better than doing the
initialization in inet_init() or {tcp,dccp}_init() as we'll have a
chance of at least some entropy when that first such socket is
created.  We definitely can't do it for the first AF_INET socket
creation, because icmp creates a bunch of SOCK_RAW inet sockets at
init time which would defeat the whole purpose of deferring this. :)

Thanks.

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index c28e424..668056b 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -19,6 +19,9 @@
 #include linux/in6.h
 #include linux/ipv6.h
 #include linux/types.h
+#include linux/jhash.h
+
+#include net/inet_sock.h
 
 #include net/ipv6.h
 
@@ -28,12 +31,11 @@ struct inet_hashinfo;
 static inline unsigned int inet6_ehashfn(const struct in6_addr *laddr, const 
u16 lport,
const struct in6_addr *faddr, const __be16 
fport)
 {
-   unsigned int hashent = (lport ^ (__force u16)fport);
+   u32 ports = (lport ^ (__force u16)fport);
 
-   hashent ^= (__force u32)(laddr-s6_addr32[3] ^ faddr-s6_addr32[3]);
-   hashent ^= hashent  16;
-   hashent ^= hashent  8;
-   return hashent;
+   return jhash_3words((__force u32)laddr-s6_addr32[3],
+   (__force u32)faddr-s6_addr32[3],
+   ports, inet_ehash_secret);
 }
 
 static inline int inet6_sk_ehashfn(const struct sock *sk)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ce6da97..62daf21 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -19,6 +19,7 @@
 
 #include linux/string.h
 #include linux/types.h
+#include linux/jhash.h
 
 #include net/flow.h
 #include net/sock.h
@@ -167,13 +168,15 @@ static inline void inet_sk_copy_descendant(struct sock 
*sk_to,
 
 extern int inet_sk_rebuild_header(struct sock *sk);
 
+extern u32 inet_ehash_secret;
+extern void build_ehash_secret(void);
+
 static inline unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
const __be32 faddr, const __be16 fport)
 {
-   unsigned int h = ((__force __u32)laddr ^ lport) ^ ((__force __u32)faddr 
^ (__force __u32)fport);
-   h ^= h  16;
-   h ^= h  8;
-   return h;
+   return jhash_2words((__force __u32) laddr ^ (__force __u32) faddr,
+   ((__u32) lport)  16 | (__force __u32)fport,
+   inet_ehash_secret);
 }
 
 static inline int inet_sk_ehashfn(const struct sock *sk)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cf358c8..308318a 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -87,6 +87,7 @@
 #include linux/init.h
 #include linux/poll.h
 #include linux/netfilter_ipv4.h
+#include linux/random.h
 
 #include asm/uaccess.h
 #include asm/system.h
@@ -217,6 +218,16 @@ out:
return err;
 }
 
+u32 inet_ehash_secret;
+EXPORT_SYMBOL(inet_ehash_secret);
+
+void build_ehash_secret(void)
+{
+   while (!inet_ehash_secret)
+   get_random_bytes(inet_ehash_secret, 4);
+}
+EXPORT_SYMBOL(build_ehash_secret);
+
 /*
  * Create an inet socket.
  */
@@ -233,6 +244,11 @@ static int inet_create(struct socket *sock, int protocol)
int try_loading_module = 0;
int err;
 
+   if (sock-type != SOCK_RAW 
+   sock-type != SOCK_DGRAM 
+   !inet_ehash_secret)
+   build_ehash_secret();
+
sock-state = SS_UNCONNECTED;
 
/* Look for the requested type/protocol pair. */
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 5cac14a..0de723f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -98,6 +98,11 @@ static int inet6_create(struct socket *sock, int protocol)
int try_loading_module 

Re: RFC: Established connections hash function

2007-03-23 Thread Evgeniy Polyakov
On Thu, Mar 22, 2007 at 01:53:03PM -0700, Nikolaos D. Bougalis ([EMAIL 
PROTECTED]) wrote:
 Grrr, I think I pointed several times already, that properly distributed
 values do not change distribution after folding. And it can be seen in
 all tests (and in that you pointed too).
 
Yes, I agree that the folding will not be a problem _IF_ the values are 
 properly distributed -- although in that case, the folding is unnecessary. 
 But that the Jenkins distribution didn't change (according to posts you 
 made) after folding says that the output of Jenkins is pretty good to begin 
 with ;)

In _some_ cases, but not in all.
 
  We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) 
  though.
  
 Please explain to me how jhash_2words solves the issue that you 
   claim
  jhash_3words has, when they both use the same underlying bit-mixer?
  
  $c value is not properly distributed and significanly breaks overall
  distribution. Attacker, which controls $c (and it does it by 
  controlling
  ports), can significantly increase selected hash chains.
 
Even if we assume that $c is not properly distributed, using a secret 
 cookie and mixing operations from different algebraic groups changes the 
 calculus dramatically. It's no longer straight-forward for the attacker to 
 generate collisions (as it is with the current function) because the '$c' 
 supplied by the attacker is used in conjunction with the secret cookie 
 before __jhash_mix thoroughly mixes the inputs to generate a hash.

With XOR hash attacker can predict end result easily, with jenkins it
can not (easily), but jenkins distribution itself (even for usual data) 
results in too long chains - there are two problems:
1. easily predicted result
2. broken distribution

Xor hash has problems with first one, Jenkins (in some cases) with
second.
 
 I've tested the Jenkins hash extensively. I see no evidence of this
  improper distribution that you describe. In fact, about the only 
  person
  that I've seen advocate this in the archives of netdev is you, and a lot 
  of
  other very smart people disagree with you, so I consider myself to be in
  good company.
 
 Hmm, I ran tests to select proper hash for netchannel implementation
 (actualy the same as sockets) and showed Jenkin's hash problems - it is
 enough to have only problem to state that there is a problem, doesn't
 it?
 
Again, from what I've seen from your other posts, I don't believe you've 
 identified any inherent problems with the Jenkins hash.
 
But that aside for a moment, surely you will agree that the ability of 
 an attacker with a few dozen machines under his control to trivially mount 
 an algorithmic complexity attack causing serious performance drops is also 
 a problem with the current code and one that must be addressed.

Please refer to above two problems - Jenkins hash does not have problem
with easy end result detection, instead if has distribution problem.
Which means that attacker should not guess hash chains, it should
provide special crafted input and distribution will be shifted to the
higher levels.
 
 I will try to decipher phrase 'whatever it is, it's not there'...
 
It meant that I saw nothing particularly interesting running the example 
 you suggested and looking at the output.
 
 
 This thread for example:
 http://marc.info/?t=11705761351r=1w=2
 
I went through most of this thread. I don't see an analysis of the 
 Jenkins. Am I missing something?

There is no full analysis, I just posted results I found when selected
hash for different projects with similar to sockets background.
 
 One your test shows thare are no problems, try that one I propose, which
 can be even created in userspace - you do not want even to get into
 account what I try to say to you.
 
I'm not trying to be obnoxious on purpose here, but I don't see the test 
 that you are referring to. Could you be more specific?

http://marc.info/?l=linux-netdevm=117199140430104q=p5

-n
-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Eric Dumazet

David Miller a écrit :

From: Eric Dumazet [EMAIL PROTECTED]



Welcome to the club :)


Ok, how about we put something like the following into 2.6.21?


2.6.21 really ?

Just to be clear : I had an attack two years ago, I applied your patch, 
rebooted the machine, and since then the attackers had to find another way to 
hurt the machine. Eventually, when I update the kernel of this machine, I 
forget to appply jhash patch, and attackers dont know they can try again :)


I dont consider this new hash as bug fix at all, ie your patch might enter 
2.6.22 normal dev cycle.


Maybe a *fix*, independant of the hash function (so that no math expert can 
insult us), would be to have a *limit*, say... 1000 (something insane) on the 
length of a hash chain ?


In my case, I saw lengths of about 3000 two years ago under attack, but 
machine was still usable... maybe in half power mode.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Evgeniy Polyakov
On Thu, Mar 22, 2007 at 01:58:34PM -0700, David Miller ([EMAIL PROTECTED]) 
wrote:
 From: Nikolaos D. Bougalis [EMAIL PROTECTED]
 Date: Thu, 22 Mar 2007 12:44:09 -0700
 
  People _have_ had problems. _I_ have had problems. And when
  someone with a few thousand drones under his control hoses your
  servers because he can do math and he leaves you with 2-item
  long chains, _you_ will have problems.
 
 No need to further argue this point, the people that matter
 (ie. me :-) understand it, don't worry..

Call me a loooser which mail will be deleted on arrival, but...

jhash_2words(const, const, ((const  16) | $sport) ^ $random)

where $sport is 1-65535 in a loop, and $random is pseudo-random number
obtained on start.

Which is exactly the case of web server and attacker connects to 80 port
from the same IP address and different source ports.

Result with jenkins:
1 23880
2 12108
3 4040
4 1019
5 200
6 30
7 8
8 1

Xor:
1 65536


Please, do not apply patch as is, I will devote this day to find where
jenkins has problems and try to fix distribution. If I will fail, then
it is up to you to decide that above results are bad or good.

Thank you.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Eric Dumazet

Evgeniy Polyakov a ecrit :

Call me a loooser which mail will be deleted on arrival, but...

jhash_2words(const, const, ((const  16) | $sport) ^ $random)

where $sport is 1-65535 in a loop, and $random is pseudo-random number
obtained on start.

Which is exactly the case of web server and attacker connects to 80 port
from the same IP address and different source ports.

Result with jenkins:
1 23880
2 12108
3 4040
4 1019
5 200
6 30
7 8
8 1

Xor:
1 65536


So what ? You still think hash function must be bijective ? Come on !

You have a machine somewhere that allows 65536 concurrent connections coming 
from the same IP address ?


The last problem you have is the nature of tcp hash function.

Dont argue again with your pseudo science.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Evgeniy Polyakov
On Fri, Mar 23, 2007 at 09:17:19AM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
 You have a machine somewhere that allows 65536 concurrent connections 
 coming from the same IP address ?

Attached png file of botnet scenario:
1000 addresses from the same network (class B for example),
each one creates 1024 connections to the same static port.

Eric, I agree, that XOR hash is not perfect, and it should be changed,
but not blindly.

I perfectly know that hash function is not bijective, but it must have
good distribution. 
Function like this
int hash(u32 saddr, u16 sport, u32 daddr, u16 dport, u32 rand)
{
return rand ^ saddr ^ daddr)16)^(dport ^ sport)) 8);
}

has even worse _distribution_, although you can not predict its end
result due to random value, and attacker will not try to do it.

-- 
Evgeniy Polyakov


jhash_botnet.png
Description: PNG image


Re: RFC: Established connections hash function

2007-03-23 Thread Evgeniy Polyakov
On Fri, Mar 23, 2007 at 11:33:32AM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
 Eric, I agree, that XOR hash is not perfect, and it should be changed,
 but not blindly.

Attached case of how broken can be xor in botnet scenario.


-- 
Evgeniy Polyakov


jhash_good.png
Description: PNG image


XOR hash beauty solved [Was: RFC: Established connections hash function]

2007-03-23 Thread Evgeniy Polyakov
 Please, do not apply patch as is, I will devote this day to find where
 jenkins has problems and try to fix distribution. If I will fail, then
 it is up to you to decide that above results are bad or good.

I need to admit that I was partially wrong in my analysis of the Jenkins
hash distribution - it does _not_ have problems and any kind of artifacts. 
Waves found in tests are results of folding into hash_size boundary,
distribution inside F(32) field is unifirm.
XOR hash does not have such problem, because it uses (u32 ^ u16) as one
round, which results in the uniform (it is not correct to call that
distribution uniform as is, but only getting into account that u16 values 
used in tests were uniformly distributed) distribution inside F(16), which 
does not suffer from hash_size boundary folding. Since XOR hash has 3 rounds, 
only one of them (xor of the final u32 values) will suffer from folding, but 
tests where is it can be determined for sure use constant addresses, so 
problem hides again.

So, briefly saying, jhash_2/3words have safe distribution, but have
higher-number of elements waves as a result of folding which is
unavoidable for general-purpose hash.

Now my conscience is calm :)

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Nikolaos D. Bougalis
   Let me start off by saying that I hope I didn't come across as 
condenscending in my previous posts. If I did, then it wasn't intended. Now, 
on to more important things :)




jhash_2words(const, const, ((const  16) | $sport) ^ $random)

where $sport is 1-65535 in a loop, and $random is pseudo-random number
obtained on start.


   If you are correct that jhash_3words doesn't properly distribute the 
bits in 'c' (which I don't believe you are, but let's assume it for a 
second) then this function will also be broken:  jhash_2words calls 
jhash_3words; jhash_3words adds (a linear operation) initval and c before 
calling __jhash_mix. So, if there is a problem with passing values under the 
direct control of the attacker into 'c' both jhash_2words and jhash_3words 
are affected; in other words, this variant would also be flawed.




Which is exactly the case of web server and attacker connects to 80 port
from the same IP address and different source ports.

Result with jenkins:
1 23880
2 12108
3 4040
4 1019
5 200
6 30
7 8
8 1

Xor:
1 65536


   I believe that the XOR results, if generated by your test above, are 
somewhat meaningless because you're feeding what is ideal input into the XOR 
hash. Which means that you'll get a perfect distribution. With your input, 
one might as well suggest that using the remote port will give a perfect 
distribution, and it will, but only for that specific input.


   Just for kicks, I went to one of our servers, and did netstat -n | grep 
ESTABLISHED and ended up with 31072 distinct ip:port/ip:port 4-tuples which 
I then hashed into a 65536 bucket table. There are the results; feel free to 
draw your own conclusions:


[ I think this should come out looking good; sorry if whitespace is screwy ]

+---+---+---+---+---+
|   |  xor  | j2w 1 | j2w 2 | j3w 1 |
+---+---+---+---+
| 0 | 40868 | 40930 | 40767 | 40750 |
| 1 | 19208 | 19119 | 19382 | 19413 |
| 2 |  4636 |  4618 |  4576 |  4554 |
| 3 |   716 |   769 |   715 |   734 |
| 4 |99 |91 |87 |76 |
| 5 | 7 | 8 | 9 | 9 |
| 6 | 1 | 1 | 0 | 0 |
| 7 | 1 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+

xor: the vanilla linux function
j2w 1 is my variant: jhash_2words(laddr + rport, raddr + lport, seed)
j2w 2 is your variant: jhash_2words(laddr, raddr, (rport  16) ^ lport) ^ 
seed)

j3w: jhash_3words(laddr, raddr, (rport  12) + lport, seed)

   The seed used for all the Jenkins hashes came from the low-order 32-bits 
returned by RDTSC, executed when the program started. It remained constant 
throughout the run. 8 runs where made, to ensure that the seed wasn't 
causing weirdness, all runs giving almost identical results. The Jenkins 
hashes did not use the extra 2 right-shifts to fold high-order bits into the 
low-order bits, that is employed by the XOR hash.


   -n 



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: XOR hash beauty solved [Was: RFC: Established connections hash function]

2007-03-23 Thread Nikolaos D. Bougalis

So, briefly saying, jhash_2/3words have safe distribution, but have
higher-number of elements waves as a result of folding which is
unavoidable for general-purpose hash.


   Thanks for the analysis. 


   -n


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC: Established connections hash function

2007-03-22 Thread Nikolaos D. Bougalis

   Hello,

   I have noticed that the hash function that the kernel uses for
established TCP/IP connections is rather simplistic, specifically:

   h = (local address ^ local_port) ^ (remote_address ^ remote_port);
   h ^= h  16;
   h ^= h  8;

   Now, simple is great, but this has a number of issues, not the least of
which is that an attacker can very easily cause collisions and force
extremely long chain lengths, a situation that becomes worse the more
distinct IP addresses and listening ports a box has.

   Consider, for example, a box that has 20 ports open and 4 consecutive IP
addresses. An attacker that has an entire class C available can create
24,576 connections that hash to the same value, resulting in a ridiculously
overlong chain. With servers that do virtual hosting and have dozens of IPs,
the situation can become much worse very fast.

   This particular hash seems to be the odd-man out, since most other
network related hashes in the kernel seem to be Jenkins-based, and some use
tagged hashing to defeat algorithmic complexity attacks. For example, the
route hash uses this:

static unsigned int rt_hash_rnd;

static unsigned int rt_hash_code(u32 daddr, u32 saddr)
{
   return (jhash_2words(daddr, saddr, rt_hash_rnd)
rt_hash_mask);
}

   With this in mind, I propose the following replacement for inet_ehashfn,
which defeats algorithmic complexity attacks and achieves excellent
distribution:

unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
 const __be32 faddr, const __be16 fport)
{
   return jhash_3words((__force __u32)faddr, (__force __u32)laddr,
   (((__force __u32)fport)  16) + lport,
   inet_ehash_rnd);
}

   where inet_ehash_rnd is initialized once in tcp_init to a random 32-bit
value.

   I will be more than happy to provide a patch for this, but I figured I
would solicit some input first.

   Nik B.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-22 Thread Evgeniy Polyakov
On Thu, Mar 22, 2007 at 08:39:04AM -0700, Nikolaos D. Bougalis ([EMAIL 
PROTECTED]) wrote:
This particular hash seems to be the odd-man out, since most other
 network related hashes in the kernel seem to be Jenkins-based, and some use
 tagged hashing to defeat algorithmic complexity attacks. For example, the
 route hash uses this:

It seems you do not know a history...
It is the fastest and actually the best hash for that workloads where it
is used, but unfortunately it is too simple for attacker to predict end
result.

 static unsigned int rt_hash_rnd;
 
 static unsigned int rt_hash_code(u32 daddr, u32 saddr)
 {
return (jhash_2words(daddr, saddr, rt_hash_rnd)
 rt_hash_mask);
 }
 
With this in mind, I propose the following replacement for inet_ehashfn,
 which defeats algorithmic complexity attacks and achieves excellent
 distribution:
 
 unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
  const __be32 faddr, const __be16 fport)
 {
return jhash_3words((__force __u32)faddr, (__force __u32)laddr,
(((__force __u32)fport)  16) + lport,
inet_ehash_rnd);
 }

And this is utterly broken. For more details please read netdev@
archives and trivial analysis of jhash_3words().

We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) though.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-22 Thread Nikolaos D. Bougalis
On Thu, March 22, 2007 at 8:52 AM, Evgeniy Polyakov [EMAIL PROTECTED] 
wrote:



It seems you do not know a history...


   I know a lot about history. I may not know the specific history you had 
in mind though.


   I do see now that this has been brought up before. Before posting, I did 
search the archives, but obviously not closely enough. I blame it on the 
early morning hour.




It is the fastest and actually the best hash for that workloads where it
is used, but unfortunately it is too simple for attacker to predict end
result.


   Yes, the distribution of the vanilla function is decent, and yes it's 
very fast. But all that won't help you when chains start having tens of 
thousands of items, and you have to iterate through them constantly. But I 
guess that if it makes you feel better, you can call that unfortunate.




unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
 const __be32 faddr, const __be16 fport)
{
   return jhash_3words((__force __u32)faddr, (__force __u32)laddr,
   (((__force __u32)fport)  16) + lport,
   inet_ehash_rnd);
}


And this is utterly broken. For more details please read netdev@
archives and trivial analysis of jhash_3words().


   Utterly broken? Nonsense. I have tested the actual function I proposed 
(sans the __force and __u32 stuff, which weren't necessary in my test 
program), against real data, collected from various servers in real-time. It 
has consistently achieved lower average chain lengths than the vanilla 
function and demonstrated no artifacting, and that's trivial to verify.


   The only analysis I could find was this 
http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14, which uses 
jhash_2words, and not jhash_3words, and which naively attempts to take the 
output of jhash_2words, and to perform the same mixing trick that the 
vanilla inet_ehashfn does and uses artificially generated data sets.


   But please, feel free to point out any other _unfavorable_ analyses of 
jhash_2words or jhash_3words that I may have missed.




We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) though.


   Please explain to me how jhash_2words solves the issue that you claim 
jhash_3words has, when they both use the same underlying bit-mixer?


   -n 



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-22 Thread Evgeniy Polyakov
On Thu, Mar 22, 2007 at 10:32:44AM -0700, Nikolaos D. Bougalis ([EMAIL 
PROTECTED]) wrote:
Utterly broken? Nonsense. I have tested the actual function I proposed 
 (sans the __force and __u32 stuff, which weren't necessary in my test 
 program), against real data, collected from various servers in real-time. 
 It has consistently achieved lower average chain lengths than the vanilla 
 function and demonstrated no artifacting, and that's trivial to verify.

So what?
People test and work with XOR hash for years and they do not strike any
problems. If we talk about specially crafted data, then XOR one is no
worse than Jenkins with 3 words (which is even worse for blind attack of 
constant ports).

The only analysis I could find was this 
 http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14, which uses 
 jhash_2words, and not jhash_3words, and which naively attempts to take the 
 output of jhash_2words, and to perform the same mixing trick that the 
 vanilla inet_ehashfn does and uses artificially generated data sets.

It is outdated, check recent netdev@ archives. Folding used in that test
does not change distribution, and data was presented as it can be
selected by attacker, who can create with any distribution.

But please, feel free to point out any other _unfavorable_ analyses of 
 jhash_2words or jhash_3words that I may have missed.
 
 
 We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) though.
 
Please explain to me how jhash_2words solves the issue that you claim 
 jhash_3words has, when they both use the same underlying bit-mixer?

$c value is not properly distributed and significanly breaks overall
distribution. Attacker, which controls $c (and it does it by controlling 
ports), can significantly increase selected hash chains.

But it is only $c, $a and $b are properly distributed, so jhash_2words()
is safer than jhash_3words().
Just create a simple application which does
jhash_3words(a, b, rand(), init) and jhash_2words(a, b, rand()) and see
results.

I want to emphasize: it is not about random generator, but about data,
controlled by attacker, which can select it like in tests. $c in
jhash_3words() is the weakest part, jhash_2words() works much better in
that regard.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-22 Thread Nikolaos D. Bougalis

On Thu, Mar 22, 2007 11:21 AM, Evgeniy Polyakov [EMAIL PROTECTED] wrote:


   Utterly broken? Nonsense. I have tested the actual function I proposed
(sans the __force and __u32 stuff, which weren't necessary in my test
program), against real data, collected from various servers in real-time.
It has consistently achieved lower average chain lengths than the vanilla
function and demonstrated no artifacting, and that's trivial to verify.


So what?


   So what? Are you serious?



People test and work with XOR hash for years and they do not strike any
problems. If we talk about specially crafted data, then XOR one is no
worse than Jenkins with 3 words (which is even worse for blind attack of
constant ports).


   People _have_ had problems. _I_ have had problems. And when someone with 
a few thousand drones under his control hoses your servers because he can do 
math and he leaves you with 2-item long chains, _you_ will have 
problems. And sticking your head in the sand and saying people work with 
XOR hash for years and they do not strike any problems wont help you.




   The only analysis I could find was this
http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14, which uses
jhash_2words, and not jhash_3words, and which naively attempts to take 
the

output of jhash_2words, and to perform the same mixing trick that the
vanilla inet_ehashfn does and uses artificially generated data sets.


It is outdated, check recent netdev@ archives. Folding used in that test
does not change distribution, and data was presented as it can be
selected by attacker, who can create with any distribution.


   Be careful here. If the folding makes no difference, it says something 
very important about __jhash_mix, and that something goes against the very 
thing that you are saying.




   But please, feel free to point out any other _unfavorable_ analyses of
jhash_2words or jhash_3words that I may have missed.


We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) though.

   Please explain to me how jhash_2words solves the issue that you claim
jhash_3words has, when they both use the same underlying bit-mixer?


$c value is not properly distributed and significanly breaks overall
distribution. Attacker, which controls $c (and it does it by controlling
ports), can significantly increase selected hash chains.


   I've tested the Jenkins hash extensively. I see no evidence of this 
improper distribution that you describe. In fact, about the only person 
that I've seen advocate this in the archives of netdev is you, and a lot of 
other very smart people disagree with you, so I consider myself to be in 
good company.




But it is only $c, $a and $b are properly distributed, so jhash_2words()
is safer than jhash_3words().
Just create a simple application which does
jhash_3words(a, b, rand(), init) and jhash_2words(a, b, rand()) and see
results.


   What exactly am I supposed to see in these results? Because whatever it 
is, it's not there. Feel free to provide a link to your data and a histogram 
that shows what you find of interest though, and I'll be happy to look at 
it.


   -n



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-22 Thread Evgeniy Polyakov
On Thu, Mar 22, 2007 at 12:44:09PM -0700, Nikolaos D. Bougalis ([EMAIL 
PROTECTED]) wrote:
 On Thu, Mar 22, 2007 11:21 AM, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
 
Utterly broken? Nonsense. I have tested the actual function I proposed
 (sans the __force and __u32 stuff, which weren't necessary in my test
 program), against real data, collected from various servers in real-time.
 It has consistently achieved lower average chain lengths than the vanilla
 function and demonstrated no artifacting, and that's trivial to verify.
 
 So what?
 
So what? Are you serious?

We started our discussion a bit wrong - let's start it again, ok? :)
 
 People test and work with XOR hash for years and they do not strike any
 problems. If we talk about specially crafted data, then XOR one is no
 worse than Jenkins with 3 words (which is even worse for blind attack of
 constant ports).
 
People _have_ had problems. _I_ have had problems. And when someone with 
 a few thousand drones under his control hoses your servers because he can 
 do math and he leaves you with 2-item long chains, _you_ will have 
 problems. And sticking your head in the sand and saying people work with 
 XOR hash for years and they do not strike any problems wont help you.

You do not want to read what was written - _if_ we use artificial data,
then attacker can use it too, so if it is possible to break the system
with artificial data, then it is possible it will be broken in a real
life. If we use usual data, then we are ok (although Jenkins with 3
words is not ok).
 
The only analysis I could find was this
 http://tservice.net.ru/~s0mbre/blog/2006/05/14#2006_05_14, which uses
 jhash_2words, and not jhash_3words, and which naively attempts to take 
 the
 output of jhash_2words, and to perform the same mixing trick that the
 vanilla inet_ehashfn does and uses artificially generated data sets.
 
 It is outdated, check recent netdev@ archives. Folding used in that test
 does not change distribution, and data was presented as it can be
 selected by attacker, who can create with any distribution.
 
Be careful here. If the folding makes no difference, it says something 
 very important about __jhash_mix, and that something goes against the very 
 thing that you are saying.

Grrr, I think I pointed several times already, that properly distributed
values do not change distribution after folding. And it can be seen in
all tests (and in that you pointed too).

But please, feel free to point out any other _unfavorable_ analyses of
 jhash_2words or jhash_3words that I may have missed.
 
 
 We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) though.
 
Please explain to me how jhash_2words solves the issue that you claim
 jhash_3words has, when they both use the same underlying bit-mixer?
 
 $c value is not properly distributed and significanly breaks overall
 distribution. Attacker, which controls $c (and it does it by controlling
 ports), can significantly increase selected hash chains.
 
I've tested the Jenkins hash extensively. I see no evidence of this 
 improper distribution that you describe. In fact, about the only person 
 that I've seen advocate this in the archives of netdev is you, and a lot of 
 other very smart people disagree with you, so I consider myself to be in 
 good company.

Hmm, I ran tests to select proper hash for netchannel implementation
(actualy the same as sockets) and showed Jenkin's hash problems - it is
enough to have only problem to state that there is a problem, doesn't
it?
 
 But it is only $c, $a and $b are properly distributed, so jhash_2words()
 is safer than jhash_3words().
 Just create a simple application which does
 jhash_3words(a, b, rand(), init) and jhash_2words(a, b, rand()) and see
 results.
 
What exactly am I supposed to see in these results? Because whatever it 
 is, it's not there. Feel free to provide a link to your data and a 

I will try to decipher phrase 'whatever it is, it's not there'...

 histogram that shows what you find of interest though, and I'll be happy to 
 look at it.

This thread for example:
http://marc.info/?t=11705761351r=1w=2

One your test shows thare are no problems, try that one I propose, which
can be even created in userspace - you do not want even to get into
account what I try to say to you.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-22 Thread Nikolaos D. Bougalis



We started our discussion a bit wrong - let's start it again, ok? :)


   Fair enough.



You do not want to read what was written - _if_ we use artificial data,
then attacker can use it too, so if it is possible to break the system
with artificial data, then it is possible it will be broken in a real
life. If we use usual data, then we are ok (although Jenkins with 3
words is not ok).


   I'm not saying that an attacker cannot use artificial data. Indeed, 
algorithmic complexity attacks are all about 'crafting' artificial data with 
certain properties. So, yes, I absolutely agree that attackers can and do 
use artificial data.



Be careful here. If the folding makes no difference, it says 
 something
 very important about __jhash_mix, and that something goes against the 
 very

 thing that you are saying.

Grrr, I think I pointed several times already, that properly distributed
values do not change distribution after folding. And it can be seen in
all tests (and in that you pointed too).


   Yes, I agree that the folding will not be a problem _IF_ the values are 
properly distributed -- although in that case, the folding is unnecessary. 
But that the Jenkins distribution didn't change (according to posts you 
made) after folding says that the output of Jenkins is pretty good to begin 
with ;)



 We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) 
 though.

 
Please explain to me how jhash_2words solves the issue that you 
  claim

 jhash_3words has, when they both use the same underlying bit-mixer?
 
 $c value is not properly distributed and significanly breaks overall
 distribution. Attacker, which controls $c (and it does it by 
 controlling

 ports), can significantly increase selected hash chains.


   Even if we assume that $c is not properly distributed, using a secret 
cookie and mixing operations from different algebraic groups changes the 
calculus dramatically. It's no longer straight-forward for the attacker to 
generate collisions (as it is with the current function) because the '$c' 
supplied by the attacker is used in conjunction with the secret cookie 
before __jhash_mix thoroughly mixes the inputs to generate a hash.




I've tested the Jenkins hash extensively. I see no evidence of this
 improper distribution that you describe. In fact, about the only 
 person
 that I've seen advocate this in the archives of netdev is you, and a lot 
 of

 other very smart people disagree with you, so I consider myself to be in
 good company.

Hmm, I ran tests to select proper hash for netchannel implementation
(actualy the same as sockets) and showed Jenkin's hash problems - it is
enough to have only problem to state that there is a problem, doesn't
it?


   Again, from what I've seen from your other posts, I don't believe you've 
identified any inherent problems with the Jenkins hash.


   But that aside for a moment, surely you will agree that the ability of 
an attacker with a few dozen machines under his control to trivially mount 
an algorithmic complexity attack causing serious performance drops is also a 
problem with the current code and one that must be addressed.




I will try to decipher phrase 'whatever it is, it's not there'...


   It meant that I saw nothing particularly interesting running the example 
you suggested and looking at the output.




This thread for example:
http://marc.info/?t=11705761351r=1w=2


   I went through most of this thread. I don't see an analysis of the 
Jenkins. Am I missing something?




One your test shows thare are no problems, try that one I propose, which
can be even created in userspace - you do not want even to get into
account what I try to say to you.


   I'm not trying to be obnoxious on purpose here, but I don't see the test 
that you are referring to. Could you be more specific?


   -n


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-22 Thread David Miller
From: Nikolaos D. Bougalis [EMAIL PROTECTED]
Date: Thu, 22 Mar 2007 12:44:09 -0700

 People _have_ had problems. _I_ have had problems. And when
 someone with a few thousand drones under his control hoses your
 servers because he can do math and he leaves you with 2-item
 long chains, _you_ will have problems.

No need to further argue this point, the people that matter
(ie. me :-) understand it, don't worry..
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-22 Thread Eric Dumazet

David Miller a écrit :

From: Nikolaos D. Bougalis [EMAIL PROTECTED]
Date: Thu, 22 Mar 2007 12:44:09 -0700


People _have_ had problems. _I_ have had problems. And when
someone with a few thousand drones under his control hoses your
servers because he can do math and he leaves you with 2-item
long chains, _you_ will have problems.


No need to further argue this point, the people that matter
(ie. me :-) understand it, don't worry..


Yes, I recall having one big server hit two years ago by an attack on tcp hash 
function. David sent me the patch to use jhash. It's performing well :)


Welcome to the club :)

= net/ipv4/tcp_ipv4.c 1.114 vs edited =
--- 1.114/net/ipv4/tcp_ipv4.c2005-03-26 15:04:35 -08:00
+++ edited/net/ipv4/tcp_ipv4.c2005-04-05 13:39:52 -07:00
@@ -103,14 +103,15 @@
  */
 int sysctl_local_port_range[2] = { 1024, 4999 };
 int tcp_port_rover = 1024 - 1;
+static u32 tcp_v4_hash_rand;

 static __inline__ int tcp_hashfn(__u32 laddr, __u16 lport,
  __u32 faddr, __u16 fport)
 {
-int h = (laddr ^ lport) ^ (faddr ^ fport);
-h ^= h  16;
-h ^= h  8;
-return h  (tcp_ehash_size - 1);
+return jhash_2words(laddr ^ faddr,
+(lport  16) | fport,
+tcp_v4_hash_rand) 
+(tcp_ehash_size - 1);
 }

  static __inline__ int tcp_sk_hashfn(struct sock *sk)
 @@ -2626,6 +2627,9 @@
  panic(Failed to create the TCP control socket.\n);
  tcp_socket-sk-sk_allocation   = GFP_ATOMIC;
  inet_sk(tcp_socket-sk)-uc_ttl = -1;
 +
 +get_random_bytes(tcp_v4_hash_rand, 4);
 +tcp_v4_hash_rand ^= jiffies;

  /* Unhash it so that IP input processing does not even
   * see it, we do not wish this socket to see incoming




-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html