Re: [Caml-list] Hashtbl and security

2012-02-09 Thread Goswin von Brederlow
Gerd Stolpmann i...@gerd-stolpmann.de writes:

 Well, looks like that the upper 32 bits aren't considered at all for
 computing the hash value. This is for sure surprising - I would have
 expected that these bits somehow go in to the final value (e.g. that they
 are xor-ed with the lower half).

 My guess is that this was done to ensure that an int-to-something
 hashtable can be marshalled from 64 to 32 bit systems. Not sure for what
 this is important, but it looks like a nice symmetry.

Marshaling hashtables of ints might be an argument. BUT:

# Hashtbl.hash (1 lsr 40);;
- : int = 0
# Hashtbl.hash (Int64.of_int (1 lsr 40));;
- : int = 0
# Hashtbl.hash (Int64.of_int (1 lsr 40 + 1));;
- : int = 1

 In trunk the method for ensuring this seems to have changed (see function
 caml_hash_mix_intnat in
 http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072view=markup).

 Gerd

Lets hope that fixes this.

MfG
Goswin

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-02-09 Thread Goswin von Brederlow
AUGER Cédric cedric.au...@lri.fr writes:

 More seriously, hashtables are here to store values, so you don't want
 want to have a hash value bigger than (2^30)-1.

 In fact even such a value is too big for that purpouse IMHO, since on a
 4Gio RAM computer, it will take half it to store the table, assuming
 each entry only takes 4 octets, that is it would be ok only for storing
 just the structure of the table (each entry has to be a pointer),
 so that in practice, either your table is ridiculously big to store
 too few values either it is relevant to have such a size and in this
 case your computer will spend all its time to swap since you won't
 have enough RAM.

But I have 128GB ram. :) That allows for a hashtable of size 2^33
entries pointing at simple values (e.g. ints) without swapping. And ram
sizes get bigger and bigger.

[Doesn't a Hashtbl of ints store the ints directly? That would allow
2^34 entries.]

  Clearly you cannot a bijection from 63 bits
 integers to 31 bits integers; so there are a lot of collisions. I do
 not have a 64 arch, but I guesse that if you hash 2^48+168 you will
 get 168. I guess such a function is ways to be ideal, since when
 playing with bitvectors having two integers which differs by only one
 bit is very common, but at least it is a fast function, and has a good
 repartition (you have 2^32 collisions exactly per bucket).

Ignoring the upper bits makes for a verry verry poor hash function. Also
means it is 100% trivial to create a DOS by tuneing your input values to
only differ in the upper bits.

MfG
Goswin

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-02-09 Thread Gerd Stolpmann

 AUGER Cédric cedric.au...@lri.fr writes:

 More seriously, hashtables are here to store values, so you don't want
 want to have a hash value bigger than (2^30)-1.

 In fact even such a value is too big for that purpouse IMHO, since on a
 4Gio RAM computer, it will take half it to store the table, assuming
 each entry only takes 4 octets, that is it would be ok only for storing
 just the structure of the table (each entry has to be a pointer),
 so that in practice, either your table is ridiculously big to store
 too few values either it is relevant to have such a size and in this
 case your computer will spend all its time to swap since you won't
 have enough RAM.

 But I have 128GB ram. :) That allows for a hashtable of size 2^33
 entries pointing at simple values (e.g. ints) without swapping. And ram
 sizes get bigger and bigger.

Fully agreed. The limitation to 30 bits is already unpractical for such
large computers. Maybe we need a second hash function

val Hashtbl.wide_hash : 'a - int

without this constraint (on 64 bit systems).

Btw, you can already define your own hash functions, just use the
functorized interface. And you are not limited to 30 bits. It's a bit
inconvenient only.

Gerd




 [Doesn't a Hashtbl of ints store the ints directly? That would allow
 2^34 entries.]

  Clearly you cannot a bijection from 63 bits
 integers to 31 bits integers; so there are a lot of collisions. I do
 not have a 64 arch, but I guesse that if you hash 2^48+168 you will
 get 168. I guess such a function is ways to be ideal, since when
 playing with bitvectors having two integers which differs by only one
 bit is very common, but at least it is a fast function, and has a good
 repartition (you have 2^32 collisions exactly per bucket).

 Ignoring the upper bits makes for a verry verry poor hash function. Also
 means it is 100% trivial to create a DOS by tuneing your input values to
 only differ in the upper bits.

 MfG
 Goswin

 --
 Caml-list mailing list.  Subscription management and archives:
 https://sympa-roc.inria.fr/wws/info/caml-list
 Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
 Bug reports: http://caml.inria.fr/bin/caml-bugs





-- 
Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:http://www.camlcity.org/contact.html
Company homepage:   http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.



-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-02-08 Thread Goswin von Brederlow
Gerd Stolpmann i...@gerd-stolpmann.de writes:

 Gerd Stolpmann i...@gerd-stolpmann.de writes:
 You are forgetting a variable in this. To create a DOS in the hashtable
 you need to hit the same bucket again and again. And bucket = hash % size.

 You focused on the hash but lets talk about the size for a moment.

 The size is rather limited and large hashtabels simply won't increate
 size anymore and allways have lots of collisions. So even if someone
 isn't actively trying to create DOS the performace still tanks as you
 add more items.

 I cannot completely follow here. If you add more elements, the bucket
 array is resized. The ocaml Hashtbl does this when the number of elements
 exceeds half of the array size. The argument of Hashtbl.create is only the
 initial size of the bucket array.

 The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
 platforms it is low (4M). But nobody of sound mind would run ocaml
 programs on 32 bit for production purposes. (If you do, consider to
 switch. You can use more memory and get a little performance boost, not
 only for ocaml programs.)

 Gerd

In practice I see a serious performance degradation with large
hashtables. So much so that I had to use an array of hashtables or
hashtable of hashtables and split the values across them.

Maybe the size of the hashtable indeed isn't the problem but the hash
function. Here is something odd:

# for i = 1 to 63 do Printf.printf %d %d\n (1 lsl i) (Hashtbl.hash (1 lsl i)) 
done;;
2 2
4 4
8 8
16 16
32 32
64 64
128 128
256 256
512 512
1024 1024
2048 2048
4096 4096
8192 8192
16384 16384
32768 32768
65536 65536
131072 131072
262144 262144
524288 524288
1048576 1048576
2097152 2097152
4194304 4194304
8388608 8388608
16777216 16777216
33554432 33554432
67108864 67108864
134217728 134217728
268435456 268435456
536870912 536870912
1073741824 0
2147483648 0
4294967296 0
8589934592 0
17179869184 0
34359738368 0
68719476736 0
137438953472 0
274877906944 0
549755813888 0
1099511627776 0
219902322 0
4398046511104 0
8796093022208 0
17592186044416 0
35184372088832 0
70368744177664 0
140737488355328 0
281474976710656 0
562949953421312 0
1125899906842624 0
2251799813685248 0
4503599627370496 0
9007199254740992 0
18014398509481984 0
36028797018963968 0
72057594037927936 0
144115188075855872 0
288230376151711744 0
576460752303423488 0
1152921504606846976 0
2305843009213693952 0
-4611686018427387904 0
0 0
- : unit = ()

Is every value over a billion hashed to 0?

MfG
Goswin

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-02-08 Thread AUGER Cédric
Le Wed, 08 Feb 2012 10:41:03 +0100,
Goswin von Brederlow goswin-...@web.de a écrit :

 Gerd Stolpmann i...@gerd-stolpmann.de writes:
 
  Gerd Stolpmann i...@gerd-stolpmann.de writes:
  You are forgetting a variable in this. To create a DOS in the
  hashtable you need to hit the same bucket again and again. And
  bucket = hash % size.
 
  You focused on the hash but lets talk about the size for a moment.
 
  The size is rather limited and large hashtabels simply won't
  increate size anymore and allways have lots of collisions. So even
  if someone isn't actively trying to create DOS the performace
  still tanks as you add more items.
 
  I cannot completely follow here. If you add more elements, the
  bucket array is resized. The ocaml Hashtbl does this when the
  number of elements exceeds half of the array size. The argument of
  Hashtbl.create is only the initial size of the bucket array.
 
  The upper bound is Sys.max_array_length, of course. Granted, on 32
  bit platforms it is low (4M). But nobody of sound mind would run
  ocaml programs on 32 bit for production purposes. (If you do,
  consider to switch. You can use more memory and get a little
  performance boost, not only for ocaml programs.)
 
  Gerd
 
 In practice I see a serious performance degradation with large
 hashtables. So much so that I had to use an array of hashtables or
 hashtable of hashtables and split the values across them.
 
 Maybe the size of the hashtable indeed isn't the problem but the hash
 function. Here is something odd:
 
 # for i = 1 to 63 do Printf.printf %d %d\n (1 lsl i) (Hashtbl.hash
 (1 lsl i)) done;; 2 2
 4 4
 8 8
 16 16
 32 32
 64 64
 128 128
 256 256
 512 512
 1024 1024
 2048 2048
 4096 4096
 8192 8192
 16384 16384
 32768 32768
 65536 65536
 131072 131072
 262144 262144
 524288 524288
 1048576 1048576
 2097152 2097152
 4194304 4194304
 8388608 8388608
 16777216 16777216
 33554432 33554432
 67108864 67108864
 134217728 134217728
 268435456 268435456
 536870912 536870912
 1073741824 0
 2147483648 0
 4294967296 0
 8589934592 0
 17179869184 0
 34359738368 0
 68719476736 0
 137438953472 0
 274877906944 0
 549755813888 0
 1099511627776 0
 219902322 0
 4398046511104 0
 8796093022208 0
 17592186044416 0
 35184372088832 0
 70368744177664 0
 140737488355328 0
 281474976710656 0
 562949953421312 0
 1125899906842624 0
 2251799813685248 0
 4503599627370496 0
 9007199254740992 0
 18014398509481984 0
 36028797018963968 0
 72057594037927936 0
 144115188075855872 0
 288230376151711744 0
 576460752303423488 0
 1152921504606846976 0
 2305843009213693952 0
 -4611686018427387904 0
 0 0
 - : unit = ()
 
 Is every value over a billion hashed to 0?
 
 MfG
 Goswin
 

Go to:
http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?view=markup
line 285

if you want another hasfunction, you can download the sources, modify
it and recompile.

More seriously, hashtables are here to store values, so you don't want
want to have a hash value bigger than (2^30)-1.

In fact even such a value is too big for that purpouse IMHO, since on a
4Gio RAM computer, it will take half it to store the table, assuming
each entry only takes 4 octets, that is it would be ok only for storing
just the structure of the table (each entry has to be a pointer),
so that in practice, either your table is ridiculously big to store
too few values either it is relevant to have such a size and in this
case your computer will spend all its time to swap since you won't
have enough RAM.

 Clearly you cannot a bijection from 63 bits
integers to 31 bits integers; so there are a lot of collisions. I do
not have a 64 arch, but I guesse that if you hash 2^48+168 you will
get 168. I guess such a function is ways to be ideal, since when
playing with bitvectors having two integers which differs by only one
bit is very common, but at least it is a fast function, and has a good
repartition (you have 2^32 collisions exactly per bucket).


-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-02-08 Thread Gerd Stolpmann
Well, looks like that the upper 32 bits aren't considered at all for
computing the hash value. This is for sure surprising - I would have
expected that these bits somehow go in to the final value (e.g. that they
are xor-ed with the lower half).

My guess is that this was done to ensure that an int-to-something
hashtable can be marshalled from 64 to 32 bit systems. Not sure for what
this is important, but it looks like a nice symmetry.

In trunk the method for ensuring this seems to have changed (see function
caml_hash_mix_intnat in
http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072view=markup).

Gerd

 Gerd Stolpmann i...@gerd-stolpmann.de writes:

 Gerd Stolpmann i...@gerd-stolpmann.de writes:
 You are forgetting a variable in this. To create a DOS in the hashtable
 you need to hit the same bucket again and again. And bucket = hash %
 size.

 You focused on the hash but lets talk about the size for a moment.

 The size is rather limited and large hashtabels simply won't increate
 size anymore and allways have lots of collisions. So even if someone
 isn't actively trying to create DOS the performace still tanks as you
 add more items.

 I cannot completely follow here. If you add more elements, the bucket
 array is resized. The ocaml Hashtbl does this when the number of
 elements
 exceeds half of the array size. The argument of Hashtbl.create is only
 the
 initial size of the bucket array.

 The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
 platforms it is low (4M). But nobody of sound mind would run ocaml
 programs on 32 bit for production purposes. (If you do, consider to
 switch. You can use more memory and get a little performance boost, not
 only for ocaml programs.)

 Gerd

 In practice I see a serious performance degradation with large
 hashtables. So much so that I had to use an array of hashtables or
 hashtable of hashtables and split the values across them.

 Maybe the size of the hashtable indeed isn't the problem but the hash
 function. Here is something odd:

 # for i = 1 to 63 do Printf.printf %d %d\n (1 lsl i) (Hashtbl.hash (1
 lsl i)) done;;
 2 2
 4 4
 8 8
 16 16
 32 32
 64 64
 128 128
 256 256
 512 512
 1024 1024
 2048 2048
 4096 4096
 8192 8192
 16384 16384
 32768 32768
 65536 65536
 131072 131072
 262144 262144
 524288 524288
 1048576 1048576
 2097152 2097152
 4194304 4194304
 8388608 8388608
 16777216 16777216
 33554432 33554432
 67108864 67108864
 134217728 134217728
 268435456 268435456
 536870912 536870912
 1073741824 0
 2147483648 0
 4294967296 0
 8589934592 0
 17179869184 0
 34359738368 0
 68719476736 0
 137438953472 0
 274877906944 0
 549755813888 0
 1099511627776 0
 219902322 0
 4398046511104 0
 8796093022208 0
 17592186044416 0
 35184372088832 0
 70368744177664 0
 140737488355328 0
 281474976710656 0
 562949953421312 0
 1125899906842624 0
 2251799813685248 0
 4503599627370496 0
 9007199254740992 0
 18014398509481984 0
 36028797018963968 0
 72057594037927936 0
 144115188075855872 0
 288230376151711744 0
 576460752303423488 0
 1152921504606846976 0
 2305843009213693952 0
 -4611686018427387904 0
 0 0
 - : unit = ()

 Is every value over a billion hashed to 0?

 MfG
 Goswin

 --
 Caml-list mailing list.  Subscription management and archives:
 https://sympa-roc.inria.fr/wws/info/caml-list
 Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
 Bug reports: http://caml.inria.fr/bin/caml-bugs





-- 
Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:http://www.camlcity.org/contact.html
Company homepage:   http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.



-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-01-31 Thread Gerd Stolpmann

 Gerd Stolpmann i...@gerd-stolpmann.de writes:

 Hi,

 there was recently a security alert for web services that use hash
 tables to store web form parameters sent via POST (so that millions of
 such parameters can be sent in a single request). It is possible to keep
 the web service busy for hours with such a DoS (denial of service)
 attack. The type of attack boils down to a problem in most hash table
 implementations, namely that the hash functions are invertible, and it
 is possible for a malicious user to construct lots of keys that all map
 to the same bucket of the hash table, creating a mass collision.

 The text of the alert:
 http://www.nruns.com/_downloads/advisory28122011.pdf

 I'd like to discuss this issue, because it is not restricted to the
 processing of web requests, but may also occur for all other data coming
 from untrusted sources. The web is only the most exposed area where this
 issue exists.

 So how is Ocaml affected? The hash functions used in recent Ocaml
 releases are also insecure in the above mentioned sense (currently
 MurmurHash3, and even a simpler hash function in previous releases). A
 quick survey of the Internet revealed at least one site that tries to
 break it. Probably a good cryptographer could do it in minutes.

 A pure Hashtbl.add of the constructed keys does not yet lead to the
 performance degradation, but a Hashtbl.replace, and of course
 Hashtbl.find after the table is built up will. So it depends very much
 of the details of the programs whether they are affected or not.

 I've just checked that Ocamlnet uses only Hashtbl.add to collect POST
 parameters, so it is not directly vulnerable. But if the crafted request
 is actually served by a handler, the handler would get a degraded table,
 and could show in turn bad performance (again leading to DoS).

 What are possible fixes?

 1) Avoid hash tables in contexts where security is relevant. The
 alternative is Set (actually a balanced binary tree), which does not
 show this problem.

 Unfortunately O(log n)  O(1) and that can be a deciding factor in the
 overall runtime. Even for small n your code then runs 2,3,4,... times
 slower.

 2) Use cryptographically secure hash functions.

 3) Use randomized hash tables. The trick here is that there is not a
 single hash function h anymore, but a family h(1)...h(n). When the hash
 table is created, one of the functions is picked randomly. This makes it
 impossible to craft an attack request, because you cannot predict the
 function.

 I don't think 1) is viable - hash tables are too wide spread, and are
 loved by programmers because of their good performance. 2) would be good
 in extremely critical contexts - although it is then again questionable,
 because it is likely to have worse performance than 1).

 So, the question is how to do 3). I see two problems here:

 a) how to define the family of hash functions. Is it e.g. sufficient to
 introduce an initialization vector for the Murmurhash algorithm, and
 fill it randomly? How to get a random number that is good enough?

 b) the Hashtbl in the standard library does not allow it to set the hash
 function dynamically. Maybe one can get this effect by using first-class
 modules (haven't checked).

 Anyway, I think these problems are difficult enough to deserve some
 discussion here. At least I cannot solve them immediately, and this
 problem may exist for lots of Ocaml applications.

 Gerd

 You are forgetting a variable in this. To create a DOS in the hashtable
 you need to hit the same bucket again and again. And bucket = hash % size.

 You focused on the hash but lets talk about the size for a moment.

 The size is rather limited and large hashtabels simply won't increate
 size anymore and allways have lots of collisions. So even if someone
 isn't actively trying to create DOS the performace still tanks as you
 add more items.

I cannot completely follow here. If you add more elements, the bucket
array is resized. The ocaml Hashtbl does this when the number of elements
exceeds half of the array size. The argument of Hashtbl.create is only the
initial size of the bucket array.

The upper bound is Sys.max_array_length, of course. Granted, on 32 bit
platforms it is low (4M). But nobody of sound mind would run ocaml
programs on 32 bit for production purposes. (If you do, consider to
switch. You can use more memory and get a little performance boost, not
only for ocaml programs.)

Gerd

 And this isn't only a problem in ocaml. The glib hashtable also has a
 maximum size in the 32bit range for example.


 So for ocaml the first thing that needs to be fixed is to allow larger
 size arrays of buckets.

 MfG
 Goswin

 --
 Caml-list mailing list.  Subscription management and archives:
 https://sympa-roc.inria.fr/wws/info/caml-list
 Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
 Bug reports: http://caml.inria.fr/bin/caml-bugs





-- 
Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de
Creator 

Re: [Caml-list] Hashtbl and security

2012-01-30 Thread Goswin von Brederlow
Gerd Stolpmann i...@gerd-stolpmann.de writes:

 Hi,

 there was recently a security alert for web services that use hash
 tables to store web form parameters sent via POST (so that millions of
 such parameters can be sent in a single request). It is possible to keep
 the web service busy for hours with such a DoS (denial of service)
 attack. The type of attack boils down to a problem in most hash table
 implementations, namely that the hash functions are invertible, and it
 is possible for a malicious user to construct lots of keys that all map
 to the same bucket of the hash table, creating a mass collision.

 The text of the alert: 
 http://www.nruns.com/_downloads/advisory28122011.pdf

 I'd like to discuss this issue, because it is not restricted to the
 processing of web requests, but may also occur for all other data coming
 from untrusted sources. The web is only the most exposed area where this
 issue exists.

 So how is Ocaml affected? The hash functions used in recent Ocaml
 releases are also insecure in the above mentioned sense (currently
 MurmurHash3, and even a simpler hash function in previous releases). A
 quick survey of the Internet revealed at least one site that tries to
 break it. Probably a good cryptographer could do it in minutes. 

 A pure Hashtbl.add of the constructed keys does not yet lead to the
 performance degradation, but a Hashtbl.replace, and of course
 Hashtbl.find after the table is built up will. So it depends very much
 of the details of the programs whether they are affected or not.

 I've just checked that Ocamlnet uses only Hashtbl.add to collect POST
 parameters, so it is not directly vulnerable. But if the crafted request
 is actually served by a handler, the handler would get a degraded table,
 and could show in turn bad performance (again leading to DoS).

 What are possible fixes?

 1) Avoid hash tables in contexts where security is relevant. The
 alternative is Set (actually a balanced binary tree), which does not
 show this problem.

Unfortunately O(log n)  O(1) and that can be a deciding factor in the
overall runtime. Even for small n your code then runs 2,3,4,... times
slower.

 2) Use cryptographically secure hash functions.

 3) Use randomized hash tables. The trick here is that there is not a
 single hash function h anymore, but a family h(1)...h(n). When the hash
 table is created, one of the functions is picked randomly. This makes it
 impossible to craft an attack request, because you cannot predict the
 function.

 I don't think 1) is viable - hash tables are too wide spread, and are
 loved by programmers because of their good performance. 2) would be good
 in extremely critical contexts - although it is then again questionable,
 because it is likely to have worse performance than 1).

 So, the question is how to do 3). I see two problems here:

 a) how to define the family of hash functions. Is it e.g. sufficient to
 introduce an initialization vector for the Murmurhash algorithm, and
 fill it randomly? How to get a random number that is good enough?

 b) the Hashtbl in the standard library does not allow it to set the hash
 function dynamically. Maybe one can get this effect by using first-class
 modules (haven't checked).

 Anyway, I think these problems are difficult enough to deserve some
 discussion here. At least I cannot solve them immediately, and this
 problem may exist for lots of Ocaml applications.

 Gerd

You are forgetting a variable in this. To create a DOS in the hashtable
you need to hit the same bucket again and again. And bucket = hash % size.

You focused on the hash but lets talk about the size for a moment.

The size is rather limited and large hashtabels simply won't increate
size anymore and allways have lots of collisions. So even if someone
isn't actively trying to create DOS the performace still tanks as you
add more items.

And this isn't only a problem in ocaml. The glib hashtable also has a
maximum size in the 32bit range for example.


So for ocaml the first thing that needs to be fixed is to allow larger
size arrays of buckets.

MfG
Goswin

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-01-30 Thread Goswin von Brederlow
Xavier Leroy xavier.le...@inria.fr writes:

 On 01/01/2012 01:52 PM, Richard W.M. Jones wrote:
 On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
 Indeed.  The optional seed parameter to Hashtbl.create does exactly
 this in the new implementation of Hashtbl (the one based on Murmur3).
 
 It may be worth noting that Perl solved this problem (back in 2003) by
 unconditionally using a seed which is a global set to a random number
 during interpreter initialization.  

 That's how my initial reimplementation of Hashtbl worked, using the
 Random module to produce seeds, but I was told (correctly) that in
 security-sensitive applications it's better to leave the generation of
 random numbers under control of the programmer.  For some applications
 Random.self_init might be good enough and for others stronger
 randomness is needed.

 Of course, you can trivially emulate Perl's behavior using the new
 Hashtbl implementation + the Random module.

 - Xavier Leroy

It is also crucial if you are doing performance tests or debugging. You
want the same behaviour on every run for that.

MfG
Goswin

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-01-04 Thread Damien Doligez
On 2012-01-02, at 02:43, oliver wrote:

 If the type is an abstract type, which comes from something like
 Hashtbl.Randomseed
 and has type t, not type int, this problem would vanish.

You have to be careful.  If we make hash table randomization mandatory,
the Frama-C people will hate us, as will all the people who want
reproducible results from their programs (for purposes of testing and
benchmarking, for example).

So, even if randomized is the default, there must be a way to get a
plain hash table that does the same thing every time.

-- Damien


-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-01-04 Thread oliver
On Wed, Jan 04, 2012 at 06:56:11PM +0100, Damien Doligez wrote:
 On 2012-01-02, at 02:43, oliver wrote:
 
  If the type is an abstract type, which comes from something like
  Hashtbl.Randomseed
  and has type t, not type int, this problem would vanish.
 
 You have to be careful.  If we make hash table randomization mandatory,
 the Frama-C people will hate us, as will all the people who want
 reproducible results from their programs (for purposes of testing and
 benchmarking, for example).
[...]

I did not meant it must be mandatory.
But provide a way, that makes it easy to use randomization
and not-so-easy to use the always-same values e.g. for testing puroposes.

If it needs extra effort to make simple seed values, people would prefer the 
randomized ones,
if not they want to write some extra code (maybe applying a functor).


 
 So, even if randomized is the default, there must be a way to get a
 plain hash table that does the same thing every time.

Yes, of course.

But maybe it should not be encouraged, and the programmer-in-a-hurry would
use ready-to-use random initializations, which are provided by the 
Hashtbl-module
and the one who needs it non-randomized would need to write his own addition 
then.

Then the lazy programmer goes safe and the unsafe way needs extra effort.

Nevertheless I think optional int value as a first fix would also be ok.


And maybe some of you remember the Debian random-device bug (some years ago),
where the random-device under certain circumstances ran out of entropy

So in any case it needs to be possible to change the random generator.

But pseudo-random is always a compromise.
Who really needs true random should of course use special hardware that
creates wide bandwith noise and uses an ADC to sample the signal.

Ciao,
   Oliver

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-01-01 Thread Richard W.M. Jones
On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
 Indeed.  The optional seed parameter to Hashtbl.create does exactly
 this in the new implementation of Hashtbl (the one based on Murmur3).

It may be worth noting that Perl solved this problem (back in 2003) by
unconditionally using a seed which is a global set to a random number
during interpreter initialization.  Each run of the interpreter
results in a different seed, and (it is supposed therefore) users of
Perl simply don't need to worry about algorithmic complexity attacks.

http://perl5.git.perl.org/perl.git/blob/HEAD:/hv.h#l104
http://dev.perl.org/perl5/news/2003/perl-5.8.1.html#Hash_Randomisation

Rich.

-- 
Richard Jones
Red Hat

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-01-01 Thread Gerd Stolpmann
Am Sonntag, den 01.01.2012, 18:29 +0100 schrieb Xavier Leroy:
 On 01/01/2012 01:52 PM, Richard W.M. Jones wrote:
  On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
  Indeed.  The optional seed parameter to Hashtbl.create does exactly
  this in the new implementation of Hashtbl (the one based on Murmur3).
  
  It may be worth noting that Perl solved this problem (back in 2003) by
  unconditionally using a seed which is a global set to a random number
  during interpreter initialization.  
 
 That's how my initial reimplementation of Hashtbl worked, using the
 Random module to produce seeds, but I was told (correctly) that in
 security-sensitive applications it's better to leave the generation of
 random numbers under control of the programmer.  For some applications
 Random.self_init might be good enough and for others stronger
 randomness is needed.
 
 Of course, you can trivially emulate Perl's behavior using the new
 Hashtbl implementation + the Random module.

I understand it very well that adding support for cryptographically
secure random numbers to core Ocaml is a challenge. There is no POSIX
API, and /dev/random is, although widely available, still non-standard.
And it is certainly true that there are various levels of security, and
for some purposes the programmer should be free to install even better
generators. Nevertheless, Ocaml is now widely used in environments where
a certain minimum of security is demanded, and I think Ocaml should
provide this minimum at least, and use it for things like an
automatically chosen seed for hash tables.

My argument is: Even providing a half solution is in this area better
than leaving the unwary programmer completely alone. Because in the
latter case, nothing will be done to address the problems, and apps
would be easier to attack.

What I could imagine is a module Sys.Security where all security
features are accessible and configurable, e.g.

val set_default_seed : int - unit
val fill_randomly : [`Fast|`Safe] - string - unit
val set_fill_randomly : ([`Fast|`Safe] - string - unit) - unit

The configure script could check what is available on the system. The
`Fast mode is guaranteed to always exist, and would essentially
be /dev/urandom if available and otherwise Random-based. The `Safe mode
would require /dev/random (or comparable), and fail otherwise. (See
Wikipedia http://en.wikipedia.org/wiki/dev/random for a list of OS, and
for weaknesses.)

The user can override the RNG if the defaults are not ok, or if exact
control is required.

Of course, one could criticize such an interface in various ways. For
example, it only allows a global security profile, and not different
profiles for different parts of the program. Also, the distinction
between a fast and a safe generator sounds a bit arbitrary. My response
is simply: this is the price for the broad effect, and IMHO this counts
more than having a more elaborated but also more difficult solution, or
lots of user-specific solutions (as we would have to find today).

Gerd
-- 

Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:http://www.camlcity.org/contact.html
Company homepage:   http://www.gerd-stolpmann.de



-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2012-01-01 Thread oliver
On Sun, Jan 01, 2012 at 10:04:03PM +0100, Gerd Stolpmann wrote:
 Am Sonntag, den 01.01.2012, 18:29 +0100 schrieb Xavier Leroy:
  On 01/01/2012 01:52 PM, Richard W.M. Jones wrote:
   On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
   Indeed.  The optional seed parameter to Hashtbl.create does exactly
   this in the new implementation of Hashtbl (the one based on Murmur3).
   
   It may be worth noting that Perl solved this problem (back in 2003) by
   unconditionally using a seed which is a global set to a random number
   during interpreter initialization.  
  
  That's how my initial reimplementation of Hashtbl worked, using the
  Random module to produce seeds, but I was told (correctly) that in
  security-sensitive applications it's better to leave the generation of
  random numbers under control of the programmer.  For some applications
  Random.self_init might be good enough and for others stronger
  randomness is needed.
  
  Of course, you can trivially emulate Perl's behavior using the new
  Hashtbl implementation + the Random module.
 
 I understand it very well that adding support for cryptographically
 secure random numbers to core Ocaml is a challenge. There is no POSIX
 API, and /dev/random is, although widely available, still non-standard.
[...]

And also might not be good enough for some certain areas.

val Hashtbl.HashedType.hash: t - int

allows at least providing your own hashing.function,
but that function then must be sophisticated enough
to provide some dynamic re-seeding.
Not sure if this does not rather conflict with referential transparency?!
Under the hood such a function of course good do some reading from
a random source... but that looks dirty to me.

So such a function with optional seed-parameters (as mentioned by Xavier leroy) 
might
make sense; when using the imperative features from my understanding it seems
to be much easier to address that hash-collision problem.


 And it is certainly true that there are various levels of security, and
 for some purposes the programmer should be free to install even better
 generators. Nevertheless, Ocaml is now widely used in environments where
 a certain minimum of security is demanded, and I think Ocaml should
 provide this minimum at least, and use it for things like an
 automatically chosen seed for hash tables.

That's already planned and even implemented, as was mentioned in this thread.
So urging for a new official release would make sense.

 
 My argument is: Even providing a half solution is in this area better
 than leaving the unwary programmer completely alone. Because in the
 latter case, nothing will be done to address the problems, and apps
 would be easier to attack.

Maybe a half solution is, what already is done.
I doubt that hash collisions are a new topic,
so I wonder why such things were not already implemented.
Only that this might be used in attacks seems to be new.


 
 What I could imagine is a module Sys.Security where all security
 features are accessible and configurable, e.g.

I doubt that this makes sense.
Nearly anything that can be programmed can become a security
issue, if done wrong; especially things done on the
operating system level (like Unix module) could become
a security issue, if things are handled without care.

A mandatory (not optional) hash-function-parameter
that must be passed (plus some default hash functions
with elaborated documentation on the properties of those)
would make more sense to me.

Putting things that need tp be part of the Hash-module
aside into a Sys.security-module makes these things less
apparent to the programmer who just wants to use hashes,
and don't thinks about using hashes might be a security problem.

So, this solution IMHO would be counterproductive
and non-intuitive.


Ciao,
   Oliver

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



[Caml-list] Hashtbl and security

2011-12-30 Thread Gerd Stolpmann
Hi,

there was recently a security alert for web services that use hash
tables to store web form parameters sent via POST (so that millions of
such parameters can be sent in a single request). It is possible to keep
the web service busy for hours with such a DoS (denial of service)
attack. The type of attack boils down to a problem in most hash table
implementations, namely that the hash functions are invertible, and it
is possible for a malicious user to construct lots of keys that all map
to the same bucket of the hash table, creating a mass collision.

The text of the alert: 
http://www.nruns.com/_downloads/advisory28122011.pdf

I'd like to discuss this issue, because it is not restricted to the
processing of web requests, but may also occur for all other data coming
from untrusted sources. The web is only the most exposed area where this
issue exists.

So how is Ocaml affected? The hash functions used in recent Ocaml
releases are also insecure in the above mentioned sense (currently
MurmurHash3, and even a simpler hash function in previous releases). A
quick survey of the Internet revealed at least one site that tries to
break it. Probably a good cryptographer could do it in minutes. 

A pure Hashtbl.add of the constructed keys does not yet lead to the
performance degradation, but a Hashtbl.replace, and of course
Hashtbl.find after the table is built up will. So it depends very much
of the details of the programs whether they are affected or not.

I've just checked that Ocamlnet uses only Hashtbl.add to collect POST
parameters, so it is not directly vulnerable. But if the crafted request
is actually served by a handler, the handler would get a degraded table,
and could show in turn bad performance (again leading to DoS).

What are possible fixes?

1) Avoid hash tables in contexts where security is relevant. The
alternative is Set (actually a balanced binary tree), which does not
show this problem.

2) Use cryptographically secure hash functions.

3) Use randomized hash tables. The trick here is that there is not a
single hash function h anymore, but a family h(1)...h(n). When the hash
table is created, one of the functions is picked randomly. This makes it
impossible to craft an attack request, because you cannot predict the
function.

I don't think 1) is viable - hash tables are too wide spread, and are
loved by programmers because of their good performance. 2) would be good
in extremely critical contexts - although it is then again questionable,
because it is likely to have worse performance than 1).

So, the question is how to do 3). I see two problems here:

a) how to define the family of hash functions. Is it e.g. sufficient to
introduce an initialization vector for the Murmurhash algorithm, and
fill it randomly? How to get a random number that is good enough?

b) the Hashtbl in the standard library does not allow it to set the hash
function dynamically. Maybe one can get this effect by using first-class
modules (haven't checked).

Anyway, I think these problems are difficult enough to deserve some
discussion here. At least I cannot solve them immediately, and this
problem may exist for lots of Ocaml applications.

Gerd
-- 

Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:http://www.camlcity.org/contact.html
Company homepage:   http://www.gerd-stolpmann.de



-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2011-12-30 Thread Yaron Minsky
For just this reason, the hashtables in Core have been reimplemented to use
an AVL tree in the buckets.  That way, even when you have pathological
collisions, you degrade gracefully to O(log n) per operation, instead of
O(n), where n is the number of keys in the hashtable.

y

On Fri, Dec 30, 2011 at 11:44 AM, Gerd Stolpmann i...@gerd-stolpmann.dewrote:

 Hi,

 there was recently a security alert for web services that use hash
 tables to store web form parameters sent via POST (so that millions of
 such parameters can be sent in a single request). It is possible to keep
 the web service busy for hours with such a DoS (denial of service)
 attack. The type of attack boils down to a problem in most hash table
 implementations, namely that the hash functions are invertible, and it
 is possible for a malicious user to construct lots of keys that all map
 to the same bucket of the hash table, creating a mass collision.

 The text of the alert:
 http://www.nruns.com/_downloads/advisory28122011.pdf

 I'd like to discuss this issue, because it is not restricted to the
 processing of web requests, but may also occur for all other data coming
 from untrusted sources. The web is only the most exposed area where this
 issue exists.

 So how is Ocaml affected? The hash functions used in recent Ocaml
 releases are also insecure in the above mentioned sense (currently
 MurmurHash3, and even a simpler hash function in previous releases). A
 quick survey of the Internet revealed at least one site that tries to
 break it. Probably a good cryptographer could do it in minutes.

 A pure Hashtbl.add of the constructed keys does not yet lead to the
 performance degradation, but a Hashtbl.replace, and of course
 Hashtbl.find after the table is built up will. So it depends very much
 of the details of the programs whether they are affected or not.

 I've just checked that Ocamlnet uses only Hashtbl.add to collect POST
 parameters, so it is not directly vulnerable. But if the crafted request
 is actually served by a handler, the handler would get a degraded table,
 and could show in turn bad performance (again leading to DoS).

 What are possible fixes?

 1) Avoid hash tables in contexts where security is relevant. The
 alternative is Set (actually a balanced binary tree), which does not
 show this problem.

 2) Use cryptographically secure hash functions.

 3) Use randomized hash tables. The trick here is that there is not a
 single hash function h anymore, but a family h(1)...h(n). When the hash
 table is created, one of the functions is picked randomly. This makes it
 impossible to craft an attack request, because you cannot predict the
 function.

 I don't think 1) is viable - hash tables are too wide spread, and are
 loved by programmers because of their good performance. 2) would be good
 in extremely critical contexts - although it is then again questionable,
 because it is likely to have worse performance than 1).

 So, the question is how to do 3). I see two problems here:

 a) how to define the family of hash functions. Is it e.g. sufficient to
 introduce an initialization vector for the Murmurhash algorithm, and
 fill it randomly? How to get a random number that is good enough?

 b) the Hashtbl in the standard library does not allow it to set the hash
 function dynamically. Maybe one can get this effect by using first-class
 modules (haven't checked).

 Anyway, I think these problems are difficult enough to deserve some
 discussion here. At least I cannot solve them immediately, and this
 problem may exist for lots of Ocaml applications.

 Gerd
 --
 
 Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de
 Creator of GODI and camlcity.org.
 Contact details:http://www.camlcity.org/contact.html
 Company homepage:   http://www.gerd-stolpmann.de
 


 --
 Caml-list mailing list.  Subscription management and archives:
 https://sympa-roc.inria.fr/wws/info/caml-list
 Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
 Bug reports: http://caml.inria.fr/bin/caml-bugs



-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2011-12-30 Thread Xavier Leroy
On 12/30/2011 05:44 PM, Gerd Stolpmann wrote:

 1) Avoid hash tables in contexts where security is relevant. The
 alternative is Set (actually a balanced binary tree), which does not
 show this problem.

Highly recommended.  Nothing beats guaranteed O(log n) operations.

 2) Use cryptographically secure hash functions.

Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits,
there are no cryptographically secure hash functions.

 3) Use randomized hash tables. The trick here is that there is not a
 single hash function h anymore, but a family h(1)...h(n). When the hash
 table is created, one of the functions is picked randomly. This makes it
 impossible to craft an attack request, because you cannot predict the
 function. 

Indeed.  The optional seed parameter to Hashtbl.create does exactly
this in the new implementation of Hashtbl (the one based on Murmur3).

 So, the question is how to do 3). I see two problems here:
 
 a) how to define the family of hash functions. Is it e.g. sufficient to
 introduce an initialization vector for the Murmurhash algorithm, and
 fill it randomly?

IIRC, the Web pages for the Murmur family of hashes gives some
statistical evidence that this approach works.

 How to get a random number that is good enough?

Hmm.  /dev/random is your friend on the platforms that support it.
Otherwise, there's always the Random module, but Random.self_init
isn't very strong.

- Xavier Leroy

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2011-12-30 Thread Edgar Friendly
Not a problem, other than memory waste (which doesn't matter the key).
Hashtbl.add prepends to the front of the list, so it'll be fast.  And
Hashtbl.find will match the head of the list, and return quickly too.
Hashtbl.find_all will have to go through the whole list, but that's
expected.

E.

On Fri, Dec 30, 2011 at 12:40 PM, ri...@happyleptic.org wrote:

 I will probably tell something very stupid, but HTML specs
 do not prevent a client to post 1M values with the same name,
 so whatever your hash function you cannot do much, can you?

 The simplest solution I can think of that prevents all attacks
 of this kind (but could reject some valid POST in theory) would
 be to store the bucket lengths and use it to detect and reject
 obviously biaised insertions.


 --
 Caml-list mailing list.  Subscription management and archives:
 https://sympa-roc.inria.fr/wws/info/caml-list
 Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
 Bug reports: http://caml.inria.fr/bin/caml-bugs



-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



RE: [Caml-list] Hashtbl and security

2011-12-30 Thread David Allsopp
Yaron Minsky wrote:
 For just this reason, the hashtables in Core have been reimplemented to use an
 AVL tree in the buckets.  That way, even when you have pathological 
 collisions,
 you degrade gracefully to O(log n) per operation, instead of O(n), where n is
 the number of keys in the hashtable.

I'm resisting the temptation to hack-it-and-see: does your implementation do 
anything clever to maintain Hashtbl's O(1) insertion time (e.g. Hashtbl.add 
updates a list and then the first call to Hashtbl.find or Hashtbl.mem moves 
any items from the list to the AVL). Or does doing that impact general 
performance too much?

In the POST web scenario (or processing HTTP request headers), for example, 
degrading Hashtbl.add from O(1) to O(log n) is only acceptable if you know 
that you'll query all the fields in the POST (which isn't necessarily true).


David


-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2011-12-30 Thread Yaron Minsky
It's not clever in that way.  It does try to do a good job of keeping the
memory impact of the tree low, but you maintain O(1) by having a low load
factor, and therefore trees of constant size.  You can take a look at the
code here:

https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml

(Don't rely on that repo too much yet, btw.  We're probably going to blow
it away and create a new one in the next couple of days.  But going
forward, we plan on using bitbucket as a place to work together with the
community on Core.)

y

On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp dra-n...@metastack.comwrote:

 Yaron Minsky wrote:
  For just this reason, the hashtables in Core have been reimplemented to
 use an
  AVL tree in the buckets.  That way, even when you have pathological
 collisions,
  you degrade gracefully to O(log n) per operation, instead of O(n), where
 n is
  the number of keys in the hashtable.

 I'm resisting the temptation to hack-it-and-see: does your implementation
 do anything clever to maintain Hashtbl's O(1) insertion time (e.g.
 Hashtbl.add updates a list and then the first call to Hashtbl.find or
 Hashtbl.mem moves any items from the list to the AVL). Or does doing that
 impact general performance too much?

 In the POST web scenario (or processing HTTP request headers), for
 example, degrading Hashtbl.add from O(1) to O(log n) is only acceptable
 if you know that you'll query all the fields in the POST (which isn't
 necessarily true).


 David


-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2011-12-30 Thread Gerd Stolpmann
Am Freitag, den 30.12.2011, 18:06 +0100 schrieb Xavier Leroy:
  3) Use randomized hash tables. The trick here is that there is not a
  single hash function h anymore, but a family h(1)...h(n). When the hash
  table is created, one of the functions is picked randomly. This makes it
  impossible to craft an attack request, because you cannot predict the
  function. 
 
 Indeed.  The optional seed parameter to Hashtbl.create does exactly
 this in the new implementation of Hashtbl (the one based on Murmur3).

I see. It will be available in 3.13:

val create : ?seed:int - int - ('a, 'b) t

There is also an additional functorized interface where this seed
argument exists (Hashtbl.MakeSeeded), and the hash functions seeded_hash
and seeded_hash_param. Well done!

Nevertheless, as we all don't know when 3.13 is ready, I'll have to find
a temporary fix for Ocamlnet. Maybe just a limit for the number of POST
parameters.

  So, the question is how to do 3). I see two problems here:
  
  a) how to define the family of hash functions. Is it e.g. sufficient to
  introduce an initialization vector for the Murmurhash algorithm, and
  fill it randomly?
 
 IIRC, the Web pages for the Murmur family of hashes gives some
 statistical evidence that this approach works.
 
  How to get a random number that is good enough?
 
 Hmm.  /dev/random is your friend on the platforms that support it.
 Otherwise, there's always the Random module, but Random.self_init
 isn't very strong.

Well, /dev/(u)random covers most Unix platforms nowadays. If you are
interested, I have a wrapper for Win32:

https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_c_win32.c

Scroll down until netsys_fill_random.

Gerd
-- 

Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:http://www.camlcity.org/contact.html
Company homepage:   http://www.gerd-stolpmann.de



-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2011-12-30 Thread oliver
On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
 On 12/30/2011 05:44 PM, Gerd Stolpmann wrote:
 
  1) Avoid hash tables in contexts where security is relevant. The
  alternative is Set (actually a balanced binary tree), which does not
  show this problem.
 
 Highly recommended.  Nothing beats guaranteed O(log n) operations.
 
  2) Use cryptographically secure hash functions.
 
 Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits,
 there are no cryptographically secure hash functions.
 
  3) Use randomized hash tables. The trick here is that there is not a
  single hash function h anymore, but a family h(1)...h(n). When the hash
  table is created, one of the functions is picked randomly. This makes it
  impossible to craft an attack request, because you cannot predict the
  function. 
 
 Indeed.  The optional seed parameter to Hashtbl.create does exactly
 this in the new implementation of Hashtbl (the one based on Murmur3).
[...]


Where is this?

I found  Hashtbl.HashedType.hash  with type t - int.

But I did not found an optional parameter to Hashtbl.create

Maybe that is part of the next release?

Ciao,
   Oliver

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2011-12-30 Thread oliver
On Sat, Dec 31, 2011 at 01:57:21AM +0100, oliver wrote:
 On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote:
  On 12/30/2011 05:44 PM, Gerd Stolpmann wrote:
  
   1) Avoid hash tables in contexts where security is relevant. The
   alternative is Set (actually a balanced binary tree), which does not
   show this problem.
  
  Highly recommended.  Nothing beats guaranteed O(log n) operations.
  
   2) Use cryptographically secure hash functions.
  
  Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits,
  there are no cryptographically secure hash functions.
  
   3) Use randomized hash tables. The trick here is that there is not a
   single hash function h anymore, but a family h(1)...h(n). When the hash
   table is created, one of the functions is picked randomly. This makes it
   impossible to craft an attack request, because you cannot predict the
   function. 
  
  Indeed.  The optional seed parameter to Hashtbl.create does exactly
  this in the new implementation of Hashtbl (the one based on Murmur3).
 [...]
 
 
 Where is this?
 
 I found  Hashtbl.HashedType.hash  with type t - int.

And there is val hash_param : int - int - 'a - int
but I'm not sure if this adresses the issue.

Ciao,
   Oliver

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



Re: [Caml-list] Hashtbl and security

2011-12-30 Thread oliver
On Fri, Dec 30, 2011 at 06:40:30PM +0100, ri...@happyleptic.org wrote:
 I will probably tell something very stupid, but HTML specs
 do not prevent a client to post 1M values with the same name,
 so whatever your hash function you cannot do much, can you?
[...]

That's a feature.


 
 The simplest solution I can think of that prevents all attacks
 of this kind (but could reject some valid POST in theory) would
 be to store the bucket lengths and use it to detect and reject
 obviously biaised insertions.
[...]

How do you define obvious and biased?

Sometimes, the distinction between feature and bug
depends on the context...

Ciao,
   Oliver

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs