Re: [Caml-list] Hashtbl and security
> AUGER Cédric writes: > >> More seriously, hashtables are here to store values, so you don't want >> want to have a hash value bigger than (2^30)-1. >> >> In fact even such a value is too big for that purpouse IMHO, since on a >> 4Gio RAM computer, it will take half it to store the table, assuming >> each entry only takes 4 octets, that is it would be ok only for storing >> just the structure of the table (each entry has to be a pointer), >> so that in practice, either your table is ridiculously big to store >> too few values either it is relevant to have such a size and in this >> case your computer will spend all its time to swap since you won't >> have enough RAM. > > But I have 128GB ram. :) That allows for a hashtable of size 2^33 > entries pointing at simple values (e.g. ints) without swapping. And ram > sizes get bigger and bigger. Fully agreed. The limitation to 30 bits is already unpractical for such large computers. Maybe we need a second hash function val Hashtbl.wide_hash : 'a -> int without this constraint (on 64 bit systems). Btw, you can already define your own hash functions, just use the functorized interface. And you are not limited to 30 bits. It's a bit inconvenient only. Gerd > > [Doesn't a Hashtbl of ints store the ints directly? That would allow > 2^34 entries.] > >> Clearly you cannot a bijection from 63 bits >> integers to 31 bits integers; so there are a lot of collisions. I do >> not have a 64 arch, but I guesse that if you hash "2^48+168" you will >> get "168". I guess such a function is ways to be ideal, since when >> playing with bitvectors having two integers which differs by only one >> bit is very common, but at least it is a fast function, and has a good >> repartition (you have 2^32 collisions exactly per bucket). > > Ignoring the upper bits makes for a verry verry poor hash function. Also > means it is 100% trivial to create a DOS by tuneing your input values to > only differ in the upper bits. > > MfG > Goswin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > -- Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details:http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
AUGER Cédric writes: > More seriously, hashtables are here to store values, so you don't want > want to have a hash value bigger than (2^30)-1. > > In fact even such a value is too big for that purpouse IMHO, since on a > 4Gio RAM computer, it will take half it to store the table, assuming > each entry only takes 4 octets, that is it would be ok only for storing > just the structure of the table (each entry has to be a pointer), > so that in practice, either your table is ridiculously big to store > too few values either it is relevant to have such a size and in this > case your computer will spend all its time to swap since you won't > have enough RAM. But I have 128GB ram. :) That allows for a hashtable of size 2^33 entries pointing at simple values (e.g. ints) without swapping. And ram sizes get bigger and bigger. [Doesn't a Hashtbl of ints store the ints directly? That would allow 2^34 entries.] > Clearly you cannot a bijection from 63 bits > integers to 31 bits integers; so there are a lot of collisions. I do > not have a 64 arch, but I guesse that if you hash "2^48+168" you will > get "168". I guess such a function is ways to be ideal, since when > playing with bitvectors having two integers which differs by only one > bit is very common, but at least it is a fast function, and has a good > repartition (you have 2^32 collisions exactly per bucket). Ignoring the upper bits makes for a verry verry poor hash function. Also means it is 100% trivial to create a DOS by tuneing your input values to only differ in the upper bits. MfG Goswin -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
"Gerd Stolpmann" writes: > Well, looks like that the upper 32 bits aren't considered at all for > computing the hash value. This is for sure surprising - I would have > expected that these bits somehow go in to the final value (e.g. that they > are xor-ed with the lower half). > > My guess is that this was done to ensure that an int-to-something > hashtable can be marshalled from 64 to 32 bit systems. Not sure for what > this is important, but it looks like a nice symmetry. Marshaling hashtables of ints might be an argument. BUT: # Hashtbl.hash (1 lsr 40);; - : int = 0 # Hashtbl.hash (Int64.of_int (1 lsr 40));; - : int = 0 # Hashtbl.hash (Int64.of_int (1 lsr 40 + 1));; - : int = 1 > In trunk the method for ensuring this seems to have changed (see function > caml_hash_mix_intnat in > http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072&view=markup). > > Gerd Lets hope that fixes this. MfG Goswin -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Well, looks like that the upper 32 bits aren't considered at all for computing the hash value. This is for sure surprising - I would have expected that these bits somehow go in to the final value (e.g. that they are xor-ed with the lower half). My guess is that this was done to ensure that an int-to-something hashtable can be marshalled from 64 to 32 bit systems. Not sure for what this is important, but it looks like a nice symmetry. In trunk the method for ensuring this seems to have changed (see function caml_hash_mix_intnat in http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?revision=12072&view=markup). Gerd > "Gerd Stolpmann" writes: > >>> Gerd Stolpmann writes: >>> You are forgetting a variable in this. To create a DOS in the hashtable >>> you need to hit the same bucket again and again. And bucket = hash % >>> size. >>> >>> You focused on the hash but lets talk about the size for a moment. >>> >>> The size is rather limited and large hashtabels simply won't increate >>> size anymore and allways have lots of collisions. So even if someone >>> isn't actively trying to create DOS the performace still tanks as you >>> add more items. >> >> I cannot completely follow here. If you add more elements, the bucket >> array is resized. The ocaml Hashtbl does this when the number of >> elements >> exceeds half of the array size. The argument of Hashtbl.create is only >> the >> initial size of the bucket array. >> >> The upper bound is Sys.max_array_length, of course. Granted, on 32 bit >> platforms it is low (4M). But nobody of sound mind would run ocaml >> programs on 32 bit for production purposes. (If you do, consider to >> switch. You can use more memory and get a little performance boost, not >> only for ocaml programs.) >> >> Gerd > > In practice I see a serious performance degradation with large > hashtables. So much so that I had to use an array of hashtables or > hashtable of hashtables and split the values across them. > > Maybe the size of the hashtable indeed isn't the problem but the hash > function. Here is something odd: > > # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 > lsl i)) done;; > 2 2 > 4 4 > 8 8 > 16 16 > 32 32 > 64 64 > 128 128 > 256 256 > 512 512 > 1024 1024 > 2048 2048 > 4096 4096 > 8192 8192 > 16384 16384 > 32768 32768 > 65536 65536 > 131072 131072 > 262144 262144 > 524288 524288 > 1048576 1048576 > 2097152 2097152 > 4194304 4194304 > 8388608 8388608 > 16777216 16777216 > 33554432 33554432 > 67108864 67108864 > 134217728 134217728 > 268435456 268435456 > 536870912 536870912 > 1073741824 0 > 2147483648 0 > 4294967296 0 > 8589934592 0 > 17179869184 0 > 34359738368 0 > 68719476736 0 > 137438953472 0 > 274877906944 0 > 549755813888 0 > 1099511627776 0 > 219902322 0 > 4398046511104 0 > 8796093022208 0 > 17592186044416 0 > 35184372088832 0 > 70368744177664 0 > 140737488355328 0 > 281474976710656 0 > 562949953421312 0 > 1125899906842624 0 > 2251799813685248 0 > 4503599627370496 0 > 9007199254740992 0 > 18014398509481984 0 > 36028797018963968 0 > 72057594037927936 0 > 144115188075855872 0 > 288230376151711744 0 > 576460752303423488 0 > 1152921504606846976 0 > 2305843009213693952 0 > -4611686018427387904 0 > 0 0 > - : unit = () > > Is every value over a billion hashed to 0? > > MfG > Goswin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > -- Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details:http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de *** Searching for new projects! Need consulting for system *** programming in Ocaml? Gerd Stolpmann can help you. -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Le Wed, 08 Feb 2012 10:41:03 +0100, Goswin von Brederlow a écrit : > "Gerd Stolpmann" writes: > > >> Gerd Stolpmann writes: > >> You are forgetting a variable in this. To create a DOS in the > >> hashtable you need to hit the same bucket again and again. And > >> bucket = hash % size. > >> > >> You focused on the hash but lets talk about the size for a moment. > >> > >> The size is rather limited and large hashtabels simply won't > >> increate size anymore and allways have lots of collisions. So even > >> if someone isn't actively trying to create DOS the performace > >> still tanks as you add more items. > > > > I cannot completely follow here. If you add more elements, the > > bucket array is resized. The ocaml Hashtbl does this when the > > number of elements exceeds half of the array size. The argument of > > Hashtbl.create is only the initial size of the bucket array. > > > > The upper bound is Sys.max_array_length, of course. Granted, on 32 > > bit platforms it is low (4M). But nobody of sound mind would run > > ocaml programs on 32 bit for production purposes. (If you do, > > consider to switch. You can use more memory and get a little > > performance boost, not only for ocaml programs.) > > > > Gerd > > In practice I see a serious performance degradation with large > hashtables. So much so that I had to use an array of hashtables or > hashtable of hashtables and split the values across them. > > Maybe the size of the hashtable indeed isn't the problem but the hash > function. Here is something odd: > > # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash > (1 lsl i)) done;; 2 2 > 4 4 > 8 8 > 16 16 > 32 32 > 64 64 > 128 128 > 256 256 > 512 512 > 1024 1024 > 2048 2048 > 4096 4096 > 8192 8192 > 16384 16384 > 32768 32768 > 65536 65536 > 131072 131072 > 262144 262144 > 524288 524288 > 1048576 1048576 > 2097152 2097152 > 4194304 4194304 > 8388608 8388608 > 16777216 16777216 > 33554432 33554432 > 67108864 67108864 > 134217728 134217728 > 268435456 268435456 > 536870912 536870912 > 1073741824 0 > 2147483648 0 > 4294967296 0 > 8589934592 0 > 17179869184 0 > 34359738368 0 > 68719476736 0 > 137438953472 0 > 274877906944 0 > 549755813888 0 > 1099511627776 0 > 219902322 0 > 4398046511104 0 > 8796093022208 0 > 17592186044416 0 > 35184372088832 0 > 70368744177664 0 > 140737488355328 0 > 281474976710656 0 > 562949953421312 0 > 1125899906842624 0 > 2251799813685248 0 > 4503599627370496 0 > 9007199254740992 0 > 18014398509481984 0 > 36028797018963968 0 > 72057594037927936 0 > 144115188075855872 0 > 288230376151711744 0 > 576460752303423488 0 > 1152921504606846976 0 > 2305843009213693952 0 > -4611686018427387904 0 > 0 0 > - : unit = () > > Is every value over a billion hashed to 0? > > MfG > Goswin > Go to: http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/byterun/hash.c?view=markup line 285 if you want another hasfunction, you can download the sources, modify it and recompile. More seriously, hashtables are here to store values, so you don't want want to have a hash value bigger than (2^30)-1. In fact even such a value is too big for that purpouse IMHO, since on a 4Gio RAM computer, it will take half it to store the table, assuming each entry only takes 4 octets, that is it would be ok only for storing just the structure of the table (each entry has to be a pointer), so that in practice, either your table is ridiculously big to store too few values either it is relevant to have such a size and in this case your computer will spend all its time to swap since you won't have enough RAM. Clearly you cannot a bijection from 63 bits integers to 31 bits integers; so there are a lot of collisions. I do not have a 64 arch, but I guesse that if you hash "2^48+168" you will get "168". I guess such a function is ways to be ideal, since when playing with bitvectors having two integers which differs by only one bit is very common, but at least it is a fast function, and has a good repartition (you have 2^32 collisions exactly per bucket). -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Wed, Feb 8, 2012 at 10:41 AM, Goswin von Brederlow wrote: > [...] > Maybe the size of the hashtable indeed isn't the problem but the hash > function. Here is something odd: > > # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 lsl > i)) done;; > 2 2 > 4 4 > 8 8 > 16 16 > 32 32 > 64 64 > 128 128 > 256 256 > 512 512 > 1024 1024 > 2048 2048 > 4096 4096 > 8192 8192 > 16384 16384 > 32768 32768 > 65536 65536 > 131072 131072 > 262144 262144 > 524288 524288 > 1048576 1048576 > 2097152 2097152 > 4194304 4194304 > 8388608 8388608 > 16777216 16777216 > 33554432 33554432 > 67108864 67108864 > 134217728 134217728 > 268435456 268435456 > 536870912 536870912 > 1073741824 0 > 2147483648 0 > 4294967296 0 > 8589934592 0 > 17179869184 0 > 34359738368 0 > 68719476736 0 > 137438953472 0 > 274877906944 0 > 549755813888 0 > 1099511627776 0 > 219902322 0 > 4398046511104 0 > 8796093022208 0 > 17592186044416 0 > 35184372088832 0 > 70368744177664 0 > 140737488355328 0 > 281474976710656 0 > 562949953421312 0 > 1125899906842624 0 > 2251799813685248 0 > 4503599627370496 0 > 9007199254740992 0 > 18014398509481984 0 > 36028797018963968 0 > 72057594037927936 0 > 144115188075855872 0 > 288230376151711744 0 > 576460752303423488 0 > 1152921504606846976 0 > 2305843009213693952 0 > -4611686018427387904 0 > 0 0 > - : unit = () > > Is every value over a billion hashed to 0? > > MfG > Goswin > Hi, you're making it look worse than it really is: $ ocaml Objective Caml version 3.12.0 # Hashtbl.hash 11;; - : int = 11 # Hashtbl.hash 1073741824;; - : int = 0 # Hashtbl.hash 1073741825;; - : int = 1 # Hashtbl.hash 1073741826;; - : int = 2 # Hashtbl.hash 1073741829;; - : int = 5 # Hashtbl.hash 1073741830;; - : int = 6 # Hashtbl.hash 9007199254740994;; - : int = 2 # Hashtbl.hash 9007199254740990;; - : int = 1073741822 Cheers, -- Philippe Wang m...@philippewang.info -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
"Gerd Stolpmann" writes: >> Gerd Stolpmann writes: >> You are forgetting a variable in this. To create a DOS in the hashtable >> you need to hit the same bucket again and again. And bucket = hash % size. >> >> You focused on the hash but lets talk about the size for a moment. >> >> The size is rather limited and large hashtabels simply won't increate >> size anymore and allways have lots of collisions. So even if someone >> isn't actively trying to create DOS the performace still tanks as you >> add more items. > > I cannot completely follow here. If you add more elements, the bucket > array is resized. The ocaml Hashtbl does this when the number of elements > exceeds half of the array size. The argument of Hashtbl.create is only the > initial size of the bucket array. > > The upper bound is Sys.max_array_length, of course. Granted, on 32 bit > platforms it is low (4M). But nobody of sound mind would run ocaml > programs on 32 bit for production purposes. (If you do, consider to > switch. You can use more memory and get a little performance boost, not > only for ocaml programs.) > > Gerd In practice I see a serious performance degradation with large hashtables. So much so that I had to use an array of hashtables or hashtable of hashtables and split the values across them. Maybe the size of the hashtable indeed isn't the problem but the hash function. Here is something odd: # for i = 1 to 63 do Printf.printf "%d %d\n" (1 lsl i) (Hashtbl.hash (1 lsl i)) done;; 2 2 4 4 8 8 16 16 32 32 64 64 128 128 256 256 512 512 1024 1024 2048 2048 4096 4096 8192 8192 16384 16384 32768 32768 65536 65536 131072 131072 262144 262144 524288 524288 1048576 1048576 2097152 2097152 4194304 4194304 8388608 8388608 16777216 16777216 33554432 33554432 67108864 67108864 134217728 134217728 268435456 268435456 536870912 536870912 1073741824 0 2147483648 0 4294967296 0 8589934592 0 17179869184 0 34359738368 0 68719476736 0 137438953472 0 274877906944 0 549755813888 0 1099511627776 0 219902322 0 4398046511104 0 8796093022208 0 17592186044416 0 35184372088832 0 70368744177664 0 140737488355328 0 281474976710656 0 562949953421312 0 1125899906842624 0 2251799813685248 0 4503599627370496 0 9007199254740992 0 18014398509481984 0 36028797018963968 0 72057594037927936 0 144115188075855872 0 288230376151711744 0 576460752303423488 0 1152921504606846976 0 2305843009213693952 0 -4611686018427387904 0 0 0 - : unit = () Is every value over a billion hashed to 0? MfG Goswin -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
> Gerd Stolpmann writes: > >> Hi, >> >> there was recently a security alert for web services that use hash >> tables to store web form parameters sent via POST (so that millions of >> such parameters can be sent in a single request). It is possible to keep >> the web service busy for hours with such a DoS (denial of service) >> attack. The type of attack boils down to a problem in most hash table >> implementations, namely that the hash functions are invertible, and it >> is possible for a malicious user to construct lots of keys that all map >> to the same bucket of the hash table, creating a mass collision. >> >> The text of the alert: >> http://www.nruns.com/_downloads/advisory28122011.pdf >> >> I'd like to discuss this issue, because it is not restricted to the >> processing of web requests, but may also occur for all other data coming >> from untrusted sources. The web is only the most exposed area where this >> issue exists. >> >> So how is Ocaml affected? The hash functions used in recent Ocaml >> releases are also insecure in the above mentioned sense (currently >> MurmurHash3, and even a simpler hash function in previous releases). A >> quick survey of the Internet revealed at least one site that tries to >> break it. Probably a good cryptographer could do it in minutes. >> >> A pure Hashtbl.add of the constructed keys does not yet lead to the >> performance degradation, but a Hashtbl.replace, and of course >> Hashtbl.find after the table is built up will. So it depends very much >> of the details of the programs whether they are affected or not. >> >> I've just checked that Ocamlnet uses only Hashtbl.add to collect POST >> parameters, so it is not directly vulnerable. But if the crafted request >> is actually served by a handler, the handler would get a degraded table, >> and could show in turn bad performance (again leading to DoS). >> >> What are possible fixes? >> >> 1) Avoid hash tables in contexts where security is relevant. The >> alternative is Set (actually a balanced binary tree), which does not >> show this problem. > > Unfortunately O(log n) > O(1) and that can be a deciding factor in the > overall runtime. Even for small n your code then runs 2,3,4,... times > slower. > >> 2) Use cryptographically secure hash functions. >> >> 3) Use "randomized" hash tables. The trick here is that there is not a >> single hash function h anymore, but a family h(1)...h(n). When the hash >> table is created, one of the functions is picked randomly. This makes it >> impossible to craft an attack request, because you cannot predict the >> function. >> >> I don't think 1) is viable - hash tables are too wide spread, and are >> loved by programmers because of their good performance. 2) would be good >> in extremely critical contexts - although it is then again questionable, >> because it is likely to have worse performance than 1). >> >> So, the question is how to do 3). I see two problems here: >> >> a) how to define the family of hash functions. Is it e.g. sufficient to >> introduce an initialization vector for the Murmurhash algorithm, and >> fill it randomly? How to get a random number that is good enough? >> >> b) the Hashtbl in the standard library does not allow it to set the hash >> function dynamically. Maybe one can get this effect by using first-class >> modules (haven't checked). >> >> Anyway, I think these problems are difficult enough to deserve some >> discussion here. At least I cannot solve them immediately, and this >> problem may exist for lots of Ocaml applications. >> >> Gerd > > You are forgetting a variable in this. To create a DOS in the hashtable > you need to hit the same bucket again and again. And bucket = hash % size. > > You focused on the hash but lets talk about the size for a moment. > > The size is rather limited and large hashtabels simply won't increate > size anymore and allways have lots of collisions. So even if someone > isn't actively trying to create DOS the performace still tanks as you > add more items. I cannot completely follow here. If you add more elements, the bucket array is resized. The ocaml Hashtbl does this when the number of elements exceeds half of the array size. The argument of Hashtbl.create is only the initial size of the bucket array. The upper bound is Sys.max_array_length, of course. Granted, on 32 bit platforms it is low (4M). But nobody of sound mind would run ocaml programs on 32 bit for production purposes. (If you do, consider to switch. You can use more memory and get a little performance boost, not only for ocaml programs.) Gerd > And this isn't only a problem in ocaml. The glib hashtable also has a > maximum size in the 32bit range for example. > > > So for ocaml the first thing that needs to be fixed is to allow larger > size arrays of buckets. > > MfG > Goswin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/g
Re: [Caml-list] Hashtbl and security
Xavier Leroy writes: > On 01/01/2012 01:52 PM, Richard W.M. Jones wrote: >> On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote: >>> Indeed. The optional "seed" parameter to Hashtbl.create does exactly >>> this in the new implementation of Hashtbl (the one based on Murmur3). >> >> It may be worth noting that Perl solved this problem (back in 2003) by >> unconditionally using a seed which is a global set to a random number >> during interpreter initialization. > > That's how my initial reimplementation of Hashtbl worked, using the > Random module to produce seeds, but I was told (correctly) that in > security-sensitive applications it's better to leave the generation of > random numbers under control of the programmer. For some applications > Random.self_init might be good enough and for others stronger > randomness is needed. > > Of course, you can trivially emulate Perl's behavior using the new > Hashtbl implementation + the Random module. > > - Xavier Leroy It is also crucial if you are doing performance tests or debugging. You want the same behaviour on every run for that. MfG Goswin -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Gerd Stolpmann writes: > Hi, > > there was recently a security alert for web services that use hash > tables to store web form parameters sent via POST (so that millions of > such parameters can be sent in a single request). It is possible to keep > the web service busy for hours with such a DoS (denial of service) > attack. The type of attack boils down to a problem in most hash table > implementations, namely that the hash functions are invertible, and it > is possible for a malicious user to construct lots of keys that all map > to the same bucket of the hash table, creating a mass collision. > > The text of the alert: > http://www.nruns.com/_downloads/advisory28122011.pdf > > I'd like to discuss this issue, because it is not restricted to the > processing of web requests, but may also occur for all other data coming > from untrusted sources. The web is only the most exposed area where this > issue exists. > > So how is Ocaml affected? The hash functions used in recent Ocaml > releases are also insecure in the above mentioned sense (currently > MurmurHash3, and even a simpler hash function in previous releases). A > quick survey of the Internet revealed at least one site that tries to > break it. Probably a good cryptographer could do it in minutes. > > A pure Hashtbl.add of the constructed keys does not yet lead to the > performance degradation, but a Hashtbl.replace, and of course > Hashtbl.find after the table is built up will. So it depends very much > of the details of the programs whether they are affected or not. > > I've just checked that Ocamlnet uses only Hashtbl.add to collect POST > parameters, so it is not directly vulnerable. But if the crafted request > is actually served by a handler, the handler would get a degraded table, > and could show in turn bad performance (again leading to DoS). > > What are possible fixes? > > 1) Avoid hash tables in contexts where security is relevant. The > alternative is Set (actually a balanced binary tree), which does not > show this problem. Unfortunately O(log n) > O(1) and that can be a deciding factor in the overall runtime. Even for small n your code then runs 2,3,4,... times slower. > 2) Use cryptographically secure hash functions. > > 3) Use "randomized" hash tables. The trick here is that there is not a > single hash function h anymore, but a family h(1)...h(n). When the hash > table is created, one of the functions is picked randomly. This makes it > impossible to craft an attack request, because you cannot predict the > function. > > I don't think 1) is viable - hash tables are too wide spread, and are > loved by programmers because of their good performance. 2) would be good > in extremely critical contexts - although it is then again questionable, > because it is likely to have worse performance than 1). > > So, the question is how to do 3). I see two problems here: > > a) how to define the family of hash functions. Is it e.g. sufficient to > introduce an initialization vector for the Murmurhash algorithm, and > fill it randomly? How to get a random number that is good enough? > > b) the Hashtbl in the standard library does not allow it to set the hash > function dynamically. Maybe one can get this effect by using first-class > modules (haven't checked). > > Anyway, I think these problems are difficult enough to deserve some > discussion here. At least I cannot solve them immediately, and this > problem may exist for lots of Ocaml applications. > > Gerd You are forgetting a variable in this. To create a DOS in the hashtable you need to hit the same bucket again and again. And bucket = hash % size. You focused on the hash but lets talk about the size for a moment. The size is rather limited and large hashtabels simply won't increate size anymore and allways have lots of collisions. So even if someone isn't actively trying to create DOS the performace still tanks as you add more items. And this isn't only a problem in ocaml. The glib hashtable also has a maximum size in the 32bit range for example. So for ocaml the first thing that needs to be fixed is to allow larger size arrays of buckets. MfG Goswin -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Wed, Jan 04, 2012 at 06:56:11PM +0100, Damien Doligez wrote: > On 2012-01-02, at 02:43, oliver wrote: > > > If the type is an abstract type, which comes from something like > > Hashtbl.Randomseed > > and has type t, not type int, this problem would vanish. > > You have to be careful. If we make hash table randomization mandatory, > the Frama-C people will hate us, as will all the people who want > reproducible results from their programs (for purposes of testing and > benchmarking, for example). [...] I did not meant it must be mandatory. But provide a way, that makes it easy to use randomization and not-so-easy to use the always-same values e.g. for testing puroposes. If it needs extra effort to make simple seed values, people would prefer the randomized ones, if not they want to write some extra code (maybe applying a functor). > > So, even if randomized is the default, there must be a way to get a > plain hash table that does the same thing every time. Yes, of course. But maybe it should not be encouraged, and the programmer-in-a-hurry would use ready-to-use random initializations, which are provided by the Hashtbl-module and the one who needs it non-randomized would need to write his own addition then. Then the lazy programmer goes safe and the unsafe way needs extra effort. Nevertheless I think optional int value as a first fix would also be ok. And maybe some of you remember the Debian random-device bug (some years ago), where the random-device under certain circumstances ran out of entropy So in any case it needs to be possible to change the random generator. But pseudo-random is always a compromise. Who really needs true random should of course use special hardware that creates wide bandwith noise and uses an ADC to sample the signal. Ciao, Oliver -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On 2012-01-02, at 02:43, oliver wrote: > If the type is an abstract type, which comes from something like > Hashtbl.Randomseed > and has type t, not type int, this problem would vanish. You have to be careful. If we make hash table randomization mandatory, the Frama-C people will hate us, as will all the people who want reproducible results from their programs (for purposes of testing and benchmarking, for example). So, even if randomized is the default, there must be a way to get a plain hash table that does the same thing every time. -- Damien -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Am Sonntag, den 01.01.2012, 16:21 -0800 schrieb Shawn Wagner: > On Fri, 30 Dec 2011 17:44:06 +0100 > Gerd Stolpmann wrote: > > > > What are possible fixes? > > > > 1) Avoid hash tables in contexts where security is relevant. The > > alternative is Set (actually a balanced binary tree), which does not > > show this problem. > > > > 2) Use cryptographically secure hash functions. > > > > 3) Use "randomized" hash tables. The trick here is that there is not a > > single hash function h anymore, but a family h(1)...h(n). When the > > hash table is created, one of the functions is picked randomly. This > > makes it impossible to craft an attack request, because you cannot > > predict the function. > > > > There's also an option 4 that's barely been mentioned in any > discussion of this issue I've seen: Use a hash table implementation that > handles collisions in another way than having each bucket be a linked > list. Double hashing and cuckoo hashing come to mind, where an attacker > would have to find keys that map to the same value for not one, but two > or more different hash functions. When I overlook this correctly, this is just a trick to use more bits. So when you double-hash with two functions that deliver 30 bits each you'll effectively have the same security as 60 bits - provided these functions are really independent of each other. The choice of the functions is crucial, IMHO. I have no idea how you would choose them so that when you crack one of these, you don't know anything about the other. Of course, you could use a secure hash function like SHA-1 and take distinct ranges of bits, but I think this is more an implementation of 2), and not what you really mean. Is there any cryptanalysis of this approach? Gerd -- Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details:http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Hello, 2012/1/1 Gerd Stolpmann : > Am Sonntag, den 01.01.2012, 18:29 +0100 schrieb Xavier Leroy: >> On 01/01/2012 01:52 PM, Richard W.M. Jones wrote: [...] >> > It may be worth noting that Perl solved this problem (back in 2003) by >> > unconditionally using a seed which is a global set to a random number >> > during interpreter initialization. >> >> That's how my initial reimplementation of Hashtbl worked, using the >> Random module to produce seeds, but I was told (correctly) that in >> security-sensitive applications it's better to leave the generation of >> random numbers under control of the programmer. For some applications >> Random.self_init might be good enough and for others stronger >> randomness is needed. >> >> Of course, you can trivially emulate Perl's behavior using the new >> Hashtbl implementation + the Random module. [...] > Nevertheless, Ocaml is now widely used in environments where > a certain minimum of security is demanded, and I think Ocaml should > provide this minimum at least, and use it for things like an > automatically chosen seed for hash tables. I share Gerd's opinion that OCaml should provide a "reasonable default", even if this default my not be enough for applications that need a strong security. Another "solution" would be to flag this API as a potential security issue in the documentation and/or provide a compiler warning to emit a warning if Hashtbl is used without proper initialization. Best regards, david -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Mon, Jan 02, 2012 at 12:58:03AM +0100, Gerd Stolpmann wrote: [...] > > > What I could imagine is a module Sys.Security where all security > > > features are accessible and configurable, e.g. > > > > I doubt that this makes sense. > > Nearly anything that can be programmed can become a security > > issue, if done wrong; especially things done on the > > operating system level (like Unix module) could become > > a security issue, if things are handled without care. > > Thanks for your argumentative help - by being ignorant you prove my > thesis that typical programmers won't do anything by themselves. The > world is so much trouble, better put the head into the sand, and hope > the attacker won't pick you. (Well, sorry, maybe a bit too harsh, but I > hope you get my point that it is no excuse that also other security > issues may exist). [...] You completely did NOT see what I talked about. I did not say, this problem should be ignored. At the day when I heard from this bug I also was tempted to post to this list here; just someone was faster then me. What I meant was not to ignore the problenm, because there are other problems; I meant, that there is no reason to ship the functionality into Sys.Security, if it belongs to Hashtbl. As you said, it is necessary to point the attention of the programmer to crucial points, this means, that the Hashtbl-Initialization needs to be in the Hashtbl module. The docs for the Hashtbl are the place, where a programmer looks for information on Hashtbles of course, and not in some module in any other place. Especially if the progranmmer is not aware of the problem, it's a necessity to place the information at the place of the module that you want to use. This is good style of documentation and also is rather FPL like. Exporting the Seeds-functionality to some other module is very similar to using global variables in a C program. Put the things where they belong to. And Hashtbl-init belongs to Hashtbl-module. [...] > > A mandatory (not optional) hash-function-parameter > > that must be passed (plus some default hash functions > > with elaborated documentation on the properties of those) > > would make more sense to me. > > The seed is so far optional. Sure, requiring it mandatorily would ensure > it is passed, but I guess programmers would just become used to the > idiom "Hashtbl.create ~seed:0 ()". If the type is an abstract type, which comes from something like Hashtbl.Randomseed and has type t, not type int, this problem would vanish. [...] > > Putting things that need tp be part of the Hash-module > > aside into a Sys.security-module makes these things less > > apparent to the programmer who just wants to use hashes, > > and don't thinks about using hashes might be a security problem. > > If done right, this won't be a problem anymore for the typical user (who You also mentioned, that programmers might be too lazy or are not aware of the problem. Then putting the initialization to some unrelated module is not supporting the programmer to become aware of the problem. Even if the person already knows that there might be some security issues with the hashtables, then I doubt, that the first idea of most programmers would be to look out for something like Sys.Security. Instead I would assume higher frequency of asking mails on this list, about this problem. Ciao, Oliver -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Fri, 30 Dec 2011 17:44:06 +0100 Gerd Stolpmann wrote: > > What are possible fixes? > > 1) Avoid hash tables in contexts where security is relevant. The > alternative is Set (actually a balanced binary tree), which does not > show this problem. > > 2) Use cryptographically secure hash functions. > > 3) Use "randomized" hash tables. The trick here is that there is not a > single hash function h anymore, but a family h(1)...h(n). When the > hash table is created, one of the functions is picked randomly. This > makes it impossible to craft an attack request, because you cannot > predict the function. > There's also an option 4 that's barely been mentioned in any discussion of this issue I've seen: Use a hash table implementation that handles collisions in another way than having each bucket be a linked list. Double hashing and cuckoo hashing come to mind, where an attacker would have to find keys that map to the same value for not one, but two or more different hash functions. -- Shawn Wagner sha...@speakeasy.org -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Am Montag, den 02.01.2012, 00:24 +0100 schrieb oliver: > > I understand it very well that adding support for cryptographically > > secure random numbers to core Ocaml is a challenge. There is no POSIX > > API, and /dev/random is, although widely available, still non-standard. > [...] > > And also might not be good enough for some certain areas. > > val Hashtbl.HashedType.hash: t -> int > > allows at least providing your own hashing.function, > but that function then must be sophisticated enough > to provide some dynamic re-seeding. It does not allow this. Because of this, the new Hashtbl (only in svn so far) has the additional seed parameter. > > And it is certainly true that there are various levels of security, and > > for some purposes the programmer should be free to install even better > > generators. Nevertheless, Ocaml is now widely used in environments where > > a certain minimum of security is demanded, and I think Ocaml should > > provide this minimum at least, and use it for things like an > > automatically chosen seed for hash tables. > > That's already planned and even implemented, as was mentioned in this thread. > So urging for a new official release would make sense. Not exactly. Xavier mentioned that he withdrew the automatic seeding. If nothing changes, you will have to provide a seed parameter to every Hashtbl.create, coming from a rng of your taste. This is what I don't like - programmers will in most cases simply ignore this possibility, and I predict it will even be ignored when there are strong indications for security threats. Also, there is no good access to rng's. Random is not designed with security in mind. The OS typically provides a generator which is much better (like /dev/random), but there is no uniform interface to it that would make it easy to use it. My concern is that the security options will simply be ignored unless the standard library includes some half-way secure defaults. > > What I could imagine is a module Sys.Security where all security > > features are accessible and configurable, e.g. > > I doubt that this makes sense. > Nearly anything that can be programmed can become a security > issue, if done wrong; especially things done on the > operating system level (like Unix module) could become > a security issue, if things are handled without care. Thanks for your argumentative help - by being ignorant you prove my thesis that typical programmers won't do anything by themselves. The world is so much trouble, better put the head into the sand, and hope the attacker won't pick you. (Well, sorry, maybe a bit too harsh, but I hope you get my point that it is no excuse that also other security issues may exist). The dangerous thing here is that it is not always apparent which hash tables are exposed in areas with security demands. So, it would be good if all hash tables had some basic protection built-in. > A mandatory (not optional) hash-function-parameter > that must be passed (plus some default hash functions > with elaborated documentation on the properties of those) > would make more sense to me. The seed is so far optional. Sure, requiring it mandatorily would ensure it is passed, but I guess programmers would just become used to the idiom "Hashtbl.create ~seed:0 ()". I'd like to rather see that at least a default seed is automatically chosen at program startup (there might even be better schemes like selecting a new default seed after every GC cycle or so). > Putting things that need tp be part of the Hash-module > aside into a Sys.security-module makes these things less > apparent to the programmer who just wants to use hashes, > and don't thinks about using hashes might be a security problem. If done right, this won't be a problem anymore for the typical user (who e.g. programs a web page, and where nobody would spend millions to hack it). Of course, if you want to run a trust center, you'll have higher demands, but I guess you'll check then your code base thoroughly for issues, and won't forget to pass seed parameters wherever needed. Gerd > So, this solution IMHO would be counterproductive > and non-intuitive. > > > Ciao, >Oliver > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Sun, Jan 01, 2012 at 10:04:03PM +0100, Gerd Stolpmann wrote: > Am Sonntag, den 01.01.2012, 18:29 +0100 schrieb Xavier Leroy: > > On 01/01/2012 01:52 PM, Richard W.M. Jones wrote: > > > On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote: > > >> Indeed. The optional "seed" parameter to Hashtbl.create does exactly > > >> this in the new implementation of Hashtbl (the one based on Murmur3). > > > > > > It may be worth noting that Perl solved this problem (back in 2003) by > > > unconditionally using a seed which is a global set to a random number > > > during interpreter initialization. > > > > That's how my initial reimplementation of Hashtbl worked, using the > > Random module to produce seeds, but I was told (correctly) that in > > security-sensitive applications it's better to leave the generation of > > random numbers under control of the programmer. For some applications > > Random.self_init might be good enough and for others stronger > > randomness is needed. > > > > Of course, you can trivially emulate Perl's behavior using the new > > Hashtbl implementation + the Random module. > > I understand it very well that adding support for cryptographically > secure random numbers to core Ocaml is a challenge. There is no POSIX > API, and /dev/random is, although widely available, still non-standard. [...] And also might not be good enough for some certain areas. val Hashtbl.HashedType.hash: t -> int allows at least providing your own hashing.function, but that function then must be sophisticated enough to provide some dynamic re-seeding. Not sure if this does not rather conflict with referential transparency?! Under the hood such a function of course good do some reading from a random source... but that looks dirty to me. So such a function with optional seed-parameters (as mentioned by Xavier leroy) might make sense; when using the imperative features from my understanding it seems to be much easier to address that hash-collision problem. > And it is certainly true that there are various levels of security, and > for some purposes the programmer should be free to install even better > generators. Nevertheless, Ocaml is now widely used in environments where > a certain minimum of security is demanded, and I think Ocaml should > provide this minimum at least, and use it for things like an > automatically chosen seed for hash tables. That's already planned and even implemented, as was mentioned in this thread. So urging for a new official release would make sense. > > My argument is: Even providing a half solution is in this area better > than leaving the unwary programmer completely alone. Because in the > latter case, nothing will be done to address the problems, and apps > would be easier to attack. Maybe a "half solution" is, what already is done. I doubt that hash collisions are a new topic, so I wonder why such things were not already implemented. Only that this might be used in attacks seems to be "new". > > What I could imagine is a module Sys.Security where all security > features are accessible and configurable, e.g. I doubt that this makes sense. Nearly anything that can be programmed can become a security issue, if done wrong; especially things done on the operating system level (like Unix module) could become a security issue, if things are handled without care. A mandatory (not optional) hash-function-parameter that must be passed (plus some default hash functions with elaborated documentation on the properties of those) would make more sense to me. Putting things that need tp be part of the Hash-module aside into a Sys.security-module makes these things less apparent to the programmer who just wants to use hashes, and don't thinks about using hashes might be a security problem. So, this solution IMHO would be counterproductive and non-intuitive. Ciao, Oliver -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Am Sonntag, den 01.01.2012, 18:29 +0100 schrieb Xavier Leroy: > On 01/01/2012 01:52 PM, Richard W.M. Jones wrote: > > On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote: > >> Indeed. The optional "seed" parameter to Hashtbl.create does exactly > >> this in the new implementation of Hashtbl (the one based on Murmur3). > > > > It may be worth noting that Perl solved this problem (back in 2003) by > > unconditionally using a seed which is a global set to a random number > > during interpreter initialization. > > That's how my initial reimplementation of Hashtbl worked, using the > Random module to produce seeds, but I was told (correctly) that in > security-sensitive applications it's better to leave the generation of > random numbers under control of the programmer. For some applications > Random.self_init might be good enough and for others stronger > randomness is needed. > > Of course, you can trivially emulate Perl's behavior using the new > Hashtbl implementation + the Random module. I understand it very well that adding support for cryptographically secure random numbers to core Ocaml is a challenge. There is no POSIX API, and /dev/random is, although widely available, still non-standard. And it is certainly true that there are various levels of security, and for some purposes the programmer should be free to install even better generators. Nevertheless, Ocaml is now widely used in environments where a certain minimum of security is demanded, and I think Ocaml should provide this minimum at least, and use it for things like an automatically chosen seed for hash tables. My argument is: Even providing a half solution is in this area better than leaving the unwary programmer completely alone. Because in the latter case, nothing will be done to address the problems, and apps would be easier to attack. What I could imagine is a module Sys.Security where all security features are accessible and configurable, e.g. val set_default_seed : int -> unit val fill_randomly : [`Fast|`Safe] -> string -> unit val set_fill_randomly : ([`Fast|`Safe] -> string -> unit) -> unit The configure script could check what is available on the system. The `Fast mode is guaranteed to always exist, and would essentially be /dev/urandom if available and otherwise Random-based. The `Safe mode would require /dev/random (or comparable), and fail otherwise. (See Wikipedia http://en.wikipedia.org/wiki/dev/random for a list of OS, and for weaknesses.) The user can override the RNG if the defaults are not ok, or if exact control is required. Of course, one could criticize such an interface in various ways. For example, it only allows a global security profile, and not different profiles for different parts of the program. Also, the distinction between a fast and a safe generator sounds a bit arbitrary. My response is simply: this is the price for the broad effect, and IMHO this counts more than having a more elaborated but also more difficult solution, or lots of user-specific solutions (as we would have to find today). Gerd -- Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details:http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On 01/01/2012 01:52 PM, Richard W.M. Jones wrote: > On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote: >> Indeed. The optional "seed" parameter to Hashtbl.create does exactly >> this in the new implementation of Hashtbl (the one based on Murmur3). > > It may be worth noting that Perl solved this problem (back in 2003) by > unconditionally using a seed which is a global set to a random number > during interpreter initialization. That's how my initial reimplementation of Hashtbl worked, using the Random module to produce seeds, but I was told (correctly) that in security-sensitive applications it's better to leave the generation of random numbers under control of the programmer. For some applications Random.self_init might be good enough and for others stronger randomness is needed. Of course, you can trivially emulate Perl's behavior using the new Hashtbl implementation + the Random module. - Xavier Leroy -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote: > Indeed. The optional "seed" parameter to Hashtbl.create does exactly > this in the new implementation of Hashtbl (the one based on Murmur3). It may be worth noting that Perl solved this problem (back in 2003) by unconditionally using a seed which is a global set to a random number during interpreter initialization. Each run of the interpreter results in a different seed, and (it is supposed therefore) users of Perl simply don't need to worry about algorithmic complexity attacks. http://perl5.git.perl.org/perl.git/blob/HEAD:/hv.h#l104 http://dev.perl.org/perl5/news/2003/perl-5.8.1.html#Hash_Randomisation Rich. -- Richard Jones Red Hat -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Fri, Dec 30, 2011 at 06:40:30PM +0100, ri...@happyleptic.org wrote: > I will probably tell something very stupid, but HTML specs > do not prevent a client to post 1M values with the same name, > so whatever your hash function you cannot do much, can you? [...] That's a feature. > > The simplest solution I can think of that prevents all attacks > of this kind (but could reject some valid POST in theory) would > be to store the bucket lengths and use it to detect and reject > "obviously biaised" insertions. [...] How do you define "obvious" and "biased"? Sometimes, the distinction between feature and bug depends on the context... Ciao, Oliver -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Sat, Dec 31, 2011 at 01:57:21AM +0100, oliver wrote: > On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote: > > On 12/30/2011 05:44 PM, Gerd Stolpmann wrote: > > > > > 1) Avoid hash tables in contexts where security is relevant. The > > > alternative is Set (actually a balanced binary tree), which does not > > > show this problem. > > > > Highly recommended. Nothing beats guaranteed O(log n) operations. > > > > > 2) Use cryptographically secure hash functions. > > > > Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits, > > there are no cryptographically secure hash functions. > > > > > 3) Use "randomized" hash tables. The trick here is that there is not a > > > single hash function h anymore, but a family h(1)...h(n). When the hash > > > table is created, one of the functions is picked randomly. This makes it > > > impossible to craft an attack request, because you cannot predict the > > > function. > > > > Indeed. The optional "seed" parameter to Hashtbl.create does exactly > > this in the new implementation of Hashtbl (the one based on Murmur3). > [...] > > > Where is this? > > I found Hashtbl.HashedType.hash with type t -> int. And there is "val hash_param : int -> int -> 'a -> int" but I'm not sure if this adresses the issue. Ciao, Oliver -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Fri, Dec 30, 2011 at 06:06:26PM +0100, Xavier Leroy wrote: > On 12/30/2011 05:44 PM, Gerd Stolpmann wrote: > > > 1) Avoid hash tables in contexts where security is relevant. The > > alternative is Set (actually a balanced binary tree), which does not > > show this problem. > > Highly recommended. Nothing beats guaranteed O(log n) operations. > > > 2) Use cryptographically secure hash functions. > > Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits, > there are no cryptographically secure hash functions. > > > 3) Use "randomized" hash tables. The trick here is that there is not a > > single hash function h anymore, but a family h(1)...h(n). When the hash > > table is created, one of the functions is picked randomly. This makes it > > impossible to craft an attack request, because you cannot predict the > > function. > > Indeed. The optional "seed" parameter to Hashtbl.create does exactly > this in the new implementation of Hashtbl (the one based on Murmur3). [...] Where is this? I found Hashtbl.HashedType.hash with type t -> int. But I did not found an optional parameter to Hashtbl.create Maybe that is part of the next release? Ciao, Oliver -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On Fri, Dec 30, 2011 at 05:44:06PM +0100, Gerd Stolpmann wrote: > Hi, > > there was recently a security alert for web services that use hash > tables to store web form parameters sent via POST [...] Fixed in Perl in 2003. 28C3-Talk: http://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html LQ Video of the talk: http://mirror.fem-net.de/CCC/28C3/mp4-h264-LQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264-iprod.mp4 http://mirror.fem-net.de/CCC/28C3/mp4-h264-LQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264-iprod.mp4.sha1 http://mirror.fem-net.de/CCC/28C3/mp4-h264-LQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264-iprod.mp4.torrent HQ Video of the talk: http://mirror.fem-net.de/CCC/28C3/mp4-h264-HQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264.mp4 http://mirror.fem-net.de/CCC/28C3/mp4-h264-HQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264.mp4.sha1 http://mirror.fem-net.de/CCC/28C3/mp4-h264-HQ/28c3-4680-en-effective_dos_attacks_against_web_application_platforms_h264.mp4.torrent Ciao, Oliver -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Am Freitag, den 30.12.2011, 15:52 -0500 schrieb Yaron Minsky: > It's not clever in that way. It does try to do a good job of keeping > the memory impact of the tree low, but you maintain O(1) by having a > low load factor, and therefore trees of constant size. You can take a > look at the code here: > > > https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml > > (Don't rely on that repo too much yet, btw. We're probably going to > blow it away and create a new one in the next couple of days. But > going forward, we plan on using bitbucket as a place to work together > with the community on Core.) Interesting solution, at least. When I see it right, the Avltree has special representations for empty and 1-node trees, so with some luck this covers 99% of the array cells. So, the memory footprint will usually not be higher than for conventional hash tables. So, I'd consider Core_hashtbl as a way when you need high protection against pathological cases, but want to keep close to the performance pattern of Hashtbl. However, we are in an area where we need to make compromises. I guess Core_hashtbl is (for the non-pathological cases) by a factor slower than the more lightweight Hashtbl. This raises the question how it competes to other solutions, e.g. a Map where the keys are first compared by their hash before cmp is called, for instance let (++) x f = if x<>0 then x else f() module HString = struct type t = int * string let compare (h1,s1) (h2,s2) = compare h1 h2 ++ (fun () -> compare s1 s2) end module HStringMap = Map.Make(HString) now use it as: HStringMap.add (Hashtbl.hash key, key) value map This eliminates one of the drawbacks of the normal Map, namely that many keys need to be compared (which can be costly). Gerd > > y > > On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp > wrote: > Yaron Minsky wrote: > > For just this reason, the hashtables in Core have been > reimplemented to use an > > AVL tree in the buckets. That way, even when you have > pathological collisions, > > you degrade gracefully to O(log n) per operation, instead of > O(n), where n is > > the number of keys in the hashtable. > > > I'm resisting the temptation to hack-it-and-see: does your > implementation do anything clever to maintain Hashtbl's O(1) > insertion time (e.g. Hashtbl.add updates a list and then the > first call to Hashtbl.find or Hashtbl.mem "moves" any items > from the list to the AVL). Or does doing that impact "general" > performance too much? > > In the POST web scenario (or processing HTTP request headers), > for example, "degrading" Hashtbl.add from O(1) to O(log n) is > only acceptable if you know that you'll query all the fields > in the POST (which isn't necessarily true). > > > David > > -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Am Freitag, den 30.12.2011, 18:06 +0100 schrieb Xavier Leroy: > > 3) Use "randomized" hash tables. The trick here is that there is not a > > single hash function h anymore, but a family h(1)...h(n). When the hash > > table is created, one of the functions is picked randomly. This makes it > > impossible to craft an attack request, because you cannot predict the > > function. > > Indeed. The optional "seed" parameter to Hashtbl.create does exactly > this in the new implementation of Hashtbl (the one based on Murmur3). I see. It will be available in 3.13: val create : ?seed:int -> int -> ('a, 'b) t There is also an additional functorized interface where this seed argument exists (Hashtbl.MakeSeeded), and the hash functions seeded_hash and seeded_hash_param. Well done! Nevertheless, as we all don't know when 3.13 is ready, I'll have to find a temporary fix for Ocamlnet. Maybe just a limit for the number of POST parameters. > > So, the question is how to do 3). I see two problems here: > > > > a) how to define the family of hash functions. Is it e.g. sufficient to > > introduce an initialization vector for the Murmurhash algorithm, and > > fill it randomly? > > IIRC, the Web pages for the Murmur family of hashes gives some > statistical evidence that this approach works. > > > How to get a random number that is good enough? > > Hmm. /dev/random is your friend on the platforms that support it. > Otherwise, there's always the Random module, but Random.self_init > isn't very strong. Well, /dev/(u)random covers most Unix platforms nowadays. If you are interested, I have a wrapper for Win32: https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_c_win32.c Scroll down until netsys_fill_random. Gerd -- Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de Creator of GODI and camlcity.org. Contact details:http://www.camlcity.org/contact.html Company homepage: http://www.gerd-stolpmann.de -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
It's not clever in that way. It does try to do a good job of keeping the memory impact of the tree low, but you maintain O(1) by having a low load factor, and therefore trees of constant size. You can take a look at the code here: https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml (Don't rely on that repo too much yet, btw. We're probably going to blow it away and create a new one in the next couple of days. But going forward, we plan on using bitbucket as a place to work together with the community on Core.) y On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp wrote: > Yaron Minsky wrote: > > For just this reason, the hashtables in Core have been reimplemented to > use an > > AVL tree in the buckets. That way, even when you have pathological > collisions, > > you degrade gracefully to O(log n) per operation, instead of O(n), where > n is > > the number of keys in the hashtable. > > I'm resisting the temptation to hack-it-and-see: does your implementation > do anything clever to maintain Hashtbl's O(1) insertion time (e.g. > Hashtbl.add updates a list and then the first call to Hashtbl.find or > Hashtbl.mem "moves" any items from the list to the AVL). Or does doing that > impact "general" performance too much? > > In the POST web scenario (or processing HTTP request headers), for > example, "degrading" Hashtbl.add from O(1) to O(log n) is only acceptable > if you know that you'll query all the fields in the POST (which isn't > necessarily true). > > > David > -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
RE: [Caml-list] Hashtbl and security
Yaron Minsky wrote: > For just this reason, the hashtables in Core have been reimplemented to use an > AVL tree in the buckets. That way, even when you have pathological > collisions, > you degrade gracefully to O(log n) per operation, instead of O(n), where n is > the number of keys in the hashtable. I'm resisting the temptation to hack-it-and-see: does your implementation do anything clever to maintain Hashtbl's O(1) insertion time (e.g. Hashtbl.add updates a list and then the first call to Hashtbl.find or Hashtbl.mem "moves" any items from the list to the AVL). Or does doing that impact "general" performance too much? In the POST web scenario (or processing HTTP request headers), for example, "degrading" Hashtbl.add from O(1) to O(log n) is only acceptable if you know that you'll query all the fields in the POST (which isn't necessarily true). David -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
Not a problem, other than memory waste (which doesn't matter the key). Hashtbl.add prepends to the front of the list, so it'll be fast. And Hashtbl.find will match the head of the list, and return quickly too. Hashtbl.find_all will have to go through the whole list, but that's expected. E. On Fri, Dec 30, 2011 at 12:40 PM, wrote: > I will probably tell something very stupid, but HTML specs > do not prevent a client to post 1M values with the same name, > so whatever your hash function you cannot do much, can you? > > The simplest solution I can think of that prevents all attacks > of this kind (but could reject some valid POST in theory) would > be to store the bucket lengths and use it to detect and reject > "obviously biaised" insertions. > > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
I will probably tell something very stupid, but HTML specs do not prevent a client to post 1M values with the same name, so whatever your hash function you cannot do much, can you? The simplest solution I can think of that prevents all attacks of this kind (but could reject some valid POST in theory) would be to store the bucket lengths and use it to detect and reject "obviously biaised" insertions. -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
On 12/30/2011 05:44 PM, Gerd Stolpmann wrote: > 1) Avoid hash tables in contexts where security is relevant. The > alternative is Set (actually a balanced binary tree), which does not > show this problem. Highly recommended. Nothing beats guaranteed O(log n) operations. > 2) Use cryptographically secure hash functions. Hopeless: with a hash size of 30 bits, as in Caml, or even 64 bits, there are no cryptographically secure hash functions. > 3) Use "randomized" hash tables. The trick here is that there is not a > single hash function h anymore, but a family h(1)...h(n). When the hash > table is created, one of the functions is picked randomly. This makes it > impossible to craft an attack request, because you cannot predict the > function. Indeed. The optional "seed" parameter to Hashtbl.create does exactly this in the new implementation of Hashtbl (the one based on Murmur3). > So, the question is how to do 3). I see two problems here: > > a) how to define the family of hash functions. Is it e.g. sufficient to > introduce an initialization vector for the Murmurhash algorithm, and > fill it randomly? IIRC, the Web pages for the Murmur family of hashes gives some statistical evidence that this approach works. > How to get a random number that is good enough? Hmm. /dev/random is your friend on the platforms that support it. Otherwise, there's always the Random module, but Random.self_init isn't very strong. - Xavier Leroy -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Hashtbl and security
For just this reason, the hashtables in Core have been reimplemented to use an AVL tree in the buckets. That way, even when you have pathological collisions, you degrade gracefully to O(log n) per operation, instead of O(n), where n is the number of keys in the hashtable. y On Fri, Dec 30, 2011 at 11:44 AM, Gerd Stolpmann wrote: > Hi, > > there was recently a security alert for web services that use hash > tables to store web form parameters sent via POST (so that millions of > such parameters can be sent in a single request). It is possible to keep > the web service busy for hours with such a DoS (denial of service) > attack. The type of attack boils down to a problem in most hash table > implementations, namely that the hash functions are invertible, and it > is possible for a malicious user to construct lots of keys that all map > to the same bucket of the hash table, creating a mass collision. > > The text of the alert: > http://www.nruns.com/_downloads/advisory28122011.pdf > > I'd like to discuss this issue, because it is not restricted to the > processing of web requests, but may also occur for all other data coming > from untrusted sources. The web is only the most exposed area where this > issue exists. > > So how is Ocaml affected? The hash functions used in recent Ocaml > releases are also insecure in the above mentioned sense (currently > MurmurHash3, and even a simpler hash function in previous releases). A > quick survey of the Internet revealed at least one site that tries to > break it. Probably a good cryptographer could do it in minutes. > > A pure Hashtbl.add of the constructed keys does not yet lead to the > performance degradation, but a Hashtbl.replace, and of course > Hashtbl.find after the table is built up will. So it depends very much > of the details of the programs whether they are affected or not. > > I've just checked that Ocamlnet uses only Hashtbl.add to collect POST > parameters, so it is not directly vulnerable. But if the crafted request > is actually served by a handler, the handler would get a degraded table, > and could show in turn bad performance (again leading to DoS). > > What are possible fixes? > > 1) Avoid hash tables in contexts where security is relevant. The > alternative is Set (actually a balanced binary tree), which does not > show this problem. > > 2) Use cryptographically secure hash functions. > > 3) Use "randomized" hash tables. The trick here is that there is not a > single hash function h anymore, but a family h(1)...h(n). When the hash > table is created, one of the functions is picked randomly. This makes it > impossible to craft an attack request, because you cannot predict the > function. > > I don't think 1) is viable - hash tables are too wide spread, and are > loved by programmers because of their good performance. 2) would be good > in extremely critical contexts - although it is then again questionable, > because it is likely to have worse performance than 1). > > So, the question is how to do 3). I see two problems here: > > a) how to define the family of hash functions. Is it e.g. sufficient to > introduce an initialization vector for the Murmurhash algorithm, and > fill it randomly? How to get a random number that is good enough? > > b) the Hashtbl in the standard library does not allow it to set the hash > function dynamically. Maybe one can get this effect by using first-class > modules (haven't checked). > > Anyway, I think these problems are difficult enough to deserve some > discussion here. At least I cannot solve them immediately, and this > problem may exist for lots of Ocaml applications. > > Gerd > -- > > Gerd Stolpmann, Darmstadt, Germanyg...@gerd-stolpmann.de > Creator of GODI and camlcity.org. > Contact details:http://www.camlcity.org/contact.html > Company homepage: http://www.gerd-stolpmann.de > > > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa-roc.inria.fr/wws/info/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs