On 14.11.2012 02:39, Roy Smith wrote: > The next step is to reduce the number of bits you are encoding. You > said in another post that "1 collision in 10 million hashes would be > tolerable". So you need: > >>>> math.log(10*1000*1000, 2) > 23.25349666421154 > > 24 bits worth of key.
Nope :-) > Base64 encoded, that's only 4 characters. > Actually, I probably just proved that I don't really understand how > probabilities work, so maybe what you really need is 32 or 48 or 64 > bits. :-)) When doing these calculations, it's important to keep the birthday paradox in mind (this is kind of counter-intuitive): The chance of a collission raises tremendously when we're looking for *any* arbitrary two hashes colliding within a certain namespace. The probability you've calculated is the pre-image probability (which you also again need to multiply with a factor of two, because when trying to collide one given hash, in the mean case you'll only have to search *half* the namespace before finding a collision). There are three things you need to know before you can give an estimate: 1. The permissible probability of a collision (1e-7 in this case) 2. The hash size 3. The worst-case number of elements in the namespace You neglected 3 completely -- but knowing this is really important. This becomes obvious when looking at the extreme cases: Let's say you have a hash of arbitrary size, but only hash one element. The chance of a collision is *always* zero. Or look at a hash of size 2^n. Then put 2^n + 1 elements in the namespace. The chance of a collision is *always* one. Doing the calculations (formulas can be found on wikipedia on the site of the birthday phaenomenon), you can come up with these following bitlenghts of the hash with a 1e-7 probability of collision in respect to the worst-case number of elements 10k elements: 49 bit 100k elements: 56 bit 1e6 elements: 63 bit 100e6 elements: 76 bit 1e9 elements: 83 bit 1e12 elements: 102 bit Best regards, Johannes -- >> Wo hattest Du das Beben nochmal GENAU vorhergesagt? > Zumindest nicht öffentlich! Ah, der neueste und bis heute genialste Streich unsere großen Kosmologen: Die Geheim-Vorhersage. - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org> -- http://mail.python.org/mailman/listinfo/python-list