[Python-Dev] hello, new dict addition for new eve ?
Hello, Sorry to annoy the very busy core devs :) out of the blue I quite noticed people were 1) wanting to have a new dict for Xmas 2) strongly resenting dict addition. Even though I am not a good developper, I have come to a definition of addition that would follow algebraic rules, and not something of a dutch logic. (it is a jest, not a troll) I propose the following code to validate my point of view regarding the dictionnatry addition as a proof of concept : https://github.com/jul/ADictAdd_iction/blob/master/test.py It follows all my dusty math books regarding addition + it has the amability to have rules of conservation. I pretty much see a real advantage in this behaviour in functional programming (map/reduce). (see the demonstrate.py), and it has a sense (if dict can be seen has vectors). I have been told to be a troll, but I am pretty serious. Since, I coded with luck, no internet, intuition, and a complete ignorance of the real meaning of the magic methods most of the time, thus the actual implementation of the addition surely needs a complete refactoring. Sheers, Bonne fêtes Julien ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Your email to the mailing list
...oops, I did not intend to send this to the mailing list. I apologize for the accidental off topic. On Fri, Dec 30, 2011 at 7:40 AM, lahwran wrote: > I don't want to post to the mailing list about this; But I must say, I > found your email very entertaining. You have a good sense of humor. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Your email to the mailing list
I don't want to post to the mailing list about this; But I must say, I found your email very entertaining. You have a good sense of humor. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] hello, new dict addition for new eve ?
Hi Julien, Don't despair! I have tried to get people to warm up to dict addition too -- in fact it was my counter-proposal at the time when we were considering adding sets to the language. I will look at your proposal, but I have a point of order first: this should be discussed on python-ideas, not on python-dev. I have added python-ideas to the thread and moved python-dev to Bcc, so followups will hopefully all go to python-ideas. --Guido On Fri, Dec 30, 2011 at 7:26 AM, julien tayon wrote: > Hello, > Sorry to annoy the very busy core devs :) out of the blue > > I quite noticed people were > 1) wanting to have a new dict for Xmas > 2) strongly resenting dict addition. > > Even though I am not a good developper, I have come to a definition of > addition that would follow algebraic rules, and not something of a > dutch logic. (it is a jest, not a troll) > > I propose the following code to validate my point of view regarding > the dictionnatry addition as a proof of concept : > https://github.com/jul/ADictAdd_iction/blob/master/test.py > > It follows all my dusty math books regarding addition + it has the > amability to have rules of conservation. > > I pretty much see a real advantage in this behaviour in functional > programming (map/reduce). (see the demonstrate.py), and it has a sense > (if dict can be seen has vectors). > > I have been told to be a troll, but I am pretty serious. > > Since, I coded with luck, no internet, intuition, and a complete > ignorance of the real meaning of the magic methods most of the time, > thus the actual implementation of the addition surely needs a complete > refactoring. > > Sheers, > Bonne fêtes > Julien > ___ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2011-12-23 - 2011-12-30) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open3178 (+10) closed 22288 (+16) total 25466 (+26) Open issues with patches: 1365 Issues opened (21) == #12760: Add create mode to open() http://bugs.python.org/issue12760 reopened by pitrou #13294: http.server: minor code style changes. http://bugs.python.org/issue13294 reopened by ezio.melotti #13659: Add a help() viewer for IDLE's Shell. http://bugs.python.org/issue13659 opened by ramchandra.apte #13663: pootle.python.org is outdated. http://bugs.python.org/issue13663 opened by naoki #13664: UnicodeEncodeError in gzip when filename contains non-ascii http://bugs.python.org/issue13664 opened by jason.coombs #13665: TypeError: string or integer address expected instead of str i http://bugs.python.org/issue13665 opened by jason.coombs #13666: datetime documentation typos http://bugs.python.org/issue13666 opened by steveire #13668: mute ImportError in __del__ of _threading_local module http://bugs.python.org/issue13668 opened by Zhiping.Deng #13669: XATTR_SIZE_MAX and XATTR_LIST_MAX undefined on kfreebsd/debian http://bugs.python.org/issue13669 opened by zbysz #13670: Increase test coverage for pstats.py http://bugs.python.org/issue13670 opened by andrea.crotti #13672: Add co_qualname attribute in code objects http://bugs.python.org/issue13672 opened by Arfrever #13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi http://bugs.python.org/issue13673 opened by sbt #13674: crash in datetime.strftime http://bugs.python.org/issue13674 opened by patrick.vrijlandt #13676: sqlite3: Zero byte truncates string contents http://bugs.python.org/issue13676 opened by petri.lehtinen #13677: correct docstring for builtin compile http://bugs.python.org/issue13677 opened by Jim.Jewett #13679: Multiprocessing system crash http://bugs.python.org/issue13679 opened by Rock.Achu #13680: Aifc comptype write fix http://bugs.python.org/issue13680 opened by Oleg.Plakhotnyuk #13681: Aifc read compressed frames fix http://bugs.python.org/issue13681 opened by Oleg.Plakhotnyuk #13682: Documentation of os.fdopen() refers to non-existing bufsize ar http://bugs.python.org/issue13682 opened by petri.lehtinen #13683: Docs in Python 3:raise statement mistake http://bugs.python.org/issue13683 opened by ramchandra.apte #13684: httplib tunnel infinite loop http://bugs.python.org/issue13684 opened by luzakiru Most recent 15 issues with no replies (15) == #13684: httplib tunnel infinite loop http://bugs.python.org/issue13684 #13683: Docs in Python 3:raise statement mistake http://bugs.python.org/issue13683 #13682: Documentation of os.fdopen() refers to non-existing bufsize ar http://bugs.python.org/issue13682 #13680: Aifc comptype write fix http://bugs.python.org/issue13680 #13677: correct docstring for builtin compile http://bugs.python.org/issue13677 #13668: mute ImportError in __del__ of _threading_local module http://bugs.python.org/issue13668 #13666: datetime documentation typos http://bugs.python.org/issue13666 #13665: TypeError: string or integer address expected instead of str i http://bugs.python.org/issue13665 #13664: UnicodeEncodeError in gzip when filename contains non-ascii http://bugs.python.org/issue13664 #13649: termios.ICANON is not documented http://bugs.python.org/issue13649 #13641: decoding functions in the base64 module could accept unicode s http://bugs.python.org/issue13641 #13640: add mimetype for application/vnd.apple.mpegurl http://bugs.python.org/issue13640 #13638: PyErr_SetFromErrnoWithFilenameObject is undocumented http://bugs.python.org/issue13638 #13633: Handling of hex character references in HTMLParser.handle_char http://bugs.python.org/issue13633 #13631: readline fails to parse some forms of .editrc under editline ( http://bugs.python.org/issue13631 Most recent 15 issues waiting for review (15) = #13684: httplib tunnel infinite loop http://bugs.python.org/issue13684 #13681: Aifc read compressed frames fix http://bugs.python.org/issue13681 #13680: Aifc comptype write fix http://bugs.python.org/issue13680 #13677: correct docstring for builtin compile http://bugs.python.org/issue13677 #13676: sqlite3: Zero byte truncates string contents http://bugs.python.org/issue13676 #13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi http://bugs.python.org/issue13673 #13670: Increase test coverage for pstats.py http://bugs.python.org/issue13670 #13668: mute ImportError in __del__ of _threading_local module http://bugs.python.org/issue13668 #13651: Improve redirection in urllib http://bugs.python.org/issue13651 #13645: import machinery vulnerable to timestamp collisions http://bugs.python.org/iss
Re: [Python-Dev] [Python-checkins] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile,
On Thu, Dec 29, 2011 at 11:55, antoine.pitrou wrote: > http://hg.python.org/cpython/rev/cf57ef65bcd0 > changeset: 74194:cf57ef65bcd0 > user: Antoine Pitrou > date: Thu Dec 29 18:54:15 2011 +0100 > summary: > Issue #12715: Add an optional symlinks argument to shutil functions > (copyfile, copymode, copystat, copy, copy2). > When that parameter is true, symlinks aren't dereferenced and the operation > instead acts on the symlink itself (or creates one, if relevant). > > Patch by Hynek Schlawack. > > files: > Doc/library/shutil.rst | 46 - > Lib/shutil.py | 101 +--- > Lib/test/test_shutil.py | 219 > Misc/NEWS | 5 + > 4 files changed, 333 insertions(+), 38 deletions(-) > > > diff --git a/Doc/library/shutil.rst b/Doc/library/shutil.rst > --- a/Doc/library/shutil.rst > +++ b/Doc/library/shutil.rst > @@ -45,7 +45,7 @@ > be copied. > > > -.. function:: copyfile(src, dst) > +.. function:: copyfile(src, dst[, symlinks=False]) > > Copy the contents (no metadata) of the file named *src* to a file named > *dst*. > *dst* must be the complete target file name; look at :func:`copy` for a > copy that > @@ -56,37 +56,56 @@ > such as character or block devices and pipes cannot be copied with this > function. *src* and *dst* are path names given as strings. > > + If *symlinks* is true and *src* is a symbolic link, a new symbolic link > will > + be created instead of copying the file *src* points to. > + > .. versionchanged:: 3.3 > :exc:`IOError` used to be raised instead of :exc:`OSError`. > + Added *symlinks* argument. Can we expect that readers on Windows know how os.symlink works, or should the stipulations of os.symlink usage also be laid out or at least linked to from there? Basically, almost everyone is going to get an OSError if they call this on Windows. You have to be on Windows Vista or beyond *and* the calling process has to have the proper privileges (typically gained through elevation - "Run as Administrator"). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile,
On Fri, 30 Dec 2011 13:29:36 -0600 Brian Curtin wrote: > > Can we expect that readers on Windows know how os.symlink works, or > should the stipulations of os.symlink usage also be laid out or at > least linked to from there? I assume it won't make a difference in real life, since symlinks are quite rare under Windows. > Basically, almost everyone is going to get an OSError if they call > this on Windows. You have to be on Windows Vista or beyond *and* the > calling process has to have the proper privileges (typically gained > through elevation - "Run as Administrator"). I still haven't managed to use symlinks under Windows 7, myself. The recipes I've tried didn't work. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] cpython: Issue #12715: Add an optional symlinks argument to shutil functions (copyfile,
On Fri, Dec 30, 2011 at 13:39, Antoine Pitrou wrote: > On Fri, 30 Dec 2011 13:29:36 -0600 > Brian Curtin wrote: >> >> Can we expect that readers on Windows know how os.symlink works, or >> should the stipulations of os.symlink usage also be laid out or at >> least linked to from there? > > I assume it won't make a difference in real life, since symlinks are > quite rare under Windows. > >> Basically, almost everyone is going to get an OSError if they call >> this on Windows. You have to be on Windows Vista or beyond *and* the >> calling process has to have the proper privileges (typically gained >> through elevation - "Run as Administrator"). > > I still haven't managed to use symlinks under Windows 7, myself. > The recipes I've tried didn't work. This might be a place where an image in the documentation would be helpful. I don't think we do that anywhere else, but maybe I could add it to the (sorely out of date and in need of a rebuild) Windows FAQ? What you need to do on Win7 is go to Start > All Programs > Accessories > Command Prompt, but right click on it instead of left click. Choose "Run as Administrator", then it'll make you choose yes or no to elevate privileges. At that point, deep in the heart of everyone's favorite operating system, it should acquire the SeCreateSymbolicLink user privilege. After that, os.symlink should work fine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Hash collision security issue (now public)
In http://mail.python.org/pipermail/python-dev/2011-December/115138.html, Christian Heimes pointed out that > ... we don't have to alter the outcome of hash ... We just need to reduce the > chance that > an attacker can produce collisions in the dict (and set?) I'll state it more strongly. hash probably should not change (at least for this), but we may want to consider a different conflict resolution strategy when the first slot is already filled. Remember that there was a fair amount of thought and timing effort put into selecting the current strategy; it is deliberately sub-optimal for random input, in order to do better with typical input. < http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt > If there is a change, it would currently be needed in three places for each of set and dict (the lookdict functions and insertdict_clean). It may be worth adding some macros just to keep those six in sync. Once those macros are in place, that allows a compile-time switch. My personal opinion is that accepting *and parsing* enough data for this to be a problem is enough of an edge case that I don't want normal dicts slowed down at all for this; I would therefore prefer that the change be restricted to such a compile-time switch, with current behavior the default. http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictobject.c#l571 583for (perturb = hash; ep->me_key != NULL; perturb >>= PERTURB_SHIFT) { 584 i = (i << 2) + i + perturb + 1; PERTURB_SHIFT is already a private #define to 5; per dictnotes, 4 and 6 perform almost as well. Someone worried can easily make that change today, and be protected from "generic" anti-python attacks. I believe the salt suggestions have equivalent to replacing perturb = hash; with something likeperturb = hash + salt; Changing i = (i << 2) + i + perturb + 1;would allow effectively replacing the initial hash, but risks spoiling performance in the non-adversary case. Would there be objections to replacing those two lines with something like: for (perturb = FIRST_PERTURB(hash, key); ep->me_key != NULL; perturb = NEXT_PERTURB(hash, key, perturb)) { i = NEXT_SLOT(i, perturb); The default macro definitions should keep things as they are #define FIRST_PERTURB(hash, key)hash #define NEXT_PERTURB(hash, key, perturb)perturb >> PERTURB_SHIFT #define NEXT_SLOT(i, perturb)(i << 2) + i + perturb + 1 while allowing #ifdefs for (slower but) safer things like adding a salt, or even using alternative hashes. -jJ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Le 29/12/2011 02:28, Michael Foord a écrit : A paper (well, presentation) has been published highlighting security problems with the hashing algorithm (exploiting collisions) in many programming languages Python included: http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf This PDF doesn't explain exactly the problem and how it can be solved. Let's try to summarize this "vulnerability". The creation of a Python dictionary has a complexity of O(n) in most cases, but O(n^2) in the *worst* case. The attack tries to go into the worst case. It requires to compute a set of N keys having the same hash value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to compute these keys once. It looks like it is now cheap enough in practice to compute this dataset for Python (and other languages). A countermeasure would be to check that we don't have more than X keys with the same hash value. But in practice, we don't know in advance how data are processed, and there are too many input vectors in various formats. If we want to fix something, it should be done in the implementation of the dict type or in the hash algorithm. We can implement dict differently to avoid this issue, using a binary tree for example. Because dict is a fundamental type in Python, I don't think that we can change its implementation (without breaking backward compatibility and so applications in production). A possibility would be to add a *new* type, but all libraries and applications would need to be changed to fix the vulnerability. The last choice is to change the hash algorithm. The *idea* is the same than adding salt to hashed password (in practice it will be a little bit different): if a pseudo-random salt is added, the attacker cannot prepare a single dataset, he/she will have to regenerate a new dataset for each possible salt value. If the salt is big enough (size in bits), the attacker will need too much CPU to generate the dataset (compute N keys with the same hash value). Basically, it slows down the attack by 2^(size of the salt). -- Another possibility would be to replace our fast hash function by a better hash function like MD5 or SHA1 (so the creation of the dataset would be too slow in practice = too expensive), but cryptographic hash functions are much slower (and so would slow down Python too much). Limiting the size of the POST data doesn't solve the problem because there are many other input vectors and data formats. It may block the most simple attacks because the attacker cannot inject enough keys to slow down your CPU. Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Jim Jewett wrote: My personal opinion is that accepting *and parsing* enough data for this to be a problem is enough of an edge case that I don't want normal dicts slowed down at all for this; I would therefore prefer that the change be restricted to such a compile-time switch, with current behavior the default. By compile-time, do you mean when the byte-code is compilated, i.e. just before runtime, rather than a switch when compiling the Python executable from source? I will assume so. I'm not a big fan of compile-time (runtime) switches. It makes it too hard to compare before-and-after behaviour within a single session, and impossible to have fine control over which objects have which behaviour. I don't like all-or-nothing settings. (E.g. I'd love to be able to turn -O optimization on and off on a per-function basis, but can't.) How about using a similar strategy to the current dict behaviour with __missing__ and defaultdict? Here's my suggestion: - If a dict subclass defines __salt__, then it is called to salt the hash value before lookups. If __salt__ is undefined or None, the current behaviour remains unchanged. - Add a dict subclass (saltdict, for lack of a better name) that defines __salt__ appropriately to the collections module. In this case, I don't know enough to suggest what is an appropriate salt. I leave that to the security experts to argue about. - Update the relevant standard library modules to use saltdict where needed. This allows a single application or framework to use saltdict where necessary, without slowing down all dict accesses. Dicts which never see user-generated input (e.g. globals) can remain full-speed. If there is no consensus about the best salting strategy, then apps can choose their own by subclassing dict. Responsibility for doing the right thing falls onto the library author, rather than Python itself. Some people may consider that a minus. -- Steven ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Le 29/12/2011 14:19, Christian Heimes a écrit : Perhaps the dict code is a better place for randomization. The problem is the creation of a dict with keys all having the same hash value. The current implementation of dict uses a linked-list. Adding a new item requires to compare the new key to all existing keys (compare the value, not the hash, which is much slower). We had to change completly how dict is implemented to be able to fix this issue. I don't think that we can change the dict implementation without breaking backward compatibility or breaking applications. Change the implementation would change dict properties, and applications rely on the properties of the current implementation. Tell me if I am wrong. Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
In case the watchdog is not a viable solution as I had assumed it was, I think it's more reasonable to indeed consider adding a flag to Python that allows randomization of hashes optionally before startup. A flag will only be needed if the overhead of the fix is too high. However as it was said earlier, the attack is a lot more complex to carry out on a 64bit environment that it's probably (as it stands right now!) safe to ignore. I suppose that there are still servers running 32 bits Python. The main problem there however is not that it's a new attack but that some dickheads could now make prebaked attacks against websites to disrupt them that might cause some negative publicity. In general though there are so many more ways to DDOS a website than this that I would rate the whole issue very low. There are countermeasures for low level DDOS (ICMP ping flood, TCP syn flood, etc.). An application (or a firewall) cannot implement a countermeasure for this high level issue. It can only be fixed in Python directly (by changing the implementation of the dict type or of the hash function). Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 31.12.2011 03:31, schrieb Victor Stinner: > Le 29/12/2011 14:19, Christian Heimes a écrit : >> Perhaps the dict code is a better place for randomization. > > The problem is the creation of a dict with keys all having the same hash > value. The current implementation of dict uses a linked-list. Adding a > new item requires to compare the new key to all existing keys (compare > the value, not the hash, which is much slower). > > We had to change completly how dict is implemented to be able to fix > this issue. I don't think that we can change the dict implementation > without breaking backward compatibility or breaking applications. Change > the implementation would change dict properties, and applications rely > on the properties of the current implementation. > > Tell me if I am wrong. You are right and I was wrong. We can't do the randomization inside the dict code. The randomization factor must used as initialization factor (IV) for the hashing algorithm. At first I thought the attack used the unique property of Python's dict implementation (perturbed hash instead of buckets for equal hashes) but I was wrong. It took me several hours of reading and digging into the math until I figured out my mistake. Sorry! :) This means we can't implement a salted dict unless the salted dict re-implemention the hash algorithm for unicode, bytes and memoryview. I doubt that a wise idea. I've checked my first draft of a possible solution: http://hg.python.org/features/randomhash/ . The pseudo RNG has to be replaced with something better and it's missing an option to feed a seed, too. Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 31.12.2011 03:19, schrieb Steven D'Aprano: > How about using a similar strategy to the current dict behaviour with > __missing__ and defaultdict? Here's my suggestion: > > > - If a dict subclass defines __salt__, then it is called to salt the hash >value before lookups. If __salt__ is undefined or None, the current >behaviour remains unchanged. This was my initial proposal, too. It took me a while to figure out that it won't work. Post-salting won't fix the issue. The random seed must be used as IV inside hashing algorithm. My brain was still in holiday mode and it took me a while to figure out the math. Sorry for any confusion! Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
Am 31.12.2011 03:22, schrieb Victor Stinner: > The creation of a Python dictionary has a complexity of O(n) in most > cases, but O(n^2) in the *worst* case. The attack tries to go into the > worst case. It requires to compute a set of N keys having the same hash > value (hash(key1) == hash(key2) == ... hash(keyN)). It only has to > compute these keys once. It looks like it is now cheap enough in > practice to compute this dataset for Python (and other languages). Correct. The meet-in-the-middle attack and the unique properties of algorithms that are similar to DJBX33A and DJBX33A make the attack easy on platforms with 32bit hash. > A countermeasure would be to check that we don't have more than X keys > with the same hash value. But in practice, we don't know in advance how > data are processed, and there are too many input vectors in various formats. > > If we want to fix something, it should be done in the implementation of > the dict type or in the hash algorithm. We can implement dict > differently to avoid this issue, using a binary tree for example. > Because dict is a fundamental type in Python, I don't think that we can > change its implementation (without breaking backward compatibility and > so applications in production). A possibility would be to add a *new* > type, but all libraries and applications would need to be changed to fix > the vulnerability. A BTree is too slow for common operations, it's O(log n) instead of O(1) in average. We can't replace our dict with a btree type. A new btree type is a lot of work, too. The unique structure of CPython's dict implementation makes it harder to get the number of values with equal hash. The academic hash map (the one I learnt about at university) uses a bucket to store all elements with equal hash (more precise hash: mod mask). However Python's dict however perturbs the hash until it finds a free slot its array. The second, third ... collision can be caused by a legit and completely different (!) hash. > The last choice is to change the hash algorithm. The *idea* is the same > than adding salt to hashed password (in practice it will be a little bit > different): if a pseudo-random salt is added, the attacker cannot > prepare a single dataset, he/she will have to regenerate a new dataset > for each possible salt value. If the salt is big enough (size in bits), > the attacker will need too much CPU to generate the dataset (compute N > keys with the same hash value). Basically, it slows down the attack by > 2^(size of the salt). That's the idea of randomized hashing functions as implemented by Ruby 1.8, Perl and others. The random seed is used as IV. Multiple rounds of multiply, XOR and MOD (integer overflows) cause a deviation. In your other posting you were worried about the performance implication. A randomized hash function just adds a single ADD operation, that's all. Downside: With randomization all hashes are unpredictable and change after every restart of the interpreter. This has some subtle side effects like a different outcome of {a:1, b:1, c:1}.keys() after a restart of the interpreter. > Another possibility would be to replace our fast hash function by a > better hash function like MD5 or SHA1 (so the creation of the dataset > would be too slow in practice = too expensive), but cryptographic hash > functions are much slower (and so would slow down Python too much). I agree with your analysis. Cryptographic hash functions are far too slow for our use case. During my research I found another hash function that claims to be fast and that may not be vulnerable to this kind of attack: http://isthe.com/chongo/tech/comp/fnv/ Christian ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Hash collision security issue (now public)
On 12/30/2011 8:04 PM, Jim Jewett wrote: I'll state it more strongly. hash probably should not change (at least for this), I agree, especially since the vulnerability can be avoided by using 64 bit servers and will generally abate as more switch anyway. > but we may want to consider a different conflict resolution strategy when the first slot is already filled. Remember that there was a fair amount of thought and timing effort put into selecting the current strategy; it is deliberately sub-optimal for random input, in order to do better with typical input.< http://hg.python.org/cpython/file/7010fa9bd190/Objects/dictnotes.txt> It would be good to have a set of attack strings to see how vulernerable Py dicts actually are (Python may not have been actually tested with data) and the affect of any change. I gave the project email of the 2 presenters in my first post. They apparently want to work with language developers to improve defenses against attack. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com