On Thu, May 17, 2018 at 5:21 PM, Anthony Flury via Python-Dev
<python-dev@python.org> wrote:
> Victor,
> Thanks for the link, but to be honest it will just confuse people - neither
> the link or the related bpo entries state that the fix is only limited to
> strings. They simply talk about hash randomization - which in my opinion
> implies ALL hash algorithms; which is why I asked the question.
>
> I am not sure how much should be exposed about the scope of security fixes
> but you can understand my (and other's) confusion.
>
> I am aware that applications shouldn't make assumptions about the value of
> any given hash value - apart from some simple assumptions based hash value
> equality (i.e. if two objects have different hash values they can't be the
> same value).

The hash values of Python objects are calculated by the __hash__
method, so arbitrary objects can do what they like, including
degenerate algorithms such as:

    class X:
        def __hash__(self):
            return 7

So it's impossible to randomize ALL hashes at the language level. Only
str and bytes hashes are randomized, because they're the ones most
likely to be exploitable - for instance, a web server will receive a
query like "http://spam.example/target?a=1&b=2&c=3" and provide a
dictionary {"a":1, "b":2, "c":3}. Similarly, a JSON decoder is always
going to create string keys in its dictionaries (JSON objects). Do you
know of any situation in which an attacker can provide the keys for a
dict/set as integers?

> BTW:
>
> This question was prompted by a question on a social media platform about
> the whether hash values are transferable between across platforms.
> Everything I could find stated that after Python 3.3 ALL hash values were
> randomized - but that clearly isn't the case; and the original questioner
> identified that some hash values are randomized and other aren't.

That's actually immaterial. Even if the hashes weren't actually
randomized, you shouldn't be making assumptions about anything
specific in the hash, save that *within one Python process*, two equal
values will have equal hashes (and therefore two objects with unequal
hashes will not be equal).

> I did suggest strongly to the original questioner that relying on the same
> hash value across different platforms wasn't a clever solution - their
> original plan was to store hash values in a cross system database to enable
> quick retrieval of data (!!!). I did remind the OP that a hash value wasn't
> guaranteed to be unique anyway - and they might come across two different
> values with the same hash - and no way to distinguish between them if all
> they have is the hash. Hopefully their revised design will store the key,
> not the hash.

Uhh....
if you're using a database, let the database do the work of being a
database. I don't know what this "cross system database" would be
implemented in, but if it's a proper multi-user relational database
engine like PostgreSQL, it's already going to have way better indexing
than anything you'd do manually. I think there are WAY better
solutions than worrying about Python's inbuilt hashing.

If you MUST hash your data for sharing and storage, the easiest
solution is to just use a cryptographic hash straight out of
hashlib.py.

ChrisA
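For what it's worth, here is a minimal sketch of the hashlib approach mentioned above. The function name stable_key and the choice of SHA-256 with UTF-8 encoding are assumptions made for illustration, not anything from the original question; other hashlib algorithms would be used the same way.

```python
import hashlib


def stable_key(value: str) -> str:
    """Return a hex digest for `value` (hypothetical helper, for illustration).

    Unlike the built-in hash(), the result depends only on the input bytes,
    so it is the same in every process and on every platform.
    """
    # hashlib operates on bytes, so the text encoding has to be fixed
    # explicitly for the digest to be reproducible.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The digest printed here is identical from run to run, whereas
    # hash("spam") normally changes between interpreter runs.
    print(stable_key("spam"))
```

Even with a cryptographic hash, two different values could in principle share a digest, so storing the key itself alongside it is still the safer design.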