Re: [Python-Dev] Hashes in Python3.5 for tuples and frozensets

Anthony Flury via Python-Dev Thu, 17 May 2018 07:18:34 -0700

Chris,

I entirely agree. The same questioner also asked about the fastest datatype to use as a key in a dictionary, and which data structure is fastest. I get the impression the person is very into micro-optimization, without profiling their application. It seems every choice is made based on the speed of that operation, without consideration of how often that operation is used.


On 17/05/18 09:16, Chris Angelico wrote:

On Thu, May 17, 2018 at 5:21 PM, Anthony Flury via Python-Dev
<[email protected]> wrote:

Victor,
Thanks for the link, but to be honest it will just confuse people - neither
the link nor the related bpo entries state that the fix is limited to
strings. They simply talk about hash randomization - which in my opinion
implies ALL hash algorithms; which is why I asked the question.

I am not sure how much should be exposed about the scope of security fixes
but you can understand my (and others') confusion.

I am aware that applications shouldn't make assumptions about the value of
any given hash value - apart from some simple assumptions based on hash value
equality (i.e. if two objects have different hash values they can't be the
same value).

The hash values of Python objects are calculated by the __hash__
method, so arbitrary objects can do what they like, including
degenerate algorithms such as:

class X:
     def __hash__(self): return 7

Agreed - I should have said the default hash algorithm. Hashes for custom objects are entirely application dependent.
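
As a small illustration of the point above, here is a sketch of the degenerate class in use. The `name` attribute and the `__eq__` method are additions for the sake of the example (they are not in the quoted snippet); they are needed so that distinct instances can be told apart as dictionary keys.

```python
class X:
    """Every instance hashes to 7: degenerate, but perfectly legal."""

    def __init__(self, name):
        # 'name' is a hypothetical attribute added for this example.
        self.name = name

    def __hash__(self):
        return 7

    def __eq__(self, other):
        return isinstance(other, X) and self.name == other.name


table = {X("a"): 1, X("b"): 2, X("c"): 3}

# All three keys share one hash value, yet lookups still succeed,
# because dict uses __eq__ to tell colliding keys apart.
lookup_b = table[X("b")]
distinct_hashes = {hash(key) for key in table}
```

The dictionary stays correct; it merely loses its speed advantage, since every lookup has to compare against the colliding keys.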


So it's impossible to randomize ALL hashes at the language level. Only
str and bytes hashes are randomized, because they're the ones most
likely to be exploitable - for instance, a web server will receive a
query like "http://spam.example/target?a=1&b=2&c=3" and provide a
dictionary {"a":1, "b":2, "c":3}. Similarly, a JSON decoder is always
going to create string keys in its dictionaries (JSON objects). Do you
know of any situation in which an attacker can provide the keys for a
dict/set as integers?
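
The scope of the randomization can be checked empirically. The sketch below starts fresh interpreters with different `PYTHONHASHSEED` values and compares the hashes they report. The helper name `hash_in_child` is made up for this example, and the results describe observed CPython behaviour, not a language guarantee.

```python
import os
import subprocess
import sys


def hash_in_child(expr, seed):
    """Return hash(<expr>) as computed by a fresh interpreter
    started with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def same_across_seeds(expr):
    """True if two interpreters with different seeds agree on the hash."""
    return hash_in_child(expr, 1) == hash_in_child(expr, 2)


# Containers of ints only combine the (unrandomized) int hashes.
int_stable = same_across_seeds("12345")
tuple_stable = same_across_seeds("(1, 2, 3)")
frozenset_stable = same_across_seeds("frozenset({1, 2, 3})")

# str hashes depend on the seed, and so does any tuple containing a str.
str_stable = same_across_seeds("'spam'")
str_tuple_stable = same_across_seeds("('spam', 1)")
```

This matches what the original questioner saw: a tuple or frozenset hash looks "randomized" only when it contains str or bytes elements.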

I was just asking the question - rather than critiquing the fault-fix. I am actually more concerned that the documentation relating to the fix doesn't make it clear that only strings have their hashes randomised.

BTW:

This question was prompted by a question on a social media platform about
whether hash values are transferable across platforms.
Everything I could find stated that after Python 3.3 ALL hash values were
randomized - but that clearly isn't the case; and the original questioner
identified that some hash values are randomized and others aren't.

That's actually immaterial. Even if the hashes weren't actually
randomized, you shouldn't be making assumptions about anything
specific in the hash, save that *within one Python process*, two equal
values will have equal hashes (and therefore two objects with unequal
hashes will not be equal).

Entirely agree - I was just trying to get to the bottom of the difference - especially considering that the documentation I could find implied that all hash algorithms had been randomized.

I did suggest strongly to the original questioner that relying on the same
hash value across different platforms wasn't a clever solution - their
original plan was to store hash values in a cross system database to enable
quick retrieval of data (!!!). I did remind the OP that a hash value wasn't
guaranteed to be unique anyway - and they might come across two different
values with the same hash - and no way to distinguish between them if all
they have is the hash. Hopefully their revised design will store the key,
not the hash.
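
A minimal sketch of that collision problem follows. It relies on a CPython implementation detail (an assumption of this example, not a language guarantee): `hash(-1)` is reported as `-2`, so `-1` and `-2` are different values with the same hash.

```python
a, b = -1, -2

# CPython detail: -1 is reserved internally as an error marker,
# so hash(-1) comes back as -2 and collides with hash(-2).
same_hash = hash(a) == hash(b)
different_values = a != b

# Storing only the hash: the second record silently replaces the first.
index_by_hash = {}
for value in (a, b):
    index_by_hash[hash(value)] = f"record for {value}"

# Storing the key itself: both records survive.
index_by_key = {value: f"record for {value}" for value in (a, b)}
```

With only the hash stored there is also no way to recover which original value a record belonged to.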

Uhh.... if you're using a database, let the database do the work of
being a database. I don't know what this "cross system database" would
be implemented in, but if it's a proper multi-user relational database
engine like PostgreSQL, it's already going to have way better indexing
than anything you'd do manually. I think there are WAY better
solutions than worrying about Python's inbuilt hashing.

Agreed

If you MUST hash your data for sharing and storage, the easiest
solution is to just use a cryptographic hash straight out of
hashlib.py.
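
For example, a digest that is stable across processes, platforms and Python versions could be produced along these lines. The function name `stable_digest`, the choice of SHA-256 and the UTF-8 encoding are assumptions made for this sketch; any hashlib algorithm would follow the same pattern.

```python
import hashlib


def stable_digest(key: str) -> str:
    """Return a hex digest that does not depend on PYTHONHASHSEED,
    the platform, or the interpreter build."""
    # The text must be encoded to bytes first; the encoding has to be
    # fixed, since different encodings give different digests.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


digest = stable_digest("spam")
```

Unlike the built-in `hash()`, the result is 256 bits wide, so accidental collisions are not a practical concern, though the key is still needed if the original value has to be recovered.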

As stated before - I think the original questioner was intent on micro-optimizations - and they had hit on the idea that storing an integer would be quicker than storing a string - entirely ignoring both the practicality of trying to code all strings into a value (since hashes aren't guaranteed not to collide), and the issues of trying to reverse that translation once the stored key had been retrieved.

ChrisA
_______________________________________________
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/anthony.flury%40btinternet.com


Thanks for your comments :-)

--
Anthony Flury
email : *[email protected]*
Twitter : *@TonyFlury <https://twitter.com/TonyFlury/>*
