A number of core devs have taken strong positions against this 
change, citing various reasons ranging from "the addition of a function that 
returns a constant will cause bloat in the interpreter / needs to be tested / 
etc" to "what you really mean to ask for is set iteration stability, and we 
don't want that" to "identity based hashing is the default correct choice of a 
hashing function to use in any situation, unless we are forced by the 
requirements not to (even if it's disadvantageous compared to other choices)" 
to just straight appeals to authority ("rhettinger closed the issue on github 
so he must have done it for a good reason").

I'm not sure if they actually believe what they say in all of these cases. To 
me, it sounds more like "please go away" than an honest argument on technical 
merit, but it matters little. I don't think anything can be changed with 
further technical discussion.

---

I do have another suggestion that I think merits a discussion. Maybe it will 
fare better. This change has a somewhat broader scope.

What if we were to subtract some statically allocated “anchor” address from the 
pointer in _Py_HashPointerRaw and the id function?

It’s arguably a security fix, since these operations currently leak the ASLR 
offset, and after this change they won’t. It also makes the hashes of statically 
allocated PyObjects with defaulted tp_hash stable per build of Python, which I 
think is a good thing for reasons we’ve already discussed at great length.

The downside of this suggestion is that it adds one integer subtraction to 
each of these functions.

If this tiny perf cost is a concern, we could even disable this countermeasure 
if Python can determine that it is guaranteed to be loaded at a fixed memory 
location.

At least two core devs responded with "don't care" / “it works on my machine” 
because they happen to have ASLR disabled. The current situation ties together 
two completely separate concerns (memory layout randomization and the 
determinism of hashes and ids), and adds a non-portable aspect to the 
behavior of the runtime - you can write a program that behaves 
deterministically on system A and then see non-deterministic behavior on system 
B. I don’t think I should have to explain why this is bad. 

Regarding language requirements, nothing changes.

It is an interpreter-specific change, since not all id and hash 
implementations depend on the object’s memory location (and since some runtime 
environments, like the JVM, cannot be attacked with out-of-bounds memory accesses 
from inside the program, so an ASLR offset leak might not be deemed a risk 
there). At most, it is an advisory that implementations which do depend on the 
memory location should act similarly, and even that is tenuous at best.

WDYT?

P.S. The other way to implement the security fix is to add a randomly chosen 
64-bit secret to the pointer (then you wouldn’t know what part of the “offset” 
is due to ASLR and what’s due to the secret). At least then the behavior becomes 
non-deterministic on all systems, as opposed to just some of them.
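
A rough Python model of the secret-based variant, again only a sketch: it assumes 64-bit pointers and the same rotate-by-4 model of the pointer hash, and the helper names, address and secrets are hypothetical. In a real implementation the secret would be chosen once at interpreter startup.

```python
import secrets

MASK = (1 << 64) - 1  # assumption: 64-bit pointers


def rotate_right_4(y):
    """Rotate a 64-bit value right by 4 bits."""
    return ((y >> 4) | (y << 60)) & MASK


def make_pointer_hasher(secret=None):
    """Return a hash function that mixes a per-process secret into the address."""
    if secret is None:
        secret = secrets.randbits(64)  # chosen once per process

    def hash_pointer(addr):
        # Addition modulo 2**64 is a bijection, so distinct addresses
        # still map to distinct hashes within one process.
        return rotate_right_4((addr + secret) & MASK)

    return hash_pointer


# Same (made-up) address, two processes with different fixed secrets.
addr = 0x55D0_0000_2A40
process_a = make_pointer_hasher(secret=0x0123_4567_89AB_CDEF)
process_b = make_pointer_hasher(secret=0x0FED_CBA9_8765_4321)

print(process_a(addr) == process_b(addr))  # False
```

An observer who sees only the hash cannot tell how much of the offset comes from ASLR and how much from the secret, and the result differs between processes whether or not ASLR is enabled.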
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JGG2LOTJEFXLLMNEMNHT7CHOUSNZ5KZX/