A number of core devs have taken strong positions against this change, citing reasons ranging from "the addition of a function that returns a constant will cause bloat in the interpreter / needs to be tested / etc." to "what you really mean to ask for is set iteration stability, and we don't want that" to "identity-based hashing is the default correct choice of hashing function in any situation, unless the requirements force us to use something else (even if it's disadvantageous compared to other choices)" to straight appeals to authority ("rhettinger closed the issue on GitHub, so he must have done it for a good reason").
I'm not sure they actually believe what they say in all of these cases. To me, it sounds more like "please go away" than an honest argument on technical merit, but it matters little. I don't think further technical discussion will change anything.

---

I do have another suggestion that I think merits discussion. Maybe it will fare better. This change has a somewhat broader scope.

What if we were to subtract some statically allocated “anchor” address from the pointer in _Py_HashPointerRaw and in the id function? It’s arguably a security fix, since these operations currently leak the ASLR offset, and after the change they won’t. It also makes the hashes of statically allocated PyObjects with a defaulted tp_hash stable per build of Python, which I think is a good thing for reasons we’ve already discussed at great length.

The downside of this suggestion is that it adds one integer subtraction to each of these functions. If this tiny perf cost is a concern, we could even disable the countermeasure when Python can determine that it is guaranteed to be loaded at a static memory location.

At least two core devs responded with "don't care" / “it works on my machine” because they happen to have ASLR disabled. The current situation ties together two completely separate concerns and adds a non-portable aspect to the behavior of the runtime: you can write a program that behaves deterministically on system A and then see non-deterministic behavior on system B. I don’t think I should have to explain why this is bad.

Regarding language requirements, nothing changes. It is an interpreter-specific change, since not all id and hash implementations depend on the object’s memory location (and since some runtime environments, like the JVM, cannot be attacked with out-of-bounds memory accesses from inside the program, so an ASLR offset leak might not be deemed a risk there). At most, it is an advisory that implementations which do depend on the memory location should act similarly, and even that is tenuous at best.

WDYT?

P.S.
The other way to implement the security fix is to add a randomly chosen 64-bit secret (then you wouldn’t know which part of the “offset” is due to ASLR and which is due to the secret). At least then the behavior becomes non-deterministic on all systems, as opposed to just some of them.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JGG2LOTJEFXLLMNEMNHT7CHOUSNZ5KZX/
Code of Conduct: http://python.org/psf/codeofconduct/
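
To make the anchor and secret ideas above a bit more concrete, here is a rough Python model of them. It is only a sketch, not the actual C code: it assumes 64-bit pointers and assumes the pointer hash is a 4-bit right rotation of the address. The load bases, the offsets and all function names are made up for illustration.

```python
import secrets

PTR_BITS = 64  # assumption: 64-bit pointers
MASK = (1 << PTR_BITS) - 1


def rotate_right_4(value: int) -> int:
    """Rotate a PTR_BITS-wide integer right by 4 bits (assumed pointer mixing)."""
    value &= MASK
    return ((value >> 4) | (value << (PTR_BITS - 4))) & MASK


def rotate_left_4(value: int) -> int:
    """Inverse of rotate_right_4; shows that the mixing hides nothing."""
    value &= MASK
    return ((value << 4) | (value >> (PTR_BITS - 4))) & MASK


def hash_raw(addr: int) -> int:
    """Modelled current behaviour: the hash depends on the absolute address."""
    return rotate_right_4(addr)


def hash_anchored(addr: int, anchor: int) -> int:
    """Modelled proposal: subtract a statically allocated anchor address first."""
    return rotate_right_4(addr - anchor)


def hash_with_secret(addr: int, secret: int) -> int:
    """Modelled P.S. variant: mix in a per-process random secret instead."""
    return rotate_right_4(addr + secret)


# Hypothetical layout: one static object and one anchor, both at fixed offsets
# inside the interpreter image, loaded at two different ASLR bases.
OBJECT_OFFSET = 0x5A3C40
ANCHOR_OFFSET = 0x1000
BASE_RUN_A = 0x55D0_0000_0000
BASE_RUN_B = 0x7F12_3400_0000

for base in (BASE_RUN_A, BASE_RUN_B):
    obj_addr = base + OBJECT_OFFSET
    anchor_addr = base + ANCHOR_OFFSET
    raw = hash_raw(obj_addr)
    # Anyone who knows OBJECT_OFFSET can recover the load base from the raw hash.
    leaked_base = rotate_left_4(raw) - OBJECT_OFFSET
    anchored = hash_anchored(obj_addr, anchor_addr)
    with_secret = hash_with_secret(obj_addr, secrets.randbits(64))
    print(hex(raw), hex(leaked_base), hex(anchored), hex(with_secret))
```

In this model the raw hash differs between the two runs and gives away the load base, while the anchored hash depends only on the distance between the object and the anchor, so it is the same in both runs. That only holds for statically allocated objects; heap addresses would still vary from run to run. The secret variant varies on every run regardless of whether ASLR is enabled.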