On Sun, Nov 27, 2022 at 11:36 AM Yoni Lavi <yoni.lav...@gmail.com> wrote:
> I wrote a doc stating my case here:
>
> https://docs.google.com/document/d/1et5x5HckTJhUQsz2lcC1avQrgDufXFnHMin7GlI5XPI/edit#
>
> Briefly,
>
> 1. The main motivation for it is to allow users to get a predictable
> result on a given input (for programs that are doing pure compute, in
> domains like operations research / compilation), any time they run their
> program. Having stable repro is important for debugging. Notebooks with
> statistical analysis are another similar case where this is needed: you
> might want other people to run your notebook and get the same result you
> did.
>

But the language does not guarantee that the hash of an object is stable, so I would argue that anyone expecting that should convert random-access data structures to ones with a consistent order when necessary (e.g. sorted lists).

> 2. The reason the hash non-determinism of None matters in practice is that
> it can infect commonly used mapping key types, such as frozen dataclasses
> containing `Optional[int]` fields.
>

I don't see why the hashing within a dict needs to be consistent, as that's not a guarantee we make with Python.

> 3. Non-determinism emerging from other value types like `str` can be
> disabled by the user using `PYTHONHASHSEED`, but there's no such protection
> against `None`.
>

If I remember correctly, PYTHONHASHSEED was added to help folks migrate when we added randomness to hashing, as they had accidentally come to expect a consistent iteration order on dictionary keys. I wouldn't take its existence to mean that PYTHONHASHSEED is meant to make **all** hashing consistent (e.g. people who implement their own __hash__ don't have to follow that expectation).

> All it takes is for your program to compute a set somewhere with affected
> keys, and iterate on it - and determinism is lost.
>

That's actually by design. Sets are not meant to be deterministic conceptually, as they are essentially a bag of stuff.
If you want deterministic ordering, you should convert it to a list and sort the list.

> The need to modify None itself is caused by two factors:
> - `Optional` being implemented effectively as `T | None` in Python as a
> strongly established practice
> - The fact that `__hash__` is an intrinsic property of a type in Python:
> the hashing function cannot be externally supplied to its builtin container
> types. So we have to modify the type None itself, rather than write some
> alternative hasher that we could use if we care about deterministic
> behavior across runs.
>
> This was debated at length over the forum and in Discord.
> I also posted a PR for it, and it was closed, see:
>
> https://github.com/python/cpython/issues/99540
> https://github.com/python/cpython/pull/99541
>
> Asking for opinions, and to re-open the PR, provided there is enough
> support for such a change to take place.
>

I personally agree with the arguments made in the issue, so I'm afraid I don't support making the change, as we worked hard to stop people from relying on consistent hashing/iteration from random-access data structures like dict and set.

-Brett

>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/KUH4HZYKPBO57A73QKCGU4GD2JNY3VMH/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
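A minimal sketch of the "convert it to a list and sort the list" advice, in Python. `Key`, `sort_key` and `deterministic_order` are hypothetical names chosen for illustration; they do not come from the thread or the linked PR. `Key` stands in for the kind of frozen dataclass with an `Optional[int]` field described in point 2. Its generated `__hash__` hashes the tuple of its fields, so it inherits whatever `hash(None)` returns. The sorted order, by contrast, depends only on the field values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


# Hypothetical key type: a frozen dataclass with an Optional[int] field.
# Its generated __hash__ is hash((self.name, self.weight)), so it depends
# on hash(None) whenever weight is None.
@dataclass(frozen=True)
class Key:
    name: str
    weight: Optional[int]


def sort_key(k: Key) -> tuple:
    # None cannot be compared with int, so map the optional field to a
    # (present, value) pair first; (False, 0) sorts before any (True, n),
    # which places None ahead of every integer weight.
    present = k.weight is not None
    return (k.name, present, k.weight if present else 0)


def deterministic_order(keys: Iterable[Key]) -> list[Key]:
    # Iterating a set directly follows hash order, which may differ between
    # runs; sorting on the field values does not consult hashes at all.
    return sorted(keys, key=sort_key)


keys = {Key("b", 2), Key("a", None), Key("a", 1), Key("b", None)}
for k in deterministic_order(keys):
    print(k)
# prints Key(name='a', weight=None), Key(name='a', weight=1),
# Key(name='b', weight=None), Key(name='b', weight=2), one per line
```

Putting `None` first is an arbitrary choice made for this sketch. Any total order over the field values gives the same repeatable result.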
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/25XFRWUOUREKKY6GUIOQIIRFBNI34MNZ/
Code of Conduct: http://python.org/psf/codeofconduct/