On Sun, Nov 27, 2022 at 11:36 AM Yoni Lavi <yoni.lav...@gmail.com> wrote:

> I wrote a doc stating my case here:
>
> https://docs.google.com/document/d/1et5x5HckTJhUQsz2lcC1avQrgDufXFnHMin7GlI5XPI/edit#
>
> Briefly,
>
> 1. The main motivation for it is to allow users to get a predictable
> result on a given input (for programs that are doing pure compute, in
> domains like operations research / compilation), any time they run their
> program. Having stable repro is important for debugging. Notebooks with
> statistical analysis are another similar case where this is needed: you
> might want other people to run your notebook and get the same result you
> did.
>

But the language does not guarantee that the hash of an object is stable,
so I would argue that anyone who needs stable results is expected to
convert random-access data structures to ones with a consistent order
when necessary (e.g. sorted lists).


>
> 2. The reason the hash non-determinism of None matters in practice is that
> it can infect commonly used mapping key types, such as frozen dataclasses
> containing `Optional[int]` fields.
>

I don't see why the hashing within a dict needs to be consistent across
runs, as that's not a guarantee Python makes.
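
To make the quoted point concrete, here is a minimal sketch of how the
hash of `None` flows into the hash of a frozen dataclass. `Key` is a
hypothetical type invented for illustration, and the tuple-based hash is
how current CPython generates `__hash__` for dataclasses, not something
the language promises. Lookups within a single run work either way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Key:  # hypothetical key type, for illustration only
    a: int
    b: Optional[int] = None


k = Key(1)

# CPython's generated __hash__ hashes a tuple of the fields, so
# hash(None) feeds directly into hash(k).
field_hash = hash((k.a, k.b))

# Within one process the hash is self-consistent, so an equal key
# always finds its entry no matter what value hash(None) has.
table = {Key(1): "found"}
looked_up = table[Key(1, None)]
```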


>
> 3. Non-determinism emerging from other value types like `str` can be
> disabled by the user using `PYTHONHASHSEED`, but there's no such protection
> against `None`.
>

If I remember correctly, PYTHONHASHSEED was added to help folks migrate
when we added randomness to hashing as they had accidentally come to expect
a consistent iteration order on dictionary keys. I wouldn't take its
existence to suggest that PYTHONHASHSEED is meant to make **all** hashing
consistent (e.g. people who implement their own __hash__ don't have to
follow that expectation).
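
As a rough sketch of what PYTHONHASHSEED does cover: fixing the seed
makes `str` hashes repeat across interpreter runs. The helper name
`str_hash_in_child` is made up for this example. A class with its own
`__hash__` (say, one derived from `id()`) would not be pinned down by
the seed.

```python
import os
import subprocess
import sys


def str_hash_in_child(seed: str) -> int:
    """Return hash('spam') as computed by a fresh interpreter."""
    # Copy the current environment and pin the hash seed for the child.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('spam'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two separate runs with the same seed agree on the str hash.
first = str_hash_in_child("0")
second = str_hash_in_child("0")
```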


>
> All it takes is for your program to compute a set somewhere with affected
> keys, and iterate on it - and determinism is lost.
>

That's actually by design. Sets are not meant to be deterministic
conceptually, as they are essentially a bag of stuff. If you want
deterministic ordering, you should convert the set to a list and sort
that list.
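
A minimal sketch of that conversion, assuming the set mixes integers
with `None` (which cannot be compared to an int directly). The
`none_last` key function is a name invented for this example.

```python
items = {3, None, 1, 2}


def none_last(value):
    # Sort key: real values first in ascending order, None at the end.
    # The placeholder 0 is never compared against a real value because
    # the first tuple element already separates None from the rest.
    return (value is None, 0 if value is None else value)


# sorted() returns a new list whose order does not depend on hashing.
ordered = sorted(items, key=none_last)
```

Iterating over `ordered` instead of `items` gives the same sequence on
every run.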


>
> The need to modify None itself is caused by two factors
> - `Optional` being implemented effectively as `T | None` in Python as a
> strongly established practice
> - The fact that `__hash__` is an intrinsic property of a type in Python,
> the hashing function cannot be externally supplied to its builtin container
> types. So we have to modify the type None itself, rather than write some
> alternative hasher that we could use if we care about deterministic
> behavior across runs.
>
> This was debated at length over the forum and in discord.
> I also posted a PR for it, and it was closed, see:
>
> https://github.com/python/cpython/issues/99540
> https://github.com/python/cpython/pull/99541
>
> Asking for opinions, and to re-open the PR, provided there is enough
> support for such a change to take place.
>

I personally agree with the arguments made in the issue, so I'm afraid I
don't support making the change, as we worked hard to stop people from
relying on consistent hashing/iteration from random-access data structures
like dict and set.

-Brett


>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/KUH4HZYKPBO57A73QKCGU4GD2JNY3VMH/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/25XFRWUOUREKKY6GUIOQIIRFBNI34MNZ/