> On 17 Dec 2021, at 15:28, Chris Angelico <ros...@gmail.com> wrote:
>
> On Sat, Dec 18, 2021 at 1:21 AM Hao Hu <hao.hu...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am wondering if it would be good to add an additional keyword `seed` to
>> the builtin function *hash* to allow us to set arbitrary seed to ensure
>> reproducible results.
>>
>
> The built-in hash() function is extremely generic, so it can't really
> work that way. Adding a parameter to it would require (a) adding the
> parameter to every __hash__ method of every object, including
> user-defined objects; and (b) defining what that would mean when
> multiple objects' hashes are combined (eg hashing a tuple).
>
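To make point (b) concrete, here is a rough sketch of what threading a seed through a container looks like. `seeded_hash` is a hypothetical helper, not a proposed API: it handles only str, bytes, int and tuple, and it uses keyed BLAKE2b from hashlib as a stand-in for CPython's internal SipHash, so its values have nothing to do with hash(). The point is only that a tuple's seeded hash is well defined when every element is hashed with the same seed.

```python
import hashlib


def seeded_hash(obj, seed: int) -> int:
    """Hypothetical seeded hash for str, bytes, int and tuple only.

    Not CPython's algorithm: leaves are hashed with BLAKE2b keyed by the
    seed, and tuples combine their elements' seeded hashes, so the seed
    has to be passed down through every level of nesting.
    """
    # The seed becomes an 8-byte BLAKE2b key (must fit in a signed 64-bit int).
    key = seed.to_bytes(8, "little", signed=True)
    if isinstance(obj, tuple):
        # Combine element hashes in order; each element gets the same seed.
        h = hashlib.blake2b(b"tuple", digest_size=8, key=key)
        for item in obj:
            h.update(seeded_hash(item, seed).to_bytes(8, "little", signed=True))
        return int.from_bytes(h.digest(), "little", signed=True)
    # Type tags keep e.g. "a" and b"a" from hashing to the same value.
    if isinstance(obj, str):
        data = b"s" + obj.encode("utf-8")
    elif isinstance(obj, bytes):
        data = b"b" + obj
    elif isinstance(obj, int):
        data = b"i" + str(obj).encode("ascii")
    else:
        raise TypeError(f"unsupported type: {type(obj).__name__}")
    h = hashlib.blake2b(data, digest_size=8, key=key)
    return int.from_bytes(h.digest(), "little", signed=True)
```

Any user-defined `__hash__` would need the same extra parameter to take part in this, which is point (a).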
I would not say the opposite; however, maybe it appears more complicated than it really is. Perhaps it is worth a small analysis?

>>
>> As far as I know, there exists already the environment variable
>> PYTHONHASHSEED that allows us to set arbitrary seed or disable the seed
>> globally for the python interpreter.
>> However, it looks like it would be too bold to use that environment
>> variable to change the default behavior because the random seed generation
>> helps improve the security by reducing the risk of hash flooding.
>>
>> In parallel, we have identified a couple of real use cases that require that
>> an arbitrary seed is used for a limited scope.
>> For instance, if we create a caching programming interface that relies on a
>> distributed kv store, it would be very important to make sure that the hash
>> key stays the same when the application is rebooted or replicated. It is
>> generally more cautious to use the above capability to limit the scope to
>> the caching library itself instead of applying the same for all the hash
>> functions of all the python interpreters.
>>
>
> For that sort of thing, it may be more practical to use your own
> hashing function, possibly a cryptographically secure one. The precise
> hashing function used by Python isn't guaranteed, so if you need it to
> be stable across different runs, and especially if you need to seed it
> in a specific way, I'd recommend hashlib:
>
> https://docs.python.org/3/library/hashlib.html

I’ve explored that option; however, the siphash24 or FNV under the hood of *hash* seems better suited to this type of use case in terms of *performance*. Otherwise, would it be useful to add siphash24 or FNV to hashlib as well? There are obviously also other third-party libraries such as *mmh*, but they would introduce unnecessary, immature dependencies.

WDYT?

Thank you.
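P.S. For reference, this is roughly the hashlib route Chris suggests, applied to the distributed-cache case: a minimal sketch assuming string keys, where keyed BLAKE2b (`hashlib.blake2b` with its `key` argument, up to 64 bytes) plays the role of the seed. `cache_key` and `CACHE_SEED` are hypothetical names, not an existing API.

```python
import hashlib

# Hypothetical seed shared by every replica; in practice it would come from config.
CACHE_SEED = b"my-cache-v1"


def cache_key(key: str, seed: bytes = CACHE_SEED, digest_size: int = 8) -> int:
    """Stable, seeded hash of a cache key (sketch, not a library API).

    The result depends only on (seed, key), not on PYTHONHASHSEED, so it is
    the same after a restart and on every replica.
    """
    # BLAKE2b's `key` argument turns it into a keyed hash; the seed goes there.
    h = hashlib.blake2b(key.encode("utf-8"), digest_size=digest_size, key=seed)
    # Interpret the digest as an unsigned integer in [0, 2**(8 * digest_size)).
    return int.from_bytes(h.digest(), "big")


if __name__ == "__main__":
    print(cache_key("user:42"))
    print(cache_key("user:42", seed=b"another-seed"))
```

Whether this is fast enough compared with the SipHash behind *hash* is exactly the open question above and would need benchmarking (e.g. with timeit).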
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/CU4TI3AEMI7Y6USIIHDWWSQW7WGGPNJ7/
> Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/HNMTY3SSKLAYNCLC5CN5SPUOMZP7IYGJ/