> On 17 Dec 2021, at 15:28, Chris Angelico <ros...@gmail.com> wrote:
> 
> On Sat, Dec 18, 2021 at 1:21 AM Hao Hu <hao.hu...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I am wondering if it would be good to add an additional keyword `seed` to 
>> the builtin function *hash* to allow us to set an arbitrary seed and ensure 
>> reproducible results.
>> 
> 
> The built-in hash() function is extremely generic, so it can't really
> work that way. Adding a parameter to it would require (a) adding the
> parameter to every __hash__ method of every object, including
> user-defined objects; and (b) defining what that would mean when
> multiple objects' hashes are combined (eg hashing a tuple).
> 

I don't disagree; however, it may look more complicated than it really is. 
Perhaps it is worth a small analysis?
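A small illustration of point (b), using a hypothetical `Point` class: a typical user-defined `__hash__` just hashes a tuple of its fields. A `hash(obj, seed=...)` keyword (hypothetical, not an existing parameter) would therefore have to be accepted and forwarded by this method, by `tuple.__hash__`, and by the `__hash__` of every field type before it had any effect.

```python
class Point:
    """Hypothetical user-defined hashable; equality and hash both derive from the fields."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Delegates to the tuple hash, which combines hash(self.x) and hash(self.y).
        # A seed keyword would need to be threaded through this method, through
        # tuple.__hash__, and through each field's __hash__ to change the result.
        return hash((self.x, self.y))


p = Point(1, 2)
print(hash(p) == hash((1, 2)))  # True: the object's hash is the tuple's hash
```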

>> 
>> As far as I know, the environment variable PYTHONHASHSEED already allows 
>> us to set an arbitrary seed, or to disable hash randomization, globally 
>> for the Python interpreter.
>> However, it seems too bold to use that environment variable to change the 
>> default behavior, because the random seed helps improve security by 
>> reducing the risk of hash flooding.
>> 
>> At the same time, we have identified a couple of real use cases that 
>> require an arbitrary seed within a limited scope.
>> For instance, if we build a caching interface on top of a distributed 
>> key-value store, it is essential that the hash key stays the same when the 
>> application is restarted or replicated. It is generally safer to limit the 
>> scope of such a seed to the caching library itself rather than applying it 
>> to every hash computed by the Python interpreter.
>> 
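A minimal sketch of why PYTHONHASHSEED is process-wide, assuming a standard CPython executable: the variable is read when the interpreter starts, so the only way to "scope" it is to launch a separate process. The helper name `str_hash_in_fresh_interpreter` is hypothetical. Hash randomization applies to `str` and `bytes` values, so the example hashes a string.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text: str, hash_seed: str) -> int:
    """Return hash(text) as computed by a new Python process started with PYTHONHASHSEED=hash_seed."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)  # affects only the child process
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two interpreters started with the same seed agree on the string's hash.
a = str_hash_in_fresh_interpreter("cache-key", "42")
b = str_hash_in_fresh_interpreter("cache-key", "42")
print(a == b)  # True
```

Without a fixed PYTHONHASHSEED (or with `PYTHONHASHSEED=random`), the same call will normally return different values from run to run. That is the randomization the original message wants to keep for everything outside the caching library.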
> 
> For that sort of thing, it may be more practical to use your own
> hashing function, possibly a cryptographically secure one. The precise
> hashing function used by Python isn't guaranteed, so if you need it to
> be stable across different runs, and especially if you need to seed it
> in a specific way, I'd recommend hashlib:
> 
> https://docs.python.org/3/library/hashlib.html
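A minimal sketch of the hashlib route for the cache-key use case. It assumes that keyed BLAKE2b (`hashlib.blake2b` with its `key` parameter) is an acceptable stand-in for a "seeded" hash. The function name `stable_cache_key` and the 8-byte digest size are illustrative choices, not part of any existing API.

```python
import hashlib


def stable_cache_key(data: bytes, seed: bytes, digest_size: int = 8) -> int:
    """Seeded key that is identical across runs, restarts and replicas.

    `seed` is used as the BLAKE2b key (at most 64 bytes); `digest_size=8`
    yields a 64-bit integer, similar in width to hash() on 64-bit builds.
    """
    digest = hashlib.blake2b(data, digest_size=digest_size, key=seed).digest()
    return int.from_bytes(digest, "big")


k1 = stable_cache_key(b"user:42", seed=b"my-app-seed")
k2 = stable_cache_key(b"user:42", seed=b"my-app-seed")
print(k1 == k2)  # True, in this process or any other
```

Unlike `hash()`, the result does not depend on PYTHONHASHSEED or on the interpreter version. Whether BLAKE2b is fast enough compared with the SipHash used internally is workload-dependent and is not measured here.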

I’ve explored that option; however, the siphash24 or FNV implementation 
underlying *hash* seems better suited to this kind of use case in terms of 
*performance*.
Alternatively, would it be useful to add siphash24 or FNV to hashlib as well?
There are of course third-party libraries such as *mmh*, but they would 
introduce unnecessary and immature dependencies.
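To make the FNV option concrete, here is a sketch of 64-bit FNV-1a in pure Python. The seeding scheme (XORing `seed` into the offset basis) is a hypothetical illustration, not part of the FNV specification and not how CPython seeds its hash. With `seed=0` the function is plain FNV-1a.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes, seed: int = 0) -> int:
    """64-bit FNV-1a; `seed` is XORed into the offset basis (hypothetical seeding scheme)."""
    h = (FNV64_OFFSET_BASIS ^ seed) & MASK64
    for byte in data:
        h ^= byte                        # FNV-1a: XOR the byte first...
        h = (h * FNV64_PRIME) & MASK64   # ...then multiply, keeping 64 bits
    return h


print(hex(fnv1a_64(b"a")))  # 0xaf63dc4c8601ec8c (standard FNV-1a test vector)
```

Each step (XOR with a byte, multiplication by an odd prime modulo 2**64) is a bijection, so different seeds always give different outputs for the same input. FNV is not designed to resist hash flooding, unlike SipHash, and a pure-Python loop like this is far slower than C code. It only illustrates the algorithm.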

WDYT? 
Thank you.

> 
> ChrisA
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/CU4TI3AEMI7Y6USIIHDWWSQW7WGGPNJ7/
> Code of Conduct: http://python.org/psf/codeofconduct/

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HNMTY3SSKLAYNCLC5CN5SPUOMZP7IYGJ/
