On 12/18/21 08:44, Stephen J. Turnbull wrote:
Hao Hu writes:
> > On 17 Dec 2021, at 15:28, Chris Angelico <ros...@gmail.com> wrote:

> > The built-in hash() function is extremely generic, so it can't really
> > work that way. Adding a parameter to it would require (a) adding the
> > parameter to every __hash__ method of every object, including
> > user-defined objects;
>
> I would not say the opposite, however maybe it appears to be more
> complicated than it really is. Probably it is worth a small
> analysis?

It's the user-defined objects that are the killer here. We don't want to go wrecking dozens of projects' objects.

> >> For instance, if we create a caching programming interface that
> >> relies on a distributed kv store,

I would be very suspicious of using Python's hash builtin for such a purpose. The Python hash functions are very carefully tuned for high performance in one application only: equality testing in Python, especially for dicts. Many __hash__ methods omit much of the object being hashed; if the variation in your keys depends only on those attributes, you'll get a lot of collisions.

Others are extremely predictable. E.g., most integers and other numbers equal to integers hash to themselves mod 2**61 - 1; I believe -1 is the only exception. Being predictable as such may not be a problem for your kv store cache, but predictable == pattern, and if your application happens to match that pattern, you could again end up with a massive collision problem. I imagine this is much less likely to be a problem than the case where keys depend on omitted attributes, since presumably the __hash__ method is designed to cover the whole range. And numbers are the only case I know of offhand.
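The point about predictable numeric hashes can be checked directly. The following is a minimal sketch, assuming CPython; `demo_numeric_hashes` is a hypothetical helper name, and the modulus is read from `sys.hash_info` rather than hard-coded, since 2**61 - 1 applies to typical 64-bit builds only.

```python
import sys
from fractions import Fraction

# Modulus CPython uses for numeric hashes (2**61 - 1 on typical 64-bit builds).
M = sys.hash_info.modulus


def demo_numeric_hashes():
    """Return a few checks on CPython's numeric hashing as a dict of booleans."""
    return {
        # Small non-negative integers hash to themselves.
        "small_int_is_own_hash": hash(12345) == 12345,
        # Numbers that compare equal must hash equal, whatever their type.
        "equal_numbers_hash_equal": hash(1) == hash(1.0) == hash(Fraction(1, 1)),
        # -1 is reserved as an error code in the C API, so hash(-1) is -2.
        "minus_one_is_special": hash(-1) == -2,
        # Integers are reduced modulo M, so M itself hashes to 0 ...
        "wraps_at_modulus": hash(M) == 0,
        # ... and keys that differ by M collide.
        "collision_across_modulus": hash(7) == hash(7 + M),
    }


if __name__ == "__main__":
    for name, holds in demo_numeric_hashes().items():
        print(f"{name}: {holds}")
```

Keys that differ by a multiple of the modulus are unlikely in most applications, but the sketch shows the kind of pattern that would produce systematic collisions if a key space happened to match it.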
It is pretty much the same use case as Python's dictionary, though; the goal is just to generalize it for use with a distributed kv store. Another big advantage is that it is more user-friendly to apply *hash* directly to a type.
> > I'd recommend hashlib:

+1

> Otherwise, would that be useful to add siphash24 or fnv in the
> hashlib as well?

I think that is a good idea. To me, it seems relatively likely to be accepted quickly. However, many cryptographic algorithms are delicate (e.g., to avoid timing attacks), so I could be wrong about that. Folks like Christian Heimes might be very concerned about the implementation as well as the algorithm.

Note that Python/pyhash.c seems to have implementations of both of these algorithms, although I don't know if these implementations satisfy cryptographic needs.
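For the kv store use case, the hashlib recommendation could look roughly like the sketch below. It assumes BLAKE2b truncated to 8 bytes is acceptable for key placement; `stable_key` and `shard_for` are hypothetical helper names, not part of hashlib or of any kv store client.

```python
import hashlib


def stable_key(data: bytes, digest_size: int = 8) -> int:
    """Map bytes to an integer using BLAKE2b instead of the builtin hash().

    The result depends only on the input bytes and digest_size, not on the
    interpreter's internal hash implementation.
    """
    digest = hashlib.blake2b(data, digest_size=digest_size).digest()
    return int.from_bytes(digest, "big")


def shard_for(key: str, n_shards: int) -> int:
    """Pick a shard index for a string key (hypothetical kv-store helper)."""
    return stable_key(key.encode("utf-8")) % n_shards


if __name__ == "__main__":
    print(shard_for("user:42", 16))
```

The cost is that the caller has to serialize each object to bytes explicitly, which is less convenient than calling hash() on the object but makes it clear which attributes contribute to the key.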
According to the doc, there seem to be two categories of hash functions: one for cryptographic purposes, the other for message authentication codes. The algorithms mentioned above would mostly fall into the second category.
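To illustrate the distinction between the two categories with what the standard library already offers, here is a minimal sketch using SHA-256 as the plain hash and HMAC-SHA-256 as the keyed one. These are stand-ins chosen because they exist in the stdlib today; siphash24 and fnv are not exposed through hashlib, and the function names below are hypothetical.

```python
import hashlib
import hmac


def plain_digest(message: bytes) -> str:
    """Unkeyed cryptographic hash: anyone can recompute it from the message."""
    return hashlib.sha256(message).hexdigest()


def keyed_digest(key: bytes, message: bytes) -> str:
    """Message authentication code: the result also depends on a secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag using a constant-time comparison to limit timing leaks."""
    return hmac.compare_digest(keyed_digest(key, message), tag)


if __name__ == "__main__":
    tag = keyed_digest(b"secret", b"payload")
    print(verify(b"secret", b"payload", tag))
    print(verify(b"other-key", b"payload", tag))
```

The constant-time comparison in `verify` is one example of the implementation details that make such code delicate.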
Steve
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FBYX6MPTZGQUPQICYGYOPMLGAELUVF2H/
Code of Conduct: http://python.org/psf/codeofconduct/