On 12/18/21 08:44, Stephen J. Turnbull wrote:

Hao Hu writes:
  > > On 17 Dec 2021, at 15:28, Chris Angelico <ros...@gmail.com> wrote:

  > > The built-in hash() function is extremely generic, so it can't really
  > > work that way. Adding a parameter to it would require (a) adding the
  > > parameter to every __hash__ method of every object, including
  > > user-defined objects;
  >
  > I would not say the opposite, however maybe it appears to be more
  > complicated than it really is. Probably it is worth a small
  > analysis?

It's the user-defined objects that are the killer here.  We don't want
to go wrecking dozens of projects' objects.

  > >> For instance, if we create a caching programming interface that
  > >> relies on a distributed kv store,

I would be very suspicious of using Python's hash builtin for such a
purpose.  The Python hash functions are very carefully tuned for high
performance in one application only: equality testing in Python,
especially for dicts.  Many __hash__ methods omit much of the object
being hashed; if the variation in your keys depends only on those
attributes, you'll get a lot of collisions.  Others are extremely
predictable.  E.g., most integers and other numbers equal to integers
hash to themselves mod 2**61 - 1; I believe -1 is the only exception.
Being predictable as such may not be a problem for your kv store
cache, but predictable == pattern, and if your application happens to
match that pattern, you could again end up with a massive collision
problem.  I imagine this is much less likely to be a problem than the
case where keys depend on omitted attributes, since presumably the
__hash__ method is designed to cover the whole range.  And numbers are
the only case I know of offhand.
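To make the numeric case concrete, here is a small sketch, assuming
CPython (other implementations may hash numbers differently).  The
helper name numeric_hash_facts is made up for illustration.  It reads
the modulus from sys.hash_info rather than hard-coding 2**61 - 1, since
32-bit builds use a smaller prime.

```python
import sys

# Prime modulus used for numeric hashes; 2**61 - 1 on typical 64-bit
# CPython builds (assumption: CPython, other implementations may differ).
M = sys.hash_info.modulus


def numeric_hash_facts():
    """Return a few observations about how CPython hashes numbers."""
    return {
        # Small non-negative integers hash to themselves.
        "small_int_is_itself": hash(12345) == 12345,
        # Numbers that compare equal must hash equal.
        "float_matches_int": hash(12345.0) == hash(12345),
        # Integers are reduced mod M, so M itself wraps around.
        "modulus_wraps_to_zero": hash(M) == 0,
        # -1 is reserved as an error code at the C level.
        "minus_one": hash(-1),
        # Keys spaced exactly M apart all land on the same hash value.
        "multiples_of_modulus": {hash(k * M) for k in range(1, 6)},
    }


if __name__ == "__main__":
    for name, value in numeric_hash_facts().items():
        print(name, value)
```

On CPython, hash(-1) comes out as -2, and every multiple of the modulus
hashes to 0, which is the kind of pattern-driven collision described
above.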

It is pretty much the same use case as Python's dictionary, though: the goal is just to generalize it for use with a distributed kv store. Another big advantage is that it is more user-friendly to apply *hash* directly to a type.


  > > I'd recommend hashlib:

+1
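As a rough illustration of the hashlib route for a kv-store key, here
is a minimal sketch.  The helper cache_key and the choice of JSON plus
SHA-256 are assumptions made for the example, not something proposed in
this thread, and it only handles JSON-serializable values.  The point is
that the digest depends only on the serialized bytes, whereas the
built-in hash() of str and bytes is randomized per process by default
(see PYTHONHASHSEED), so it cannot be shared between machines.

```python
import hashlib
import json


def cache_key(obj) -> str:
    """Hypothetical helper: derive a process-independent key for obj.

    Assumes obj is JSON-serializable.  sort_keys makes the byte
    representation independent of dict insertion order.
    """
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(cache_key({"user": 42, "query": "spam"}))
```

Two processes that call cache_key on equal inputs get the same hex
string, which is what a distributed cache needs.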

  > Otherwise, would that be useful to add siphash24 or fnv in the
  > hashlib as well?

I think that is a good idea.  To me, it seems relatively likely to be
accepted quickly.  However, many cryptographic algorithms are delicate
(eg, to avoid timing attacks), so I could be wrong about that.  Folks
like Christian Heimes might be very concerned about the implementation
as well as the algorithm.

Note that Python/pyhash.c seems to have implementations of both of
these algorithms, although I don't know if these implementations
satisfy cryptographic needs.
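For reference, the algorithm that the running interpreter uses for str
and bytes hashing can be inspected from Python itself.  This is only a
sketch that reads sys.hash_info; the function name describe_str_hash is
made up here, and the values reported depend on the build.

```python
import sys


def describe_str_hash():
    """Report which algorithm backs hash() for str/bytes in this build."""
    info = sys.hash_info
    return {
        "algorithm": info.algorithm,  # name of the str/bytes hash algorithm
        "hash_bits": info.hash_bits,  # internal output size of that algorithm
        "seed_bits": info.seed_bits,  # size of its seed key
    }


if __name__ == "__main__":
    print(describe_str_hash())
```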

According to the docs, there seem to be two categories of hash function: one for cryptographic purposes, and another for message authentication codes.

The algorithms mentioned above would mostly fall into the second category.
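For comparison, hashlib already offers keyed hashing through BLAKE2, which can play a similar role to a short keyed hash. The sketch below is only an illustration of that existing API: the helper name keyed_tag and the 8-byte tag size are assumptions chosen for the example, and it is not a substitute for siphash24 or fnv.

```python
import hashlib
import hmac


def keyed_tag(key: bytes, message: bytes) -> bytes:
    """Hypothetical helper: 8-byte keyed BLAKE2b tag for message.

    The key must be at most 64 bytes for blake2b.
    """
    return hashlib.blake2b(message, key=key, digest_size=8).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(keyed_tag(key, message), tag)


if __name__ == "__main__":
    tag = keyed_tag(b"secret key", b"cache entry")
    print(tag.hex(), verify_tag(b"secret key", b"cache entry", tag))
```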

Steve


_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FBYX6MPTZGQUPQICYGYOPMLGAELUVF2H/
Code of Conduct: http://python.org/psf/codeofconduct/