On Mon, Jul 27, 2020 at 5:42 PM Ethan Furman <et...@stoneleaf.us> wrote:
> Chris Barker wrote:

> >  Is this because it's possible, if very
> > unlikely, for ANY hash algorithm to create the same hash for two
> > different inputs? So equality always has to be checked anyway?
>
> snip

For example, if a hash algorithm decided to use short names, then a
> group of people might be sorted like this:
>
> Bob: Bob, Robert
> Chris: Christopher, Christine, Christian, Christina
> Ed: Edmund, Edward, Edwin, Edwina
>
> So if somebody draws a name from a hat:
>
>    Christina
>
> You apply the hash to it:
>
>    Chris
>
> Ignore the Bob and Ed buckets, then use equality checks on the Chris
> names to find the right one.
>

sure, but know (or assume anyway) that python dicts and sets don't use such
a simple, naive hash algorithm, so in fact, non-equal strings are very
unlikely to have the same hash:

In [42]: hash("Christina")

Out[42]: -8424898463413304204

In [43]: hash("Christopher")

Out[43]: 4404166401429815751

In [44]: hash("Christian")

Out[44]: 1032502133450913307

But a dict always has a LOT fewer buckets than possible hash values, so
clashes within a bucket are not so rare, so equality needs to be checked
always -- which is what I was missing.

And while it wouldn't break anything, having a bunch of non-equal objects
produce the same hash wouldn't break anything, it would break the O(1)
performance of dicts.

Have I got that right?

-CHB







> >> From a practical standpoint, think of dictionaries:
> >
> > (that's the trick here -- you can't "get" this without knowing something
> > about the implementation details of dicts.)
>
> Depends on the person -- I always do better with a concrete application.
>
> >> adding
> >> ------
> >> - objects are sorted into buckets based on their hash
> >> - any one bucket can have several items with equal hashes
> >
> > is this mostly because there are many more possible hashes than buckets?
>
> Yes.
>
> >> - those several items (obviously) will not compare equal
> >
> > So the hash is a fast way to put stuff in buckets, so you only need to
> > compare with the others that end up in the same bucket?
>
> Yes.
>
> >> retrieving
> >> ----------
> >> - get the hash of the object
> >> - find the bucket that would hold that hash
> >> - find the already stored objects with the same hash
> >> - use __eq__ on each one to find the match
> >
> > So here's my question: if there is only one object in that bucket, is
> > __eq__ checked anyway?
>
> Yes -- just because it has the same hash does not mean it's equal.
>
> > So what happens when there is no __eq__?The object can still be hashable
> > -- I guess that's because there IS an __eq__ -- it defaults to an id
> > check, yes?
>
> Yes.
>
> The default hash, I believe, also defaults to the object id -- so, by
> default, objects are hashable and compare equal only to themselves.
>
> --
> ~Ethan~
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/XPUXOSK7WHXV7LRB7H3I4S42JQ2WXQU3/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YJNU4QPGWJUFYIL33PSI2NPYAC2BRLI5/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to