On Mon, Jul 27, 2020 at 5:42 PM Ethan Furman <et...@stoneleaf.us> wrote: > Chris Barker wrote:
> > Is this because it's possible, if very > > unlikely, for ANY hash algorithm to create the same hash for two > > different inputs? So equality always has to be checked anyway? > > snip For example, if a hash algorithm decided to use short names, then a > group of people might be sorted like this: > > Bob: Bob, Robert > Chris: Christopher, Christine, Christian, Christina > Ed: Edmund, Edward, Edwin, Edwina > > So if somebody draws a name from a hat: > > Christina > > You apply the hash to it: > > Chris > > Ignore the Bob and Ed buckets, then use equality checks on the Chris > names to find the right one. > sure, but know (or assume anyway) that python dicts and sets don't use such a simple, naive hash algorithm, so in fact, non-equal strings are very unlikely to have the same hash: In [42]: hash("Christina") Out[42]: -8424898463413304204 In [43]: hash("Christopher") Out[43]: 4404166401429815751 In [44]: hash("Christian") Out[44]: 1032502133450913307 But a dict always has a LOT fewer buckets than possible hash values, so clashes within a bucket are not so rare, so equality needs to be checked always -- which is what I was missing. And while it wouldn't break anything, having a bunch of non-equal objects produce the same hash wouldn't break anything, it would break the O(1) performance of dicts. Have I got that right? -CHB > >> From a practical standpoint, think of dictionaries: > > > > (that's the trick here -- you can't "get" this without knowing something > > about the implementation details of dicts.) > > Depends on the person -- I always do better with a concrete application. > > >> adding > >> ------ > >> - objects are sorted into buckets based on their hash > >> - any one bucket can have several items with equal hashes > > > > is this mostly because there are many more possible hashes than buckets? > > Yes. > > >> - those several items (obviously) will not compare equal > > > > So the hash is a fast way to put stuff in buckets, so you only need to > > compare with the others that end up in the same bucket? > > Yes. > > >> retrieving > >> ---------- > >> - get the hash of the object > >> - find the bucket that would hold that hash > >> - find the already stored objects with the same hash > >> - use __eq__ on each one to find the match > > > > So here's my question: if there is only one object in that bucket, is > > __eq__ checked anyway? > > Yes -- just because it has the same hash does not mean it's equal. > > > So what happens when there is no __eq__?The object can still be hashable > > -- I guess that's because there IS an __eq__ -- it defaults to an id > > check, yes? > > Yes. > > The default hash, I believe, also defaults to the object id -- so, by > default, objects are hashable and compare equal only to themselves. > > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list -- python-ideas@python.org > To unsubscribe send an email to python-ideas-le...@python.org > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > Message archived at > https://mail.python.org/archives/list/python-ideas@python.org/message/XPUXOSK7WHXV7LRB7H3I4S42JQ2WXQU3/ > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YJNU4QPGWJUFYIL33PSI2NPYAC2BRLI5/ Code of Conduct: http://python.org/psf/codeofconduct/