[sqlalchemy] Re: Python 2.6 hash behavior change

Michael Bayer Tue, 04 Mar 2008 08:30:41 -0800


On Mar 4, 2008, at 4:26 AM, Denis S. Otkidach wrote:

>
> On Mon, Mar 3, 2008 at 8:23 PM, Michael Bayer <[EMAIL PROTECTED]> wrote:
>> We define __eq__() all over the place so that would be a lot of
>> __hash__() methods to add, all of which return id(self).  I wonder if
>> we shouldn't just make a util.Mixin called "Hashable" so that we can
>> centralize the idea.
>
> Are you sure this is a correct way? Below is an example demonstrating
> the problem with it:
>
>>>> class C(object):
> ...     def __init__(self, value):
> ...         self._value = value
> ...     def __eq__(self, other):
> ...         return self._value==other._value
> ...     def __hash__(self):
> ...         return id(self)
> ...
>>>> c1 = C(1)
>>>> c2 = C(1)
>>>> c1==c2
> True
>>>> d = {c1: None}
>>>> c1 in d
> True
>>>> c2 in d
> False
>
> I.e. although c2 is equal to c1 and thus should be found in the
> dictionary, it is not. The defined __hash__ method must return equal
> numbers for equal objects.
>

Well actually, in our particular case that's the behavior that we *do*
want; pretty much everywhere we've defined __eq__(), we've done it not
to redefine what it means for a==b, but to produce SQL expressions.
So in that sense __eq__() is entirely broken for its normal usage in
SQLAlchemy (as well as in all the other SQL tools out there using this
approach).  For this reason, internally we can't do things like "d in
[a,b,c]" if those are SQL expressions, since __eq__() returns an
expression object, which evaluates as true in all cases; when we need
a collection of SQL expressions that we can test for presence, we use
sets, so that their hash value is used instead.
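A minimal sketch of the situation described above, assuming hypothetical ToyColumn/ToyExpression classes as stand-ins for SQLAlchemy's real constructs and a Hashable mixin along the lines proposed earlier in the thread. It shows a list membership test being fooled by an always-truthy __eq__(), while a set, which compares the id()-based hashes first, is not. One caveat: Python 3 implicitly sets __hash__ to None in any class that defines __eq__() without also defining __hash__(), so the subclass below re-exports the mixin's method explicitly.

```python
class Hashable(object):
    """Hypothetical mixin: hash strictly by identity."""
    def __hash__(self):
        return id(self)


class ToyExpression(object):
    """Hypothetical stand-in for the SQL expression that __eq__() builds."""
    def __init__(self, left, operator, right):
        self.left, self.operator, self.right = left, operator, right

    def __repr__(self):
        return "%s %s %s" % (self.left, self.operator, self.right)


class ToyColumn(Hashable):
    """Hypothetical column construct whose __eq__() produces SQL, not a bool."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Returns an expression object; it defines no __bool__/__len__,
        # so it is always truthy.
        return ToyExpression(self.name, "=", getattr(other, "name", other))

    # Defining __eq__ here would otherwise make __hash__ None on Python 3.
    __hash__ = Hashable.__hash__


a, b, c, d = ToyColumn("a"), ToyColumn("b"), ToyColumn("c"), ToyColumn("d")

print(d == a)               # d = a  (an expression, not True/False)
print(d in [a, b, c])       # True: the list tests the truth value of d == a
print(d in set([a, b, c]))  # False: the id()-based hashes all differ
print(a in set([a, b, c]))  # True
```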

However, while I'm not familiar with the internals of Python
dictionaries, depending on how they are implemented we may still need
to use IdentitySet and IdentityDict, two classes (well, we have the
first one at least) which ignore the __hash__() and __eq__() methods
entirely and hash their contents strictly based on id(obj).  This is
because a "hashtable" usually stores items in buckets based on a
modulus of the __hash__() value; if two items land in the same bucket,
an equality comparison is used to locate the correct object.  If
Python's dict uses __eq__() for that equality comparison, we'd be in
trouble.  I have a strong suspicion that it does not (since I think we
would have noticed by now), and that it uses __hash__() for the
equality comparison as well, but I'm not sure; I'm also not sure
whether this is slated to change in py2.6.
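Whether dict lookups consult __eq__() can be checked directly. On CPython, a dict (or set) lookup first tests for the identical object, then compares the stored full hash values, and only calls __eq__() when those hash values are equal. The sketch below uses a hypothetical Probe key class that counts its __eq__() calls. With an id()-based __hash__(), distinct live objects do not share hash values in practice, so __eq__() would not be consulted for them.

```python
class Probe(object):
    """Hypothetical key type that records every __eq__() call."""
    eq_calls = 0

    def __init__(self, name, hash_value):
        self.name = name
        self.hash_value = hash_value

    def __eq__(self, other):
        Probe.eq_calls += 1
        return True  # like a SQL expression: always "truthy"

    def __hash__(self):
        return self.hash_value


def count_eq_calls(key, table):
    """Return (found, number of __eq__ calls made during the lookup)."""
    Probe.eq_calls = 0
    found = key in table
    return found, Probe.eq_calls


x = Probe("x", 1)
d = {x: None}

# Same object: the identity check succeeds, __eq__ is skipped.
print(count_eq_calls(x, d))              # (True, 0)
# Different object, different hash: no __eq__ call, not found.
print(count_eq_calls(Probe("y", 2), d))  # (False, 0)
# Different object, equal hash: __eq__ decides, and it says True.
print(count_eq_calls(Probe("z", 1), d))  # (True, 1)
```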

I think I might want to look into defining ExpressionSet /
ExpressionKeyDict symbols in util (subject to the new names jek is
sure to propose... ;) ), which would be used throughout the source
code to store SQL expression constructs as keys.  That way we can at
least change the underlying implementation based on the observed
quirks of whichever Python version is in use.
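One possible shape for the ExpressionSet idea, as a sketch only: the class below is hypothetical and is not SQLAlchemy's IdentitySet. It keys members strictly by id(obj), so the members' own __hash__() and __eq__() are never called, and the storage strategy lives in one place where it could later be swapped out. ToyClause is likewise a hypothetical, deliberately unhashable stand-in for an expression construct.

```python
class ExpressionSet(object):
    """Hypothetical set of SQL expression constructs, keyed by id() only."""

    def __init__(self, iterable=()):
        # id(obj) -> obj; holding obj keeps it alive, so its id stays unique.
        self._members = {}
        for obj in iterable:
            self.add(obj)

    def add(self, obj):
        self._members[id(obj)] = obj

    def discard(self, obj):
        self._members.pop(id(obj), None)

    def __contains__(self, obj):
        return id(obj) in self._members

    def __iter__(self):
        return iter(list(self._members.values()))

    def __len__(self):
        return len(self._members)


class ToyClause(object):
    """Hypothetical expression: truthy __eq__() and no usable __hash__()."""
    def __eq__(self, other):
        return "an expression"  # truthy, never a real comparison
    __hash__ = None             # deliberately unhashable


a, b, c = ToyClause(), ToyClause(), ToyClause()
s = ExpressionSet([a, b])
print(a in s, c in s, len(s))  # True False 2
```

Because the set holds a reference to each member, no other live object can reuse that member's id() while it is stored.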

You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en
