Re: [Pharo-project] Hashed collection changes, the performance graphs

Stéphane Ducasse Sat, 31 Oct 2009 05:11:20 -0700

What I can tell you is that I ***love*** this discussion.
It illustrates the spirit of Pharo that we want to push.
Let's make a better and cooler Smalltalk :)
And people are even learning useful things along the way.
Thanks


On Oct 29, 2009, at 3:14 AM, Andres Valloud wrote:

> Martin,
>
> One of the constituencies I thought of when I decided to leave
> identityHash alone was folks like you.  Now, as a representative, if you
> are ok with dealing with broken identityHash senders (which I hope will
> be few), then most of my motivation for leaving identityHash unchanged
> is gone.  Thus, I would not mind changing identityHash and implementing
> primIdentityHash.
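A minimal Python model of the split proposed here, for illustration only. It assumes the raw header field is 12 bits wide (the Squeak/Pharo figure quoted later in this thread) and that the changed #identityHash scales it by 2^18. That shift amount and the names `prim_identity_hash` and `identity_hash` are assumptions, not the actual Pharo implementation.

```python
# Hypothetical model of the proposed identityHash / primIdentityHash split.
HEADER_BITS = 12   # width of the raw identity-hash header field (Squeak/Pharo)
SCALE_SHIFT = 18   # ASSUMED scaling shift; not stated in this thread


def prim_identity_hash(header_bits: int) -> int:
    """Answer the raw header bits, as #primIdentityHash would (sketch)."""
    return header_bits & ((1 << HEADER_BITS) - 1)


def identity_hash(header_bits: int) -> int:
    """Answer the scaled value, as the changed #identityHash would (sketch)."""
    return prim_identity_hash(header_bits) << SCALE_SHIFT


if __name__ == "__main__":
    raw = 0x0ABC
    print(prim_identity_hash(raw))  # 2748
    print(identity_hash(raw))       # 720371712, i.e. 2748 * 2**18
```

Under this model, senders that only print the value (the largest category in Martin's survey) would switch to `prim_identity_hash`, while hashed collections would use the scaled value.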
>
> What about others?  Would anybody mind if identityHash was changed?
>
> Some comments below...
>
>> I took a survey of the senders of #identityHash in the latest web image.
>> There aren't that many. The largest category is those that want the
>> printString of the identityHash.
>>
>
> These would probably need to be changed to get the printString of the
> primIdentityHash.
>
>> Of those that care about the value of the identityHash, there are
>> several that use it in #hash methods. The most common is this definition:
>>
>> hash
>>   ^self identityHash
>>
>> These are presumably overriding superclass behavior to restore Object
>> behavior.
>
> I'd like to take a look at these, I suspect there may be low hanging
> fruit waiting to be fixed.
>
>> If the authors knew about the limited range of #identityHash, that is
>> entirely possible. I tend to think it more likely that in most cases
>> these implementations are just the simplest way to follow the dictate
>> that 'a=b -> a hash = b hash', and that they didn't really think about
>> the impact on collection performance.
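To make the collection-performance impact concrete, here is a small Python simulation. It assumes each object gets a uniformly random 12-bit identity hash (a simplification of how the VM actually assigns them) and counts how many objects must share one hash value once a collection holds more than 4096 elements. The object counts and the seed are arbitrary illustration choices.

```python
import random
from collections import Counter

HASH_RANGE = 1 << 12  # 4096 distinct identityHash values in Squeak/Pharo


def hash_collisions(n_objects, seed=0):
    """Return (distinct hash values used, size of the largest group sharing one)."""
    rng = random.Random(seed)
    # Simplification: each object's identityHash is uniformly random in range.
    hashes = [rng.randrange(HASH_RANGE) for _ in range(n_objects)]
    counts = Counter(hashes)
    return len(counts), max(counts.values())


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        distinct, worst = hash_collisions(n)
        print(f"{n:>7} objects: {distinct} distinct hashes, worst group {worst}")
```

Whatever the seed, 100,000 objects can use at most 4096 hash values, so by the pigeonhole principle some value is shared by at least 25 objects. A `hash ^self identityHash` method inherits exactly this limit.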
>>
>
> Or maybe they chose identityHash because they can assume uniqueness (=
> effectively being ==)...
>
>> 5 improved, 2 harmed. And one of those listed as harmed is
>> MethodDictionary, whose performance would not be harmed, but I assume the
>> VM would not be happy if its hashing was changed (does anybody know for
>> sure whether that's true?)
>>
>
> The VM probably knows a lot about identityHash values, and most likely
> uses the primIdentityHash values because then it doesn't have to shift
> on access.
>
>> They could, and I admit to having written this kind of code in the past,
>> but I doubt that I'm typical in doing so. Do you know of any Pharo code
>> that actually *does* this sort of thing? There isn't any in the
>> distributed web image, but I didn't look at every package that is meant
>> to be loadable in Pharo.
>>
>
> I might suspect that Magma does this kind of stuff... but that's just a
> guess.  I didn't immediately see any code doing so.  As long as package
> maintainers are fine with two quite different versions of Pharo with
> very different identityHash method behaviors, then I do not have a
> problem.
>
>>> Clever hacks such as
>>>
>>> SomeObject>>hash
>>>
>>>  ^(self variableA identityHash bitShift: 12) + self variableB identityHash
>>>
>>> would also remain undisturbed.
>>>
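For readers less familiar with Smalltalk, the clever hack above translates roughly to the Python below. It assumes both identity hashes are raw 12-bit values, which is what makes shifting one of them left by 12 bits give a distinct 24-bit result for every pair. `clever_hash` is a hypothetical name.

```python
RAW_BITS = 12  # width of a raw Squeak/Pharo identityHash


def clever_hash(hash_a: int, hash_b: int) -> int:
    """Combine two raw identity hashes the way SomeObject>>hash does."""
    return (hash_a << RAW_BITS) + hash_b


if __name__ == "__main__":
    # With both inputs in [0, 4095], every pair maps to a different value,
    # so the combined hash spans [0, 2**24 - 1] without collisions.
    values = {clever_hash(a, b)
              for a in range(0, 4096, 64)
              for b in range(0, 4096, 64)}
    print(len(values))  # 4096 (64 * 64 distinct pairs)
```

The hack relies on the raw range. Once #identityHash answers scaled values, the two shifted fields no longer line up this way, which is why Martin suggests simply removing such hacks.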
>>
>> Yes, if #identityHash is changed it's the clever hacks that will have to
>> change. This could be a disadvantage of this approach, but often, as in
>> the case of IdentityDictionary, IdentitySet, and
>> WeakIdentityKeyDictionary, the necessary change is simply to remove the
>> clever hack, get simpler code, and enjoy better performance than you got
>> with the clever hack, so making the change is IMO an improvement.
>>
>
> We agree, except that I wouldn't want to impose version-maintenance
> homework on maintainers of large packages.  For the sake of illustration
> only, and using Magma without knowing if it would be affected, I wouldn't
> want whoever is maintaining Magma to maintain two branches... one for
> Pharo 1.xyz, and one for Pharo 1.xyz++.
>
>>> Finally, I do not know of any Smalltalk in which identityHash does not
>>> answer the actual object header bits for the identity hash. If we change
>>> identityHash, then AFAIK Pharo would become the only Smalltalk in which
>>> identityHash does not answer consecutive values between 0 and (2^k)-1
>>> (k=12 for Squeak/Pharo, k=14 for VisualWorks 32 bits, k=20 for
>>> VisualWorks 64 bits, IIRC k=15 for VA and VisualSmalltalk).
>>>
>>
>> GemStone is a Smalltalk that does not answer consecutive values for
>> identityHash.
>
> Haha, I was thinking of "regular" image based Smalltalks...
>
>> In GemStone the identityHash is computed from the object's
>> OOP, and OOPs are not consecutive.
>
> Not necessarily, although I suspect identityHash values map to an
> integer interval along the lines of [0, 2^40-1].  So, if you look at
> hash(x) as a function, the image of hash(x) is a set of consecutive
> intervals.  Using bitShift: to scale identityHash values would make the
> image of hash(x) sparse (with the exception of small integers,
> characters and, to some extent in VW 64 bit, small doubles).
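A rough Python illustration of the "sparse image" point. It assumes 12-bit raw hashes, a hypothetical scaling shift of 18 bits, and a modulo-indexed table of 100,003 slots (both numbers chosen only for illustration). Raw hashes can only land in the first 4096 slots, while scaled values are 2^18 apart before reduction and end up spread across the whole table.

```python
RAW_RANGE = 1 << 12    # raw Squeak/Pharo identityHash values: 0..4095
SHIFT = 18             # ASSUMED scaling shift, for illustration only
TABLE_SIZE = 100_003   # illustrative (odd) table capacity


def slots(hashes, table_size=TABLE_SIZE):
    """Map hash values to slot indices the way a modulo-indexed table would."""
    return [h % table_size for h in hashes]


raw = range(RAW_RANGE)
scaled = [h << SHIFT for h in raw]  # consecutive values become 2**18 apart

if __name__ == "__main__":
    raw_slots, scaled_slots = slots(raw), slots(scaled)
    # The same number of distinct slots is used either way...
    print(len(set(raw_slots)), len(set(scaled_slots)))  # 4096 4096
    # ...but raw hashes crowd into the first 4096 slots of the table.
    print(max(raw_slots), max(scaled_slots))
```

This sketch only shows where values land, not probe lengths. The assumption is that in an open-addressing table, crowding like the raw case is what turns collisions into long probe runs.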
>
>> And Smalltalk-80 basically used the same scheme: though you could only
>> have 32K objects, every one had a different identityHash based on its OOP.
>>
>
> These are also consecutive values... [0, 2^15-1], basically.
>
>> Also, most (all?) Smalltalks with limited ranges for identityHash do
>> have a larger range of identityHash for SmallIntegers (usually ^self),
>> so you can't use the clever hacks if you might have any SmallIntegers in
>> your collection. So any general-purpose collection must already deal
>> with the full SmallInteger range of identity hashes as keys, cannot use
>> the clever hacks, and so is likely to only be improved by changing
>> #identityHash. This is a key point that I forgot to bring up last night.
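Martin's SmallInteger point can be checked with the same Python translation of the clever hack. Assuming, as he describes, that a SmallInteger answers itself as its identity hash, any value of 4096 or more no longer fits in 12 bits and overlaps the shifted field, so distinct objects collide. Both function names are hypothetical.

```python
def small_integer_identity_hash(n: int) -> int:
    """Model of a SmallInteger answering itself ('usually ^self')."""
    return n


def clever_hash(hash_a: int, hash_b: int) -> int:
    """The (a bitShift: 12) + b combination from the quoted clever hack."""
    return (hash_a << 12) + hash_b


if __name__ == "__main__":
    # Object 1: variableA hash 0, variableB is the SmallInteger 4096.
    # Object 2: variableA hash 1, variableB is the SmallInteger 0.
    h1 = clever_hash(0, small_integer_identity_hash(4096))
    h2 = clever_hash(1, small_integer_identity_hash(0))
    print(h1, h2, h1 == h2)  # 4096 4096 True
```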
>>
>
> Well, more or less, because with scaledIdentityHash you'd need to
> implement it in SmallInteger as ^self... but yes, I think hashed
> collections shouldn't be put into a position where they judge what's a
> good hash value and what isn't (and spend CPU time doing so at
> runtime!!!).  Java does this, and as far as I could see back when I
> studied Java's hashing implementation, IMO it's not a good idea.
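For comparison, the runtime "judging" Andres objects to looks like the supplemental hash that, as we recall it, JDK 6's java.util.HashMap applied to every key's hashCode() before indexing its power-of-two table. The Python port below is a sketch written from memory under that assumption. It shows hashes that differ only in their high bits being remixed so they stop landing in the same low-order bucket, at the cost of extra work on every access.

```python
MASK32 = 0xFFFFFFFF


def supplemental_hash(h: int) -> int:
    """Bit-spreading step modelled on JDK 6 HashMap.hash(int), from memory."""
    h &= MASK32
    # Java's >>> is an unsigned shift; on a non-negative 32-bit value,
    # Python's >> gives the same result.
    h ^= (h >> 20) ^ (h >> 12)
    return (h ^ (h >> 7) ^ (h >> 4)) & MASK32


def bucket(h: int, table_size: int = 16) -> int:
    """Power-of-two tables index with the low bits only."""
    return h & (table_size - 1)


if __name__ == "__main__":
    a, b = 0x10000, 0x20000      # differ only above the low 16 bits
    print(bucket(a), bucket(b))  # 0 0: same bucket without remixing
    print(bucket(supplemental_hash(a)), bucket(supplemental_hash(b)))  # 1 2
```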
>
> Andres.
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

