Martin, so what you are saying is that we can gain speed, and that these changes are beneficial as a simple default?
> The test code, a variant of the test from the HashTable SqueakSource
> wiki, is at the bottom of this message. Basically, it adds a bunch
> of instances of Object to a Dictionary and sees how much time it
> takes to look them up again.
>
> From the graphs in the previous message, you can see that
> performance for sizes > 4000 is greatly improved. For size = 10000,
> #at: is 1000 times faster, 2-3 microseconds vs. >2 milliseconds. At
> size = 10000, #at:put: is about 200 times faster, ~10 microseconds vs.
> >2 milliseconds, and the large spikes for growing the collection are
> 21 milliseconds vs. 4 seconds, again a factor of about 200.
>
> Performance for dictionary sizes < 4000 is essentially the same as
> before, so these collections can serve as general-purpose
> collections over a wide range of sizes. I've attached the graphs for
> sizes < 4000 to this message so you can see that more clearly than on
> the previous graphs.
>
> These results should hold for any object that inherits #hash from
> Object, in other words uses its identity hash as its equality hash.
> Other objects with better hashes did not have as serious a problem,
> but will probably show increased performance as well due to using
> prime table sizes.
>
> These changes are in Set, so they should improve Set's subclasses as
> well. IdentityDictionary, IdentitySet, and WeakIdentityKeyDictionary
> did not have as serious a problem, but should see some improvement.
> MethodDictionaries have been left alone on the assumption that the
> VM depends on the hashing of those.

So where are these changes :) Should we integrate them? Andres and other
hash freaks, what is your point of view?

> Since there are still only 4K possible values of identity hash,
> collisions are inevitable in large collections, and the number of
> collisions will grow linearly with collection size. So how well does
> the spread hash / prime table size do at even larger sizes? I ran
> the same test at sizes of one million.
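For readers following along, the "spread hash / prime table size" idea mentioned above can be sketched roughly as follows. This is an illustration only, not the actual changeset code, and the selector names here (#spreadHashOf:, #bucketIndexFor:tableSize:) are made up for the example:

```smalltalk
"Illustrative sketch, not the actual changeset.
Only ~4096 identity-hash values exist, so spread them over a
wider range before taking the modulus; consecutive identity
hashes then land far apart in the table:"
spreadHashOf: anObject
	^ anObject identityHash * 4096

"With a prime table size, hash \\ prime scatters the spread
values across all buckets instead of clustering them in the
low indices (Smalltalk indices are 1-based, hence the + 1):"
bucketIndexFor: anObject tableSize: aPrime
	^ (self spreadHashOf: anObject) \\ aPrime + 1
```

The point of the prime capacity is that multiplying by a power of two and then taking a modulus against a power-of-two table size would throw the spread away again; a prime size keeps the full spread effective.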
> As expected, access was quite
> a bit slower than it had been at 10000. Time for #at: was ~250
> microseconds, and #at:put: was about the same. Note, however, that
> this is still ten times faster than Dictionary previously was at a
> size of 10000: 100 times larger, yet 10 times faster.
>
> I had a lot of fun doing this. These are better results than I
> expected, for a fairly minor (though deep) code change.
>
> | test ord |
> Transcript cr.
> test := Dictionary new.
> [ test size >= 10000 ] whileFalse: [
> 	ord := OrderedCollection new: 100.
> 	Transcript show:
> 		[ 100 timesRepeat: [
> 			test at: (ord add: Object new) put: nil ].
> 		] timeToRun asString.
> 	Transcript tab;
> 		show: test size asString;
> 		tab.
> 	Transcript show:
> 		[ 1000 timesRepeat: [
> 			ord do: [ :each | test at: each ] ]
> 		] timeToRun asString.
> 	Transcript tab; show:
> 		[ 1000 timesRepeat: [
> 			ord do: [ :each | ] ]
> 		] timeToRun asString.
> 	Transcript cr ]
>
> <SmallDictPerformanceGraphs1.png>

_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
