Martin McClure wrote:
Gotta go see my wife in a play now; will comment on these graphs later.
The play was fun :-) Now, about the graphs:

The test code, a variant of the test from the HashTable SqueakSource wiki, is at the bottom of this message. Basically, it adds a bunch of instances of Object to a Dictionary and measures how long it takes to look them up again.
From the graphs in the previous message, you can see that performance for sizes > 4000 is greatly improved. At size = 10000, #at: is 1000 times faster: 2-3 microseconds vs. >2 milliseconds. At the same size, #at:put: is about 200 times faster, ~10 microseconds vs. >2 milliseconds, and the large spikes for growing the collection drop from >4 seconds to 21 milliseconds, again a factor of about 200.
Performance for dictionary sizes < 4000 is essentially the same as before, so these collections can serve as general-purpose collections over a wide range of sizes. I've attached the graphs for sizes <4000 to this message so you can see that more clearly than on the previous graphs.
These results should hold for any object that inherits #hash from Object, in other words any object that uses its identity hash as its equality hash. Objects with better hashes did not have as serious a problem, but will probably show increased performance as well due to the prime table sizes.
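To see why identity-hash objects were hit hardest, consider that the identity hash has only 4096 possible values, so in a large table the raw hashes all land in a small contiguous region at the front, and linear probing degenerates into long scans. Spreading the hash before taking the modulus against a prime table size scatters those same 4096 reachable slots across the whole table. This is not the actual Pharo/Squeak change, just a minimal Python sketch; the table size 10007 (a prime) and the multiplier 2753 are arbitrary values chosen for illustration:

```python
# Illustration only (not the real Pharo code): a 12-bit identity hash
# gives 4096 possible hash values, far fewer than a large table's slots.
HASH_VALUES = 4096   # possible identity-hash values
TABLE_SIZE = 10007   # a prime table size (illustrative choice)
MULTIPLIER = 2753    # arbitrary spreading multiplier, coprime to TABLE_SIZE

# Raw hashes reach only slots 0..4095: a dense cluster at the table's front,
# which is disastrous for linear probing.
raw_slots = {h % TABLE_SIZE for h in range(HASH_VALUES)}

# A spread hash reaches the same number of slots, but scatters them
# across the whole table, keeping probe chains short.
spread_slots = {(h * MULTIPLIER) % TABLE_SIZE for h in range(HASH_VALUES)}

print(len(raw_slots), max(raw_slots))        # 4096 4095: all crowded at the front
print(len(spread_slots), max(spread_slots))  # 4096 slots, scattered over the table
```

Because the multiplier is coprime to the prime table size, the spread map stays injective: no two identity hashes are merged, they are only redistributed.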
These changes are in Set, so they should improve Set's subclasses as well. IdentityDictionary, IdentitySet, and WeakIdentityKeyDictionary did not have as serious a problem, but should see some improvement. MethodDictionaries have been left alone on the assumption that the VM depends on their hashing.
Since there are still only 4K possible identity-hash values, collisions are inevitable in large collections, and the number of collisions grows linearly with collection size. So how well do the spread hash and prime table sizes hold up at even larger sizes? I ran the same test at a size of one million. As expected, access was quite a bit slower than it had been at 10000: #at: took ~250 microseconds, and #at:put: about the same. Note, however, that this is still ten times faster than Dictionary previously was at a size of 10000; 100 times larger, yet 10 times faster.
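The linear growth in collisions is easy to check with back-of-the-envelope arithmetic. A quick Python sketch, assuming the 4K = 4096 identity-hash values mentioned above:

```python
# With a fixed pool of identity-hash values, the average number of
# entries sharing each hash value grows linearly with collection size.
HASH_VALUES = 4096  # 4K possible identity-hash values

for size in (10_000, 1_000_000):
    per_hash = size / HASH_VALUES
    print(f"size {size}: ~{per_hash:.0f} entries per identity-hash value")
# size 10000: ~2 entries per identity-hash value
# size 1000000: ~244 entries per identity-hash value
```

So at one million entries, each lookup must on average distinguish among a couple of hundred same-hash entries, which is consistent with access times growing from a few microseconds to ~250 microseconds.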
I had a lot of fun doing this. These are better results than I expected for a fairly minor (though deep) code change.
Regards,

-Martin

| test ord |
Transcript cr.
test := Dictionary new.
[ test size >= 10000 ] whileFalse: [
	ord := OrderedCollection new: 100.
	Transcript show: [
		100 timesRepeat: [ test at: (ord add: Object new) put: nil ] ] timeToRun asString.
	Transcript tab; show: test size asString; tab.
	Transcript show: [
		1000 timesRepeat: [ ord do: [ :each | test at: each ] ] ] timeToRun asString.
	Transcript tab; show: [
		1000 timesRepeat: [ ord do: [ :each | ] ] ] timeToRun asString.
	Transcript cr ]
<<inline: SmallDictPerformanceGraphs1.png>>
_______________________________________________ Pharo-project mailing list [email protected] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
