Martin McClure wrote:
Gotta go see my wife in a play now; will comment on these graphs later.
The play was fun :-) Now, about the graphs:

The test code, a variant of the test from the HashTable SqueakSource wiki, is at the bottom of this message. Basically, it adds a bunch of instances of Object to a Dictionary and measures how long it takes to look them up again.
From the graphs in the previous message, you can see that performance for sizes > 4000 is greatly improved. At size = 10000, #at: is 1000 times faster: 2-3 microseconds vs. >2 milliseconds. At the same size, #at:put: is about 200 times faster, ~10 microseconds vs. >2 milliseconds, and the large spikes for growing the collection drop from >4 seconds to 21 milliseconds, again a factor of about 200.
Performance for dictionary sizes < 4000 is essentially the same as before, so these collections can serve as general-purpose collections over a wide range of sizes. I've attached the graphs for sizes <4000 to this message so you can see that more clearly than on the previous graphs.
These results should hold for any object that inherits #hash from Object, in other words any object that uses its identity hash as its equality hash. Objects with better hashes did not have as serious a problem, but will probably show increased performance as well due to the prime table sizes.
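To see why identity-hash objects were hit hardest, consider that the identity hash has only 4096 possible values, so in a large table the raw hashes all land in a small contiguous region at the front, and linear probing degenerates into long scans. Spreading the hash before taking the modulus against a prime table size scatters those same 4096 reachable slots across the whole table. This is not the actual Pharo/Squeak change, just a minimal Python sketch; the table size 10007 (a prime) and the multiplier 2753 are arbitrary values chosen for illustration:

```python
# Illustration only (not the real Pharo code): a 12-bit identity hash
# gives 4096 possible hash values, far fewer than a large table's slots.
HASH_VALUES = 4096   # possible identity-hash values
TABLE_SIZE = 10007   # a prime table size (illustrative choice)
MULTIPLIER = 2753    # arbitrary spreading multiplier, coprime to TABLE_SIZE

# Raw hashes reach only slots 0..4095: a dense cluster at the table's front,
# which is disastrous for linear probing.
raw_slots = {h % TABLE_SIZE for h in range(HASH_VALUES)}

# A spread hash reaches the same number of slots, but scatters them
# across the whole table, keeping probe chains short.
spread_slots = {(h * MULTIPLIER) % TABLE_SIZE for h in range(HASH_VALUES)}

print(len(raw_slots), max(raw_slots))        # 4096 4095: all crowded at the front
print(len(spread_slots), max(spread_slots))  # 4096 slots, scattered over the table
```

Because the multiplier is coprime to the prime table size, the spread map stays injective: no two identity hashes are merged, they are only redistributed.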
These changes are in Set, so they should improve Set's subclasses as well. IdentityDictionary, IdentitySet, and WeakIdentityKeyDictionary did not have as serious a problem, but should see some improvement. MethodDictionaries have been left alone on the assumption that the VM depends on their hashing.
Since there are still only 4K possible identity-hash values, collisions are inevitable in large collections, and the number of collisions grows linearly with collection size. So how well do the spread hash and prime table sizes hold up at even larger sizes? I ran the same test at a size of one million. As expected, access was quite a bit slower than it had been at 10000: #at: took ~250 microseconds, and #at:put: about the same. Note, however, that this is still ten times faster than Dictionary previously was at a size of 10000; 100 times larger, yet 10 times faster.
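The linear growth in collisions is easy to check with back-of-the-envelope arithmetic. A quick Python sketch, assuming the 4K = 4096 identity-hash values mentioned above:

```python
# With a fixed pool of identity-hash values, the average number of
# entries sharing each hash value grows linearly with collection size.
HASH_VALUES = 4096  # 4K possible identity-hash values

for size in (10_000, 1_000_000):
    per_hash = size / HASH_VALUES
    print(f"size {size}: ~{per_hash:.0f} entries per identity-hash value")
# size 10000: ~2 entries per identity-hash value
# size 1000000: ~244 entries per identity-hash value
```

So at one million entries, each lookup must on average distinguish among a couple of hundred same-hash entries, which is consistent with access times growing from a few microseconds to ~250 microseconds.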
I had a lot of fun doing this. These are better results than I expected for a fairly minor (though deep) code change.
Regards,

-Martin

| test ord |
Transcript cr.
test := Dictionary new.
[ test size >= 10000 ] whileFalse: [
	ord := OrderedCollection new: 100.
	Transcript show: [
		100 timesRepeat: [ test at: (ord add: Object new) put: nil ] ] timeToRun asString.
	Transcript tab; show: test size asString; tab.
	Transcript show: [
		1000 timesRepeat: [ ord do: [ :each | test at: each ] ] ] timeToRun asString.
	Transcript tab; show: [
		1000 timesRepeat: [ ord do: [ :each | ] ] ] timeToRun asString.
	Transcript cr ]
<<inline: SmallDictPerformanceGraphs1.png>>
_______________________________________________ Pharo-project mailing list [email protected] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
