Martin, so what you are saying is that we can gain speed, and that these changes are beneficial as a simple default?
> The test code, a variant of the test from the HashTable SqueakSource
> wiki, is at the bottom of this message. Basically, it adds a bunch
> of instances of Object to a Dictionary and sees how much time it
> takes to look them up again.
>
> From the graphs in the previous message, you can see that
> performance for sizes > 4000 is greatly improved. For size = 10000,
> #at: is 1000 times faster, 2-3 microseconds vs. >2 milliseconds. At
> size = 10000, #at:put: is about 200 times faster, ~10 microseconds vs.
> >2 milliseconds, and the large spikes for growing the collection are
> 21 milliseconds vs. 4 seconds, again a factor of about 200.
>
> Performance for dictionary sizes < 4000 is essentially the same as
> before, so these collections can serve as general-purpose
> collections over a wide range of sizes. I've attached the graphs for
> sizes < 4000 to this message so you can see that more clearly than on
> the previous graphs.
>
> These results should hold for any object that inherits #hash from
> Object, in other words uses its identity hash as its equality hash.
> Other objects with better hashes did not have as serious a problem,
> but will probably show increased performance as well due to using
> prime table sizes.
>
> These changes are in Set, so they should improve Set's subclasses as
> well. IdentityDictionary, IdentitySet, and WeakIdentityKeyDictionary
> did not have as serious a problem, but should see some improvement.
> MethodDictionaries have been left alone on the assumption that the
> VM depends on the hashing of those.

So where are these changes :) Should we integrate them? Andres and other
hash freaks, what is your point of view?

> Since there are still only 4K possible values of identity hash,
> collisions are inevitable in large collections, and the number of
> collisions will grow linearly with collection size. So how well does
> the spread hash / prime table size do at even larger sizes? I ran
> the same test at sizes of one million.
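For readers following along, the "spread hash / prime table size" idea mentioned above can be sketched roughly as follows. This is an illustration only, not the actual changeset code, and the selector names here (#spreadHashOf:, #bucketIndexFor:tableSize:) are made up for the example:

```smalltalk
"Illustrative sketch, not the actual changeset.
Only ~4096 identity-hash values exist, so spread them over a
wider range before taking the modulus; consecutive identity
hashes then land far apart in the table:"
spreadHashOf: anObject
	^ anObject identityHash * 4096

"With a prime table size, hash \\ prime scatters the spread
values across all buckets instead of clustering them in the
low indices (Smalltalk indices are 1-based, hence the + 1):"
bucketIndexFor: anObject tableSize: aPrime
	^ (self spreadHashOf: anObject) \\ aPrime + 1
```

The point of the prime capacity is that multiplying by a power of two and then taking a modulus against a power-of-two table size would throw the spread away again; a prime size keeps the full spread effective.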
> As expected, access was quite
> a bit slower than it had been at 10000. Time for #at: was ~250
> microseconds, and #at:put: was about the same. Note, however, that
> this is still ten times faster than Dictionary previously was at a
> size of 10000: 100 times larger, yet 10 times faster.
>
> I had a lot of fun doing this. These are better results than I
> expected, for a fairly minor (though deep) code change.
>
> | test ord |
> Transcript cr.
> test := Dictionary new.
> [ test size >= 10000 ] whileFalse: [
> 	ord := OrderedCollection new: 100.
> 	Transcript show:
> 		[ 100 timesRepeat: [
> 			test at: (ord add: Object new) put: nil ].
> 		] timeToRun asString.
> 	Transcript tab;
> 		show: test size asString;
> 		tab.
> 	Transcript show:
> 		[ 1000 timesRepeat: [
> 			ord do: [ :each | test at: each ] ]
> 		] timeToRun asString.
> 	Transcript tab; show:
> 		[ 1000 timesRepeat: [
> 			ord do: [ :each | ] ]
> 		] timeToRun asString.
> 	Transcript cr ]
>
> <SmallDictPerformanceGraphs1.png>

_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
