On Tue, 14 Sep 2010, Toon Verwaest wrote:
Hi,
I'm a bit skeptical, because a 10x improvement is pretty hard to get if the
benchmark is not flawed and the code doesn't use VM support.
Okay, I found the benchmark code, and it's flawed. The distribution of the
keys is not uniform. In fact it's far from uniform: it's just 1..dictSize.
Anyway I ran the benchmarks in Squeak 4.2 alpha and got the following
results
...
The PDictionary is better in only 2 benchmarks:
Includes (not IncludesKey!) and RemoveKey, which are both rarely used. In
all other tests Squeak's Dictionary implementation is faster on this flawed
benchmark.
I agree this benchmark is severely flawed. But it's generally also the ideal
case for Pharo/Squeak dictionaries. Whatever might happen in other cases to
degenerate the dictionary doesn't happen here, since keys are added in the
right order (increasing hashes). So whenever place should be made for a key
with an already existing hash in Dictionary, this is avoided. In PDictionary,
however, this scenario wouldn't even happen. Now the problem is that I don't
have experience in writing proper benchmarks, and this thread could provide us
with some decent benchmarks that better show standard use of dictionaries
(and, as I hope, show that PDictionaries are more resistant to degradation
than the current Dictionaries).
In one sense it's ideal: #at:put: is fast, because the first slot checked
will be an empty slot. But #removeKey:ifAbsent: and #includes: will be
very slow, because those have to check all subsequent slots. Normally
a hash table which uses open addressing with linear probing has short
chains: the average chain length is below a constant which depends only on
the load factor, and this is how it provides O(1) time for the operations.
So in this case the dictionary is far from normal, since it has only a few
chains which are very long.
In Squeak the default hash function is very simple and therefore fast, but
it has a caveat: if you provide low-quality input, like integers from a
small range (1..10000), it will have bad performance. With
PluggableSet/PluggableDictionary you can change the hash function if you
need to. There is a class method which creates a new collection that's
more tolerant of the small-integer-range and non-uniform-distribution
issue: PluggableSet integerSet.
Here is the benchmark I used while I was changing Squeak's Dictionary
implementation:
http://leves.web.elte.hu/collections/DictionaryBenchmark.st
I'm sure it could be improved and simplified a bit with the new
collection methods.
What we should take away from the existing benchmarks is that the differences
aren't even that high, even in the ideal case for Pharo/Squeak dictionaries.
The main point in favor of the PSet / PDictionary implementations is that you
always know the exact ranges of keys/values that belong to a certain hash. For
#includesKey: and #at:ifAbsent: this is ideal, since you never look further
than the entries for the current hash value. In a Squeak/Pharo Dictionary or
Set you don't know this, because collisions propagate into neighboring slots,
so you have to scan until you find a slot that is nil. If you are unlucky and
your whole dictionary has all the associations grouped together, even though
they all represent keys with different hashes, you'll have to look through a
whole bunch of totally unrelated keys.
It's the classical separate chaining vs open addressing trade-off. IMHO open
addressing is better in general: it uses less space, causes fewer cache misses
and performs better with a proper hash function. Though there are
special cases where it's inferior to separate chaining.
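The trade-off described above can be reproduced with a toy model. The sketch
below is illustrative Python, not the actual Squeak/Pharo or PDictionary code;
the table size, the two hash functions and the choice of miss keys are all
assumptions, picked to mimic the benchmark's 1..n integer keys (whose Squeak
hash is the value itself).

```python
# Toy open-addressing table with linear probing, plus separate chaining
# for comparison. Illustrative sketch only, not the real Squeak code.

CAP = 1 << 14                      # 16384 slots
N = CAP // 2                       # 8192 keys -> load factor 0.5

def small_range_hash(k):           # "bad" hash: the small-range value itself
    return k & (CAP - 1)

def scrambled_hash(k):             # "proper" hash: multiplicative scrambling
    return (k * 2654435761) & (CAP - 1)

def oa_insert(table, key, h):
    """Open addressing: probe linearly from the home slot to an empty one."""
    i = h(key)
    while table[i] is not None:
        i = (i + 1) & (CAP - 1)
    table[i] = key

def oa_miss_probes(table, key, h):
    """Slots examined before an absent key's search hits an empty slot."""
    i, probes = h(key), 1
    while table[i] is not None:
        i = (i + 1) & (CAP - 1)
        probes += 1
    return probes

oa_bad, oa_good = [None] * CAP, [None] * CAP
chains = [[] for _ in range(CAP)]          # separate-chaining buckets
for k in range(1, N + 1):
    oa_insert(oa_bad, k, small_range_hash)
    oa_insert(oa_good, k, scrambled_hash)
    chains[small_range_hash(k)].append(k)

# Absent keys whose hash lands inside the occupied cluster of slots 1..N:
misses = range(CAP + 1, CAP + 1001)        # hash(CAP + j) == j, all misses
avg_oa_bad = sum(oa_miss_probes(oa_bad, k, small_range_hash)
                 for k in misses) / 1000
avg_oa_good = sum(oa_miss_probes(oa_good, k, scrambled_hash)
                  for k in misses) / 1000
# With chaining, a miss only ever scans its own (tiny) bucket:
avg_chain = sum(len(chains[small_range_hash(k)]) for k in misses) / 1000
print(avg_oa_bad, avg_oa_good, avg_chain)
```

With the small-range hash every miss scans the whole contiguous cluster
(thousands of probes on average), the scrambled hash stays at a handful, and
chaining only ever scans its own bucket, which is the bounded-scan property
claimed for PDictionary above.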
For example, I quickly changed one of the benchmarks to be a bit less optimal
for the Squeak/Pharo collections, namely the #includesKey: benchmark (which is
also representative of #at:).
Rather than doing

	1 to: dict size * 2 do: [ :i |
		dict at: (self key: i) ifAbsentPut: (self value: i) ].

I changed it to:

	1 to: dict size * 10 by: 10 do: [ :i |
		dict at: (self key: i) ifAbsentPut: (self value: i) ].
This forces the dictionary to miss a lot more often. Since all the keys are
grouped together, this is the worst case for Squeak/Pharo dictionaries when
you look up something that's not there. These are the results on Squeak
4.1:
PBDictionary 0.0261 +/-0.0017
PBSTDictionary 5.235 +/-0.01
So around 200x faster in case of PDictionary.
The values of the hash function are still in 1..100000 and the
distribution of the values is very far from uniform in that small range.
While PDictionary stays stable (even slightly faster, go figure), Squeak
dictionaries degrade immensely! On Pharo the results are a bit better, but
still very much degraded:
PBDictionary 0.02387 +/-0.00014
PBSTDictionary 1.6577 +/-0.0036
So around 60x faster in case of PDictionary.
If on Set I ensure that all elements will be found, I get the following
results for #includes:
PBSetIncludes 0.00440 +/-0.00011
PBSTSetIncludes 0.004000 +/-9.6e-5
making PSet marginally slower. However, once you start not finding elements,
with the same change as before (to: size * 10 by: 10):
PBSetIncludes 0.00443 +/-0.0001
PBSTSetIncludes 0.9844 +/-0.0017
So around 220x faster in case of PSet. While PSet and PDictionary never
really suffer from collisions, normal Squeak/Pharo dictionaries suffer
immensely and grow up to 200x slower. Btw, this last benchmark was run on
Pharo.
See above.
Anyway, you are definitely right that the benchmarks suck and that we need
better ones.
As for PDictionaries and PSets, given that you know exactly what the range
of existing hashes is, I think it's only logical that they will be faster in
many of the operations that need this knowledge, such as #includesKey:,
#remove:, #at:put:/#add: and #at:/#includes:.
See above.
Since PDictionary switches between SmallDictionary style and normal
dictionary style, it also seems logical that it is faster when it is in
SmallDictionary style, without you having to care about it. This is not
shown in any benchmark either, but it is the case and makes PDictionary
faster there. However, the test for which style to use makes it slower in
the general dictionary style, so removing it might regain the marginal
difference between the two dictionaries in the ideal case for
Pharo/Squeak.
There is anyway no reason why PDictionary would be slower than Pharo/Squeak
dictionaries, except for design decisions like the previously mentioned one,
and those can be changed if desired. There is, however, a good reason why
Squeak/Pharo dictionaries are slower and degrade faster, namely colliding
hashes that even propagate collisions (also not benchmarked at the moment!).
Set has better cache locality than PSet, I think Dictionary is similar to
PDictionary in this regard.
Ok, on the whole I wasn't really planning on pushing these dictionaries too
much on Pharo / Squeak. We use them in Pinocchio and are happy with them
(especially since we have native support). But they are there and can be used
if you want to. I can understand that people are skeptical, but since I don't
really have experience in building benchmarks (I didn't even set up the
current ones; that was a student of mine) I don't think I can help much there.
We use them throughout our project, and some parts (such as our parser) have
gotten a lot faster since we started using them.
I'm not saying that a hashed collection can't be faster than the current
implementation, but a few people (including me) implemented several
variants over the years, and the current design was found to be the best so
far for general-purpose use.
Performance can easily be boosted by primitives, especially in special
cases, like this: http://leves.web.elte.hu/LargeIdentityDictionary/ ,
where I used the existing #pointsTo: primitive to boost #includesKey:
(http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.png
). With a better primitive (for this purpose) all operations can be
boosted drastically, but I prefer code written in Smalltalk. I think Cog
will be a lot faster in the near future, so the performance gap will be
smaller.
Btw, the code would indeed just be released under MIT license which is maybe
already automatically the case since I signed that agreement as a contributor
to Pharo?
If you didn't upload your code to a Pharo repository, then it's not
automatically MIT just because you signed the agreement.
Some answers / comments to the other random questions and remarks:
I wonder what the #pPrimitive:plugin: pragmas stand for in PDictionary's
#at:ifAbsent: and #at:put:.
Those methods are implemented as primitives in the Pinocchio VM that we are
developing. In primitive form most of the methods run about twice as fast.
This is, however, not compatible with the way the stack is handled in
Squeak/Pharo, so it can't really be ported.
The current HashedCollections don't shrink. Remove is a rarely used
operation.
Ok, makes sense.
One could implement the current dictionaries without associations; that's
just less object-oriented. That would generate even less "garbage".
True enough.
Let's imagine that we want to put 10000 objects into an IdentitySet. Each
object has its identityHash in the 0..4095 range. The hash function is
very simple: hash \\ capacity + 1. Therefore the values of the hash
function will all be in 1..4096, even if the capacity is ~15000. This causes a
lot of collisions (the table will consist of a few very long chains,
degrading performance to unacceptable levels). These collisions are avoided
by the shift.
So you change the hash implementation to ensure that the entries take up the
whole array of the Dictionary rather than just the first 4096 (plus
collisions) positions. That makes perfect sense for the Squeak/Pharo
implementation, but it seems rather strange to enforce this on the whole
system imho.
Well, it's only that way in Pharo. In Squeak we did it differently:
#identityHash returns the value of the primitive (0..4095) and
#scaledIdentityHash does the shifting (0..SmallInteger maxVal). So "old"
code that uses #identityHash doesn't break and doesn't have performance
issues.
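The 0..4095 scenario can be simulated directly. Again an illustrative Python
sketch; the prime table size and the 18-bit shift are assumptions (the exact
scaling used by #scaledIdentityHash may differ), but the effect is the same:
shifting spreads the home slots over the whole table instead of packing every
entry into the first 4096 positions.

```python
import random

# 10000 objects whose identityHash is only 12 bits (0..4095), inserted
# into a ~15000-slot linear-probing table, as in the example above.
CAP = 15013                    # prime, roughly the ~15000 capacity mentioned
random.seed(42)
id_hashes = [random.randrange(4096) for _ in range(10000)]

def avg_insert_probes(homes, table_size):
    """Insert one entry per home slot (linear probing, wrap-around) and
    return the average number of slots probed per insertion."""
    table = [False] * table_size
    total = 0
    for home in homes:
        i, probes = home, 1
        while table[i]:
            i = (i + 1) % table_size
            probes += 1
        table[i] = True
        total += probes
    return total / len(homes)

raw_homes = [h % CAP for h in id_hashes]              # all homes in 0..4095
shifted_homes = [(h << 18) % CAP for h in id_hashes]  # spread over 0..15012

avg_raw = avg_insert_probes(raw_homes, CAP)
avg_shifted = avg_insert_probes(shifted_homes, CAP)
print(avg_raw, avg_shifted)
```

The raw hashes cram all 10000 entries into a region starting in the first
4096 slots, producing one huge cluster and hundreds-to-thousands of probes
per insertion on average, while the shifted homes keep the probe counts
small.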
Ok, I hope this clears up some things.
Yes, thanks.
Levente
cheers,
Toon
_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project