Hi,
I'm a bit skeptical, because a 10x improvement is pretty hard to get if
the benchmark is not flawed and the code doesn't use VM support.
Okay, I found the benchmark code, and it's flawed. The distribution of
the keys is not uniform. Actually, it's far from uniform: it's just
1..dictSize.
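For what it's worth, here is a quick Python sketch (my own illustration, not the original benchmark code) of why keys 1..dictSize are the friendliest possible input: small integers hash to themselves, so consecutive keys land in distinct buckets and never collide.

```python
# Sketch of the benchmark's key distribution: consecutive integers.
# In CPython, hash(n) == n for small ints, so keys 1..dictSize map to
# dictSize distinct buckets -- the best case for any hash table.
dict_size = 1000
keys = range(1, dict_size + 1)        # "1..dictSize"
capacity = 2048                       # assumed table capacity
buckets = {hash(k) % capacity for k in keys}
assert len(buckets) == dict_size      # zero collisions
```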
Anyway, I ran the benchmarks in Squeak 4.2 alpha and got the following
results:
...
The PDictionary is better in only two benchmarks:
Includes (not IncludesKey!) and RemoveKey, which are both rarely used.
In all other tests Squeak's Dictionary implementation is faster in
this flawed benchmark.
I agree this benchmark is severely flawed. But it's generally also the
ideal case for Pharo/Squeak dictionaries. Whatever might happen in other
cases to degrade the dictionary doesn't happen here, since keys are
added in the right order (increasing hashes). So the situation where
room has to be made for a key whose hash already exists in the
Dictionary is avoided; in PDictionary this scenario wouldn't even arise.
Now the problem is that I don't have experience in writing proper
benchmarks, and this thread could provide us with some decent benchmarks
that better show standard use of dictionaries (and, as I hope, show that
PDictionaries are more resistant to degradation than the current
Dictionaries).
What we should take away from the existing benchmarks is that the
differences aren't even that large in the ideal case for Pharo/Squeak
dictionaries.
The main point in favor of the PSet / PDictionary implementations is
that you always know the exact range of keys/values that belong to a
certain hash. So for includesKey: and at:ifAbsent: this is ideal, since
you never look further than the current hash value. In a Squeak/Pharo
dictionary/set you don't know this, because of hash collisions and
their propagation, so you have to scan until you find an association
that is nil. If you are unlucky and your whole dictionary has all the
associations grouped together, even though they all represent keys with
different hashes, you'll have to look through a whole bunch of totally
unrelated keys.
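To make that scanning cost concrete, here is a toy Python sketch of an open-addressing table with linear probing (my illustration, not the Squeak code): a lookup that misses has to walk the whole cluster up to the first nil slot, even though every stored key has a different hash.

```python
# Toy open-addressing table illustrating why a miss must scan until it
# reaches an empty slot.
class OpenTable:
    def __init__(self, capacity):
        self.slots = [None] * capacity

    def _index(self, key):
        return hash(key) % len(self.slots)

    def add(self, key):
        i = self._index(key)
        while self.slots[i] is not None:      # linear probing on collision
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def probes(self, key):
        """Number of slots inspected before the search terminates."""
        i = self._index(key)
        n = 0
        while self.slots[i] is not None:
            n += 1
            if self.slots[i] == key:
                return n                      # hit
            i = (i + 1) % len(self.slots)
        return n + 1                          # miss: scanned up to first nil

t = OpenTable(16)
for k in range(8):    # small ints hash to themselves: one dense cluster 0..7
    t.add(k)
assert t.probes(3) == 1    # a hit is found immediately
# A miss landing at the start of the cluster walks past all 8 entries,
# even though every stored key has a different hash.
assert t.probes(16) == 9
```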
For example, I quickly changed one of the benchmarks to be a bit less
optimal for the Squeak/Pharo collections, namely the includesKey:
benchmark (which is also representative of at:).
Rather than doing
1 to: dict size * 2 do: [ :i |
    dict at: (self key: i) ifAbsentPut: (self value: i) ].
I changed it to:
1 to: dict size * 10 by: 10 do: [ :i |
    dict at: (self key: i) ifAbsentPut: (self value: i) ].
This forces the dictionary to miss a lot more often. Since all the keys
are grouped together, this is the worst case for Squeak/Pharo
dictionaries when you look up something that isn't there. These are
the results on Squeak 4.1:
PBDictionary 0.0261 +/-0.0017
PBSTDictionary 5.235 +/-0.01
So around 200x faster in the case of PDictionary.
While PDictionary stays stable (even slightly faster, go figure), Squeak
dictionaries degrade immensely! On Pharo the results are a bit better,
but still very much degraded:
PBDictionary 0.02387 +/-0.00014
PBSTDictionary 1.6577 +/-0.0036
So around 60x faster in the case of PDictionary.
If, on Set, I ensure that all elements will be found, I get the
following result for #includes:
PBSetIncludes 0.00440 +/-0.00011
PBSTSetIncludes 0.004000 +/-9.6e-5
making PSet marginally slower. However, once you start not finding
elements, with the same change as before (to: size * 10 by: 10):
PBSetIncludes 0.00443 +/-0.0001
PBSTSetIncludes 0.9844 +/-0.0017
So around 220x faster in the case of PSet. While PSet and PDictionary
never really suffer from collisions, normal Squeak/Pharo dictionaries
suffer immensely and become up to 200x slower. Btw, this last benchmark
was run on Pharo.
Anyway, you are definitely right that the benchmarks suck and that we
need better ones.
As for PDictionaries and PSets, given that you always know exactly what
the range of existing hashes is, I think it's only logical that they
will be faster in many of the operations that need this knowledge, such
as #includesKey:, #remove:, #at:put:/#add: and #at:/#includes:.
Since PDictionary switches between SmallDictionary style and normal
dictionary style, it also seems logical that it is faster while it is
in SmallDictionary style, without you having to care about it. This is
not shown in any benchmark either, but it is the case and makes
PDictionary faster there. However, the test for which style is active
makes it slower in the general dictionary style, so removing it might
regain the marginal difference between both dictionaries in the ideal
case for Pharo/Squeak.
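For illustration, a minimal Python sketch of that hybrid scheme (the names and the threshold are my own, not the PDictionary code): a linear list of pairs below a size threshold, a hash table above it, with the style test paid on every operation.

```python
# Hypothetical hybrid dictionary: "SmallDictionary style" (a flat list,
# cheap for tiny sizes) below THRESHOLD, a real hash table above it.
THRESHOLD = 8  # assumed switch-over point

class HybridDict:
    def __init__(self):
        self.small = []        # list of (key, value) pairs
        self.table = None      # hash table once we outgrow THRESHOLD

    def __setitem__(self, key, value):
        if self.table is not None:             # the per-call style test
            self.table[key] = value
            return
        for i, (k, _) in enumerate(self.small):
            if k == key:
                self.small[i] = (key, value)
                return
        self.small.append((key, value))
        if len(self.small) > THRESHOLD:        # switch representations
            self.table = dict(self.small)
            self.small = []

    def get(self, key, default=None):
        if self.table is not None:             # style test again
            return self.table.get(key, default)
        for k, v in self.small:
            if k == key:
                return v
        return default

d = HybridDict()
for i in range(20):
    d[i] = i * i
assert d.get(5) == 25          # found after the switch to table style
assert d.table is not None     # 20 entries > THRESHOLD, so we switched
```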
There is anyway no reason why PDictionary would be slower than
Pharo/Squeak dictionaries, except for design decisions like the
previously mentioned one, and those can be changed if wanted. There is,
however, a good reason why Squeak/Pharo dictionaries are slower and
degrade faster, namely colliding hashes that even propagate collisions
(also not benchmarked at the moment!).
Ok, on the whole I wasn't really planning on pushing these dictionaries
too much on Pharo / Squeak. We use them in Pinocchio and are happy with
them (especially since we have native support). But they are there and
can be used if you want to. I can understand that people are skeptical,
but since I don't really have knowledge on building benchmarks (I didn't
even set up the current ones, it was a student of mine) I don't think I
can help much there. We use them throughout our project and some parts
(such as our parser) have gotten a lot faster since we started using them.
Btw, the code would indeed just be released under the MIT license,
which is maybe already automatically the case since I signed that
agreement as a contributor to Pharo?
Some answers / comments to the other random questions and remarks:
> I wonder what the #pPrimitive:plugin: pragmas stand for in
> PDictionary's #at:ifAbsent: and #at:put:.
Those methods are implemented as primitives in the Pinocchio VM that we
are developing. In primitive form most of the methods run roughly twice
as fast. This is, however, not compatible with the way the stack is
handled in Squeak/Pharo, so it can't really be ported.
> The current HashedCollections don't shrink. Remove is a rarely used
> operation.
Ok, makes sense.
> One could implement the current dictionaries without associations;
> that's just less object-oriented. That would generate even less
> "garbage".
True enough.
> Let's imagine that we want to put 10000 objects into an IdentitySet.
> Each object has its identityHash in the 0..4095 range. The hash
> function is very simple: hash \\ capacity + 1. Therefore the values
> of the hash function will all be in 1..4096, even if the capacity is
> ~15000. This causes a lot of collisions (the table will consist of a
> few very long chains, degrading performance to unacceptable levels).
> These collisions are avoided by the shift.
So you change the hash implementation to ensure that the entries take
up the whole array of the Dictionary, rather than just the first 4096
(plus collisions) positions. That makes perfect sense for the
Squeak/Pharo implementation, but it seems rather strange to enforce
this on the whole system imho.
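To illustrate the scenario, a hypothetical Python sketch (the exact capacity and the scaling factor are my assumptions): without scaling, every identityHash lands in the first 4096 slots of the table; multiplying the hash spreads entries over the whole array.

```python
# identityHash values only span 0..4095, but the table has ~15000 slots.
CAPACITY = 15013                 # assumed table capacity (~15000)
HASH_RANGE = 4096                # identityHash range: 0..4095

def raw_index(h):
    return h % CAPACITY + 1      # "hash \\ capacity + 1": only 1..4096

def shifted_index(h):
    # Scaling the hash (the "shift") spreads indices over the table.
    return (h * (CAPACITY // HASH_RANGE)) % CAPACITY + 1

hashes = range(HASH_RANGE)
assert max(raw_index(h) for h in hashes) == HASH_RANGE   # clustered low
assert max(shifted_index(h) for h in hashes) > 12000     # spread out
```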
Ok, I hope this clears up some things.
cheers,
Toon
_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project