On Tue, 14 Sep 2010, Toon Verwaest wrote:
Hi,
I'm a bit skeptical, because a 10x improvement is pretty hard to get if the
benchmark is not flawed and the code doesn't use VM support.
Okay, I found the benchmark code, and it's flawed. The distribution of the
keys is not uniform. In fact it's far from uniform: it's just 1..dictSize.
Anyway I ran the benchmarks in Squeak 4.2 alpha and got the following
results
...
The PDictionary is better in only 2 benchmarks:
Includes (not IncludesKey!) and RemoveKey, which are both rarely used. In
all other tests Squeak's Dictionary implementation is faster on this flawed
benchmark.
I agree this benchmark is severely flawed. But it's generally also the ideal
case for Pharo/Squeak dictionaries. Whatever might happen in other cases to
degenerate the dictionary doesn't happen here, since keys are added in the
right order (increasing hashes). So whenever place should be made for a key
with an already existing hash in Dictionary, this is avoided. In PDictionary,
however, this scenario wouldn't even happen. Now the problem is that I don't
have experience in writing proper benchmarks, and this thread could provide us
with some decent benchmarks that better show standard use of dictionaries
(and, as I hope, show that PDictionaries are more resistant to degradation
than the current Dictionaries).
In one sense it's ideal: #at:put: is fast, because the first slot checked
will be an empty slot. But #removeKey:ifAbsent: and #includes: will be
very slow, because those have to check all subsequent slots. Normally
a hash table which uses open addressing with linear probing has short
chains: the average chain length is below a constant which depends only on
the load factor, and this is how it provides O(1) time for the operations.
So in this case the dictionary is far from normal, since it has only a few
chains which are very long.
In Squeak the default hash function is very simple and therefore fast, but
it has a caveat: if you provide low-quality input, like integers from a
small range (1..10000), it will have bad performance. With
PluggableSet/PluggableDictionary you can change the hash function if you
need to. There is a class method which creates a new collection that's
more tolerant of the small-integer-range and non-uniform-distribution
issue: PluggableSet integerSet.
Here is the benchmark I used while I was changing Squeak's Dictionary
implementation:
http://leves.web.elte.hu/collections/DictionaryBenchmark.st
I'm sure it could be improved and simplified a bit with the new
collection methods.
What we should take away from the existing benchmarks is that the differences
aren't even that high, even in the ideal case for Pharo/Squeak dictionaries.
The main point in favor of the PSet / PDictionary implementations is that you
always know the exact ranges of keys/values that belong to a certain hash. For
#includesKey: and #at:ifAbsent: this is ideal, since you never look further
than the entries for the current hash value. In a Squeak/Pharo Dictionary or
Set you don't know this, because collisions propagate into neighboring slots,
so you have to scan until you find a slot that is nil. If you are unlucky and
your whole dictionary has all the associations grouped together, even though
they all represent keys with different hashes, you'll have to look through a
whole bunch of totally unrelated keys.
It's the classical separate chaining vs open addressing trade-off. IMHO open
addressing is better in general: it uses less space, causes fewer cache misses
and performs better with a proper hash function. Though there are
special cases where it's inferior to separate chaining.
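The trade-off described above can be reproduced with a toy model. The sketch
below is illustrative Python, not the actual Squeak/Pharo or PDictionary code;
the table size, the two hash functions and the choice of miss keys are all
assumptions, picked to mimic the benchmark's 1..n integer keys (whose Squeak
hash is the value itself).

```python
# Toy open-addressing table with linear probing, plus separate chaining
# for comparison. Illustrative sketch only, not the real Squeak code.

CAP = 1 << 14                      # 16384 slots
N = CAP // 2                       # 8192 keys -> load factor 0.5

def small_range_hash(k):           # "bad" hash: the small-range value itself
    return k & (CAP - 1)

def scrambled_hash(k):             # "proper" hash: multiplicative scrambling
    return (k * 2654435761) & (CAP - 1)

def oa_insert(table, key, h):
    """Open addressing: probe linearly from the home slot to an empty one."""
    i = h(key)
    while table[i] is not None:
        i = (i + 1) & (CAP - 1)
    table[i] = key

def oa_miss_probes(table, key, h):
    """Slots examined before an absent key's search hits an empty slot."""
    i, probes = h(key), 1
    while table[i] is not None:
        i = (i + 1) & (CAP - 1)
        probes += 1
    return probes

oa_bad, oa_good = [None] * CAP, [None] * CAP
chains = [[] for _ in range(CAP)]          # separate-chaining buckets
for k in range(1, N + 1):
    oa_insert(oa_bad, k, small_range_hash)
    oa_insert(oa_good, k, scrambled_hash)
    chains[small_range_hash(k)].append(k)

# Absent keys whose hash lands inside the occupied cluster of slots 1..N:
misses = range(CAP + 1, CAP + 1001)        # hash(CAP + j) == j, all misses
avg_oa_bad = sum(oa_miss_probes(oa_bad, k, small_range_hash)
                 for k in misses) / 1000
avg_oa_good = sum(oa_miss_probes(oa_good, k, scrambled_hash)
                  for k in misses) / 1000
# With chaining, a miss only ever scans its own (tiny) bucket:
avg_chain = sum(len(chains[small_range_hash(k)]) for k in misses) / 1000
print(avg_oa_bad, avg_oa_good, avg_chain)
```

With the small-range hash every miss scans the whole contiguous cluster
(thousands of probes on average), the scrambled hash stays at a handful, and
chaining only ever scans its own bucket, which is the bounded-scan property
claimed for PDictionary above.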
For example, I quickly changed one of the benchmarks to be a bit less optimal
for the Squeak/Pharo collections, namely the #includesKey: benchmark (which is
also representative of #at:).
Rather than doing

	1 to: dict size * 2 do: [ :i |
		dict at: (self key: i) ifAbsentPut: (self value: i) ].

I changed it to:

	1 to: dict size * 10 by: 10 do: [ :i |
		dict at: (self key: i) ifAbsentPut: (self value: i) ].
This forces the dictionary to miss a lot more often. Since all the keys are
grouped together, this is the worst case for Squeak/Pharo dictionaries when
you look up something that's not there. These are the results on Squeak
4.1:
PBDictionary 0.0261 +/-0.0017
PBSTDictionary 5.235 +/-0.01
So around 200x faster in case of PDictionary.
The values of the hash function are still in 1..100000 and the
distribution of the values is very far from uniform in that small range.
While PDictionary stays stable (even slightly faster, go figure), Squeak
dictionaries degrade immensely! On Pharo the results are a bit better, but
still very much degraded:
PBDictionary 0.02387 +/-0.00014
PBSTDictionary 1.6577 +/-0.0036
So around 60x faster in case of PDictionary.
If on Set I ensure that all elements will be found, I get the following
results for #includes:
PBSetIncludes 0.00440 +/-0.00011
PBSTSetIncludes 0.004000 +/-9.6e-5
making PSet marginally slower. However, once you start not finding elements,
with the same change as before (to: size * 10 by: 10):
PBSetIncludes 0.00443 +/-0.0001
PBSTSetIncludes 0.9844 +/-0.0017
So around 220x faster in case of PSet. While PSet and PDictionary never
really suffer from collisions, normal Squeak/Pharo dictionaries suffer
immensely and grow up to 200x slower. Btw, this last benchmark was run on
Pharo.
See above.
Anyway, you are definitely right that the benchmarks suck and that we need
better ones.
As for PDictionaries and PSets, given that you know exactly what the range
of existing hashes is, I think it's only logical that they will be faster in
many of the operations that need this knowledge, such as #includesKey:,
#remove:, #at:put:/#add: and #at:/#includes:.
See above.
Since PDictionary switches between SmallDictionary style and normal
dictionary style, it also seems logical that it is faster when it is in
SmallDictionary style, without you having to care about it. This is not
shown in any benchmark either, but it is the case and makes PDictionary
faster there. However, the test for which style to use makes it slower in
the general dictionary style, so removing it might regain the marginal
difference between the two dictionaries in the ideal case for
Pharo/Squeak.
There is anyway no reason why PDictionary would be slower than Pharo/Squeak
dictionaries, except for design decisions like the previously mentioned one,
and those can be changed if desired. There is, however, a good reason why
Squeak/Pharo dictionaries are slower and degrade faster, namely colliding
hashes that even propagate collisions (also not benchmarked at the moment!).
Set has better cache locality than PSet, I think Dictionary is similar to
PDictionary in this regard.
Ok, on the whole I wasn't really planning on pushing these dictionaries too
much on Pharo / Squeak. We use them in Pinocchio and are happy with them
(especially since we have native support). But they are there and can be used
if you want to. I can understand that people are skeptical, but since I don't
really have experience in building benchmarks (I didn't even set up the
current ones; that was a student of mine) I don't think I can help much there.
We use them throughout our project, and some parts (such as our parser) have
gotten a lot faster since we started using them.
I'm not saying that a hashed collection can't be faster than the current
implementation, but a few people (including me) implemented several
variants over the years, and the current design was found to be the best so
far for general-purpose use.
Performance can easily be boosted by primitives, especially in special
cases, like this: http://leves.web.elte.hu/LargeIdentityDictionary/ ,
where I used the existing #pointsTo: primitive to boost #includesKey:
(http://leves.web.elte.hu/LargeIdentityDictionary/LargeIdentityDictionary2.png
). With a better primitive (for this purpose) all operations can be
boosted drastically, but I prefer code written in Smalltalk. I think Cog
will be a lot faster in the near future, so the performance gap will be
smaller.
Btw, the code would indeed just be released under MIT license which is maybe
already automatically the case since I signed that agreement as a contributor
to Pharo?
If you didn't upload your code to a Pharo repository, then it's not
automatically MIT just because you signed the agreement.
Some answers / comments to the other random questions and remarks:
I wonder what the #pPrimitive:plugin: pragmas stand for in PDictionary's
#at:ifAbsent: and #at:put:.
Those methods are implemented as primitives in the Pinocchio VM that we are
developing. In primitive form most of the methods run about twice as fast.
This is, however, not compatible with the way the stack is handled in
Squeak/Pharo, so it can't really be ported.
The current HashedCollections don't shrink. Remove is a rarely used
operation.
Ok, makes sense.
One could implement the current dictionaries without associations; that's
just less object-oriented. That would generate even less "garbage".
True enough.
Let's imagine that we want to put 10000 objects into an IdentitySet. Each
object has its identityHash in the 0..4095 range. The hash function is
very simple: hash \\ capacity + 1. Therefore the values of the hash
function will all be in 1..4096, even if the capacity is ~15000. This causes a
lot of collisions (the table will consist of a few very long chains,
degrading performance to unacceptable levels). These collisions are avoided
by the shift.
So you change the hash implementation to ensure that the entries take up the
whole array of the Dictionary rather than just the first 4096 (plus
collisions) positions. That makes perfect sense for the Squeak/Pharo
implementation, but it seems rather strange to enforce this on the whole
system imho.
Well, it's only that way in Pharo. In Squeak we did it differently:
#identityHash returns the value of the primitive (0..4095) and
#scaledIdentityHash does the shifting (0..SmallInteger maxVal). So "old"
code that uses #identityHash doesn't break and doesn't have performance
issues.
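The 0..4095 scenario can be simulated directly. Again an illustrative Python
sketch; the prime table size and the 18-bit shift are assumptions (the exact
scaling used by #scaledIdentityHash may differ), but the effect is the same:
shifting spreads the home slots over the whole table instead of packing every
entry into the first 4096 positions.

```python
import random

# 10000 objects whose identityHash is only 12 bits (0..4095), inserted
# into a ~15000-slot linear-probing table, as in the example above.
CAP = 15013                    # prime, roughly the ~15000 capacity mentioned
random.seed(42)
id_hashes = [random.randrange(4096) for _ in range(10000)]

def avg_insert_probes(homes, table_size):
    """Insert one entry per home slot (linear probing, wrap-around) and
    return the average number of slots probed per insertion."""
    table = [False] * table_size
    total = 0
    for home in homes:
        i, probes = home, 1
        while table[i]:
            i = (i + 1) % table_size
            probes += 1
        table[i] = True
        total += probes
    return total / len(homes)

raw_homes = [h % CAP for h in id_hashes]              # all homes in 0..4095
shifted_homes = [(h << 18) % CAP for h in id_hashes]  # spread over 0..15012

avg_raw = avg_insert_probes(raw_homes, CAP)
avg_shifted = avg_insert_probes(shifted_homes, CAP)
print(avg_raw, avg_shifted)
```

The raw hashes cram all 10000 entries into a region starting in the first
4096 slots, producing one huge cluster and hundreds-to-thousands of probes
per insertion on average, while the shifted homes keep the probe counts
small.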
Ok, I hope this clears up some things.
Yes, thanks.
Levente
cheers,
Toon
_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project