On Mon, 13 Sep 2010, Levente Uzonyi wrote:
On Mon, 13 Sep 2010, Adrian Lienhard wrote:
This mail is quite long as I copied it from a conversation with Toon
Verwaest. Toon and Camillo re-implemented hashed collections for Pinocchio
(http://scg.unibe.ch/research/pinocchio). Their benchmarks have shown that
the new implementation is significantly faster than the one in Pharo 1.1.
The source code is available from www.squeaksource.com/p.html (they haven't
set a license, but I'm pretty sure they'll release their code as MIT).
I post this here to find somebody knowledgeable in the area to take a look
at the code and figure out whether this implementation would be interesting
for Pharo...
Cheers,
Adrian
citing Toon:
We ran some benchmarks on our data structures (compared against, and run
in, standard Pharo 1.1). We just add/remove integers within a certain
range, so the keys are always perfectly distributed, making them ideal for
Pharo dictionaries.
The numbers show how long a certain benchmark took on average in 20 runs of
the benchmark and its standard deviation over those runs. A single
benchmark run generally consists of many individual operations on the
collection in question. For example:
benchIncludes
	1 to: set size * 2 do: [ :i |
		set includes: i ]
where the set has size 10000. Evaluating this once constitutes a single
run. The number at the top of each benchmark group is the sum of all the
averages, together with the averaged stdev.
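As a rough sketch of what such a run looks like, here is the same idea in
Python (the originals are Smalltalk; the function names and the 20-run
protocol are taken from the description above, everything else is invented
for illustration):

```python
import statistics
import time

def bench(run, n_runs=20):
    """Time `run` n_runs times; return (mean, stdev) of seconds per run."""
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        run()
        times.append(time.perf_counter() - t0)
    return statistics.mean(times), statistics.stdev(times)

s = set(range(1, 10001))          # the 10000-element set from the example

def bench_includes():
    # corresponds to `1 to: set size * 2 do: [ :i | set includes: i ]`
    for i in range(1, len(s) * 2 + 1):
        i in s

mean, sd = bench(bench_includes)
```

A benchmark group's headline number would then just be the sum of the
means over all its individual benchmarks.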
The results:
PDictionary: 15.8 +/-1.7
Do 0.472 +/-0.079
RemoveKey 0.010 +/-0.021
AtPut 0.947 +/-0.02
AtIfAbsentPut 1.64 +/-0.96
AtPutExisting 0.198 +/-0.011
KeysAndValuesDo 0.502 +/-0.09
IncludesKey 0.365 +/-0.024
AtPutNew 4.2 +/-1.3
Includes 7.40 +/-0.33
Dictionary: 138.2 +/-4.7
Do 0.830 +/-0.098
RemoveKey 10.292 +/-0.077
AtPut 0.957 +/-0.044
AtIfAbsentPut 1.52 +/-0.95
AtPutExisting 0.203 +/-0.011
KeysAndValuesDo 0.850 +/-0.096
IncludesKey 80.25 +/-0.25
AtPutNew 3.5 +/-1.3
Includes 39.8 +/-4.4
PSet: 1.767 +/-0.057
Remove 0.008 +/-0.018
Add 0.845 +/-0.039
AddExisting 0.310 +/-0.021
AddNew 0.240 +/-0.021
Includes 0.365 +/-0.024
Set: 70.9 +/-1.2
Remove 7.737 +/-0.051
Add 0.755 +/-0.022
AddExisting 0.305 +/-0.015
AddNew 2.473 +/-0.03
Includes 59.6 +/-1.2
Obviously the very slight differences have to be taken with a grain of
salt, but whenever it's more than 10x faster it's quite unlikely that it's
due to some random runtime variation that might totally change in another
run.
I'm a bit skeptical, because a 10x improvement is pretty hard to get if
the benchmark is not flawed and the code doesn't use VM support.
Okay, I found the benchmark code, and it's flawed. The distribution of the
keys is not uniform. Actually it's far from uniform, it's just 1..dictSize.
Anyway I ran the benchmarks in Squeak 4.2 alpha and got the following
results:
PBDictionary: 0.1249 +/-0.0067
RemoveKey 6.7e-5 +/-4.6e-5
AtPutNew 0.0648 +/-0.0048
Do 0.00160 +/-0.0001
AtPutExisting 0.00330 +/-0.00018
AtIfAbsentPut 0.0192 +/-0.0035
AtPut 0.01557 +/-0.00028
IncludesKey 0.0180 +/-0.0029
KeysAndValuesDo 0.00223 +/-0.00012
Includes 0.000133 +/-6.3e-5
PBSTDictionary: 0.2204 +/-0.0063
RemoveKey 0.07203 +/-0.00032
AtPutNew 0.0434 +/-0.0055
Do 0.00 +/-0.0
AtPutExisting 0.00260 +/-0.00013
AtIfAbsentPut 0.01193 +/-0.0002
AtPut 0.0147 +/-0.0031
IncludesKey 0.01157 +/-0.00024
KeysAndValuesDo 0.002200 +/-7.4e-5
Includes 0.05990 +/-0.00019
The PDictionary is only better at 2 benchmarks: Includes (not
IncludesKey!) and RemoveKey, both of which are rarely used. In all other
tests Squeak's Dictionary implementation is faster in this flawed
benchmark.
Levente
I wonder what the #pPrimitive:plugin: pragmas stand for in PDictionary's
#at:ifAbsent: and #at:put:.
Where can I find the benchmark code?
An important thing for our data structures is that they don't degrade,
since we don't have colliding hashes; but we didn't really test that yet
(although it becomes obvious in some of the benchmark results, such as
removing elements and testing for the presence of elements).
Is there anything special that one would need to consider when replacing
the old implementation in Pharo with yours?
What doesn't happen at the moment is shrinking dictionaries after they have
grown significantly. I wouldn't know at what point to do so either; nor do
I think Dictionary does that?
The current HashedCollections don't shrink. Remove is a rarely used
operation.
There is a parameter indicating how many elements can be added before it
switches from a SmallDictionary version to a dictionary with buckets. This
is set to 20 by default, and I have no idea if it makes sense to change
it; I didn't really profile what the optimal number is. Maybe it makes
sense to make that a constant in the code rather than an instance variable
to save a bit of space and time ...
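That switch-over can be sketched in a few lines of Python (the real code
is Smalltalk; the class and attribute names here are invented, and the
threshold of 20 mirrors the default mentioned above):

```python
class HybridDict:
    """Sketch: linear scan over pairs while small, a hash table once
    the size passes SWITCH_SIZE (mirrors SmallDictionary -> buckets)."""

    SWITCH_SIZE = 20

    def __init__(self):
        self._pairs = []      # SmallDictionary-style: sequential scan
        self._table = None    # bucketed form once past the threshold

    def __setitem__(self, key, value):
        if self._table is not None:
            self._table[key] = value
            return
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))
        if len(self._pairs) > self.SWITCH_SIZE:
            # grow: migrate all pairs into the hashed representation
            self._table = dict(self._pairs)
            self._pairs = None

    def __getitem__(self, key):
        if self._table is not None:
            return self._table[key]
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)
```

For small sizes the linear scan avoids hashing entirely, which is why the
optimal threshold is an empirical question, as noted above.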
We have our own PHashedCollection subclassing from Collection, so the
biggest part of the hierarchy is compatible with what Pharo has. Most of
the porting work will be removing Pinocchio-specific stuff: mostly
annotations related to Pinocchio primitives and export directives. This
is all straightforward though.
The current "ratio" of hashes to elements is set to 500%. So basically when
you have 32 buckets it will only grow once it contains 160 elements (or
key-value pairs), i.e. 5 elements per bucket on average. This does not seem
to make it slower, which is understandable since SmallDictionary is pretty
fast for small numbers of elements; but this ratio might have to be tweaked
depending on the kind of objects, and it is currently a parameter. The
advantage of using 500% is that you generate A LOT less garbage.
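In other words (a trivial Python restatement of the arithmetic above;
`RATIO` and the function name are invented):

```python
RATIO = 5  # "500%": on average 5 elements per bucket before growing

def grow_threshold(n_buckets):
    """Elements a table with n_buckets holds before it has to grow."""
    return n_buckets * RATIO
```

grow_threshold(32) gives the 160 mentioned above.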
Oh, and our dictionaries do generate a lot less garbage, since Pharo's
dictionaries use an association object for each key-value pair. We don't,
so we never generate garbage when you remove an object.
Our Sets, on the other hand, use a bit more memory because of the bucket
scheme. Sets don't use associations, so we don't have an edge there :)
only performance-wise in that case.
One could implement the current dictionaries without associations, that's
just less object-oriented. That would generate even less "garbage".
The hashes are masked with the number of buckets - 1, since the number of
buckets is always a power of 2.
This implies that the lower bits of the hash are the most significant ones.
This does not work together with the current identityHash /
IdentityDictionary, since the 12-bit hash is shifted to the left by 18 (or
so). This makes all the lower 18 bits 0, so all objects end up in the
first bucket.
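The masking scheme, and its clash with a left-shifted identityHash, can be
seen in a few lines (a Python sketch of the point above, not the actual
implementation):

```python
def bucket_index(h, n_buckets):
    # n_buckets is always a power of 2, so masking with n_buckets - 1
    # selects the low bits -- equivalent to h % n_buckets, just cheaper
    return h & (n_buckets - 1)

# A 12-bit identityHash shifted left by 18 has its low 18 bits all zero,
# so with any bucket count up to 2**18 every object lands in bucket 0:
all_in_first_bucket = all(
    bucket_index(h << 18, 1024) == 0 for h in range(4096))
```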
That is a very important thing to note. However, I also think it's a bit
strange to just bitshift the identityHash by 18. You don't win any
significant bits with it ... you have 12-bit hashes; no way to improve
that (without changing the image format). Since we overfill our
dictionaries, however, this works quite ok even with those bad hashes; and
we'll only start wasting a bit of space after you have added 20480
elements to the dictionary (you have 4096 different hashes with 12 bits,
times 5).
Let's imagine that we want to put 10000 objects into an IdentitySet. Each
object has its identityHash in the 0..4095 range. The hash function is very
simple: hash \\ capacity + 1. Therefore the values of the hash function will
all be in 1..4096, even if capacity is ~15000. This causes a lot of
collisions (the table will consist of a few very long chains, degrading
performance to unacceptable levels). These collisions are avoided by the
shift.
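Levente's point can be checked numerically (a Python sketch; 15013 is an
arbitrary stand-in for the "~15000" capacity, and 18 for the "(or so)"
shift):

```python
SHIFT = 18            # approximate shift discussed above (assumption)
capacity = 15013      # stand-in for the ~15000-slot table

# Without the shift: all 4096 possible 12-bit identityHash values map
# into the first 4096 slots via hash % capacity, so ~10000 objects
# pile up in a small region of the table.
unshifted = {h % capacity for h in range(4096)}

# With the shift, the same 4096 hashes spread across the whole table:
shifted = {(h << SHIFT) % capacity for h in range(4096)}
```

Here max(unshifted) is 4095, so only the low slots are ever touched,
while the shifted values reach well past the middle of the table.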
Levente
_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project