On Mon, 13 Sep 2010, Levente Uzonyi wrote:
On Mon, 13 Sep 2010, Adrian Lienhard wrote:
This mail is quite long as I copied it from a conversation with Toon
Verwaest. Toon and Camillo re-implemented hashed collections for Pinocchio
(http://scg.unibe.ch/research/pinocchio). Their benchmarks have shown that
the new implementation is significantly faster than the one in Pharo 1.1.
The source code is available from www.squeaksource.com/p.html (they haven't
set a license, but I'm pretty sure they'll release their code as MIT).
I post this here to find somebody knowledgeable in the area to take a look
at the code and figure out whether this implementation would be interesting
for Pharo...
Cheers,
Adrian
citing Toon:
We ran some benchmarks on our data structures (compared against, and run
in, standard Pharo 1.1). We just add/remove integers within a certain
range, so the keys are always perfectly distributed, making them ideal for
Pharo dictionaries.
The numbers show how long a certain benchmark took on average in 20 runs of
the benchmark and its standard deviation over those runs. A single
benchmark run generally consists of many individual operations on the
collection in question. For example:
benchIncludes
	1 to: set size * 2 do: [ :i |
		set includes: i ]
where the set has size 10000. Evaluating this once constitutes a single
run. The number at the top of each benchmark group is the sum of all the
averages, together with the averaged stdev.
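As a rough sketch of what such a run looks like, here is the same idea in
Python (the originals are Smalltalk; the function names and the 20-run
protocol are taken from the description above, everything else is invented
for illustration):

```python
import statistics
import time

def bench(run, n_runs=20):
    """Time `run` n_runs times; return (mean, stdev) of seconds per run."""
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        run()
        times.append(time.perf_counter() - t0)
    return statistics.mean(times), statistics.stdev(times)

s = set(range(1, 10001))          # the 10000-element set from the example

def bench_includes():
    # corresponds to `1 to: set size * 2 do: [ :i | set includes: i ]`
    for i in range(1, len(s) * 2 + 1):
        i in s

mean, sd = bench(bench_includes)
```

A benchmark group's headline number would then just be the sum of the
means over all its individual benchmarks.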
The results:
PDictionary: 15.8 +/-1.7
Do 0.472 +/-0.079
RemoveKey 0.010 +/-0.021
AtPut 0.947 +/-0.02
AtIfAbsentPut 1.64 +/-0.96
AtPutExisting 0.198 +/-0.011
KeysAndValuesDo 0.502 +/-0.09
IncludesKey 0.365 +/-0.024
AtPutNew 4.2 +/-1.3
Includes 7.40 +/-0.33
Dictionary: 138.2 +/-4.7
Do 0.830 +/-0.098
RemoveKey 10.292 +/-0.077
AtPut 0.957 +/-0.044
AtIfAbsentPut 1.52 +/-0.95
AtPutExisting 0.203 +/-0.011
KeysAndValuesDo 0.850 +/-0.096
IncludesKey 80.25 +/-0.25
AtPutNew 3.5 +/-1.3
Includes 39.8 +/-4.4
PSet: 1.767 +/-0.057
Remove 0.008 +/-0.018
Add 0.845 +/-0.039
AddExisting 0.310 +/-0.021
AddNew 0.240 +/-0.021
Includes 0.365 +/-0.024
Set: 70.9 +/-1.2
Remove 7.737 +/-0.051
Add 0.755 +/-0.022
AddExisting 0.305 +/-0.015
AddNew 2.473 +/-0.03
Includes 59.6 +/-1.2
Obviously the very slight differences have to be taken with a grain of
salt, but whenever it's more than 10x faster it's quite unlikely that it's
due to some random runtime variation that might totally change in another
run.
I'm a bit skeptical, because a 10x improvement is pretty hard to get if
the benchmark is not flawed and the code doesn't use VM support.
Okay, I found the benchmark code, and it's flawed. The distribution of the
keys is not uniform. Actually it's far from uniform, it's just 1..dictSize.
Anyway I ran the benchmarks in Squeak 4.2 alpha and got the following
results:
PBDictionary: 0.1249 +/-0.0067
RemoveKey 6.7e-5 +/-4.6e-5
AtPutNew 0.0648 +/-0.0048
Do 0.00160 +/-0.0001
AtPutExisting 0.00330 +/-0.00018
AtIfAbsentPut 0.0192 +/-0.0035
AtPut 0.01557 +/-0.00028
IncludesKey 0.0180 +/-0.0029
KeysAndValuesDo 0.00223 +/-0.00012
Includes 0.000133 +/-6.3e-5
PBSTDictionary: 0.2204 +/-0.0063
RemoveKey 0.07203 +/-0.00032
AtPutNew 0.0434 +/-0.0055
Do 0.00 +/-0.0
AtPutExisting 0.00260 +/-0.00013
AtIfAbsentPut 0.01193 +/-0.0002
AtPut 0.0147 +/-0.0031
IncludesKey 0.01157 +/-0.00024
KeysAndValuesDo 0.002200 +/-7.4e-5
Includes 0.05990 +/-0.00019
The PDictionary is only better at 2 benchmarks: Includes (not
IncludesKey!) and RemoveKey, both of which are rarely used. In all other
tests Squeak's Dictionary implementation is faster in this flawed
benchmark.
Levente
I wonder what the #pPrimitive:plugin: pragmas stand for in PDictionary's
#at:ifAbsent: and #at:put:.
Where can I find the benchmark code?
An important thing for our data structures is that they don't degrade,
since we don't have colliding hashes; but we didn't really test that yet
(although it becomes obvious in some of the benchmark results, such as
removing elements and testing for the presence of elements).
Is there anything special that one would need to consider when replacing
the old implementation in Pharo with yours?
What doesn't happen at the moment is shrinking dictionaries after they have
grown significantly. I wouldn't know at what point to do so either; nor do
I think Dictionary does that?
The current HashedCollections don't shrink. Remove is a rarely used
operation.
There is a parameter indicating how many elements can be added before it
switches from a SmallDictionary version to a dictionary with buckets. This
is set to 20 by default, and I have no idea if it makes sense to change
it; I didn't really profile what the optimal number is. Maybe it makes
sense to make that a constant in the code rather than an instance variable
to save a bit of space and time ...
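That switch-over can be sketched in a few lines of Python (the real code
is Smalltalk; the class and attribute names here are invented, and the
threshold of 20 mirrors the default mentioned above):

```python
class HybridDict:
    """Sketch: linear scan over pairs while small, a hash table once
    the size passes SWITCH_SIZE (mirrors SmallDictionary -> buckets)."""

    SWITCH_SIZE = 20

    def __init__(self):
        self._pairs = []      # SmallDictionary-style: sequential scan
        self._table = None    # bucketed form once past the threshold

    def __setitem__(self, key, value):
        if self._table is not None:
            self._table[key] = value
            return
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))
        if len(self._pairs) > self.SWITCH_SIZE:
            # grow: migrate all pairs into the hashed representation
            self._table = dict(self._pairs)
            self._pairs = None

    def __getitem__(self, key):
        if self._table is not None:
            return self._table[key]
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)
```

For small sizes the linear scan avoids hashing entirely, which is why the
optimal threshold is an empirical question, as noted above.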
We have our own PHashedCollection subclassing from Collection, so the
biggest part of the hierarchy is compatible with what Pharo has. Most of
the porting work will be removing Pinocchio-specific stuff: mostly
annotations related to Pinocchio primitives and export directives. This
is all straightforward though.
The current "ratio" of hashes to elements is set to 500%. So basically when
you have 32 buckets it will only grow once it contains 160 elements (or
key-value pairs), i.e. 5 elements per bucket on average. This does not seem
to make it slower, which is understandable since SmallDictionary is pretty
fast for small numbers of elements; but this ratio might have to be tweaked
depending on the kind of objects, and it is currently a parameter. The
advantage of using 500% is that you generate A LOT less garbage.
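In other words (a trivial Python restatement of the arithmetic above;
`RATIO` and the function name are invented):

```python
RATIO = 5  # "500%": on average 5 elements per bucket before growing

def grow_threshold(n_buckets):
    """Elements a table with n_buckets holds before it has to grow."""
    return n_buckets * RATIO
```

grow_threshold(32) gives the 160 mentioned above.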
Oh, and our dictionaries do generate a lot less garbage, since Pharo's
dictionaries use an association object for each key-value pair. We don't,
so we never generate garbage when you remove an object.
Our Sets, on the other hand, use a bit more memory because of the bucket
scheme. Sets don't use associations, so we don't have an edge there :)
only performance-wise in that case.
One could implement the current dictionaries without associations, that's
just less object-oriented. That would generate even less "garbage".
The hashes are masked with the number of buckets - 1, since the number of
buckets is always a power of 2.
This implies that the lower bits of the hash are the most significant ones.
This does not work together with the current identityHash /
IdentityDictionary, since the 12-bit hash is shifted to the left by 18 (or
so). This makes all the lower 18 bits 0, so all objects end up in the
first bucket.
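The masking scheme, and its clash with a left-shifted identityHash, can be
seen in a few lines (a Python sketch of the point above, not the actual
implementation):

```python
def bucket_index(h, n_buckets):
    # n_buckets is always a power of 2, so masking with n_buckets - 1
    # selects the low bits -- equivalent to h % n_buckets, just cheaper
    return h & (n_buckets - 1)

# A 12-bit identityHash shifted left by 18 has its low 18 bits all zero,
# so with any bucket count up to 2**18 every object lands in bucket 0:
all_in_first_bucket = all(
    bucket_index(h << 18, 1024) == 0 for h in range(4096))
```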
That is a very important thing to note. However, I also think it's a bit
strange to just bitshift the identityHash by 18. You don't win any
significant bits with it ... you have 12-bit hashes; no way to improve
that (without changing the image format). Since we overfill our
dictionaries, however, this works quite ok even with those bad hashes; and
we'll only start wasting a bit of space after you have added 20480
elements to the dictionary (you have 4096 different hashes with 12 bits,
times 5).
Let's imagine that we want to put 10000 objects into an IdentitySet. Each
object has its identityHash in the 0..4095 range. The hash function is very
simple: hash \\ capacity + 1. Therefore the values of the hash function will
all be in 1..4096, even if capacity is ~15000. This causes a lot of
collisions (the table will consist of a few very long chains, degrading
performance to unacceptable levels). These collisions are avoided by the
shift.
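Levente's point can be checked numerically (a Python sketch; 15013 is an
arbitrary stand-in for the "~15000" capacity, and 18 for the "(or so)"
shift):

```python
SHIFT = 18            # approximate shift discussed above (assumption)
capacity = 15013      # stand-in for the ~15000-slot table

# Without the shift: all 4096 possible 12-bit identityHash values map
# into the first 4096 slots via hash % capacity, so ~10000 objects
# pile up in a small region of the table.
unshifted = {h % capacity for h in range(4096)}

# With the shift, the same 4096 hashes spread across the whole table:
shifted = {(h << SHIFT) % capacity for h in range(4096)}
```

Here max(unshifted) is 4095, so only the low slots are ever touched,
while the shifted values reach well past the middle of the table.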
Levente
_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project