On Mon, 13 Sep 2010, Adrian Lienhard wrote:

This mail is quite long as I copied it from a conversation with Toon Verwaest. 
Toon and Camillo re-implemented hashed collections for Pinocchio 
(http://scg.unibe.ch/research/pinocchio). Their benchmarks have shown that the 
new implementation is significantly faster than the one in Pharo 1.1. The 
source code is available from www.squeaksource.com/p.html (they haven't set a 
license, but I'm pretty sure they'll release their code as MIT).

I'm posting this here to find somebody knowledgeable in the area to take a look at 
the code and figure out whether this implementation would be interesting for 
Pharo...

Cheers,
Adrian


citing Toon:

We ran some benchmarks on our data structures (measured in, and compared against, 
standard Pharo 1.1). We just add / remove integers within a certain range, so the 
hashes are always perfectly distributed, which is the ideal case for Pharo's 
dictionaries.

The numbers show how long each benchmark took on average over 20 runs, together 
with the standard deviation over those runs. A single benchmark run generally 
consists of many individual operations on the collection in question. 
For example:

benchIncludes
  1 to: set size * 2 do: [ :i |
      set includes: i ]

where the set has size 10000. Evaluating this once counts as a single run. So the 
number at the top of each result block is the sum of all the individual averages, 
shown with the averaged stdev.
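
For concreteness, a single benchmark could be timed and aggregated roughly like 
this (just a sketch, not the actual Pinocchio harness; runBenchmark:times: is a 
made-up selector and #timeToRun is assumed to answer milliseconds):

runBenchmark: aBlock times: n
  "Run aBlock n times; answer the average running time and its stdev."
  | times mean variance |
  times := (1 to: n) collect: [ :i | [ aBlock value ] timeToRun ].
  mean := (times sum / n) asFloat.
  variance := (times detectSum: [ :t | (t - mean) squared ]) / n.
  ^ Array with: mean with: variance sqrt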

The results:

PDictionary: 15.8 +/-1.7
  Do                0.472 +/-0.079
  RemoveKey         0.010 +/-0.021
  AtPut             0.947 +/-0.02
  AtIfAbsentPut     1.64 +/-0.96
  AtPutExisting     0.198 +/-0.011
  KeysAndValuesDo   0.502 +/-0.09
  IncludesKey       0.365 +/-0.024
  AtPutNew          4.2 +/-1.3
  Includes          7.40 +/-0.33

Dictionary: 138.2 +/-4.7
  Do                0.830 +/-0.098
  RemoveKey         10.292 +/-0.077
  AtPut             0.957 +/-0.044
  AtIfAbsentPut     1.52 +/-0.95
  AtPutExisting     0.203 +/-0.011
  KeysAndValuesDo   0.850 +/-0.096
  IncludesKey       80.25 +/-0.25
  AtPutNew          3.5 +/-1.3
  Includes          39.8 +/-4.4

PSet: 1.767 +/-0.057
  Remove            0.008 +/-0.018
  Add               0.845 +/-0.039
  AddExisting       0.310 +/-0.021
  AddNew            0.240 +/-0.021
  Includes          0.365 +/-0.024

Set: 70.9 +/-1.2
  Remove            7.737 +/-0.051
  Add               0.755 +/-0.022
  AddExisting       0.305 +/-0.015
  AddNew            2.473 +/-0.03
  Includes          59.6 +/-1.2


Obviously the very slight differences have to be taken with a grain of salt, 
but whenever something is more than 10x faster it's quite unlikely to be just 
random runtime variation that would change completely in another run.

I'm a bit skeptical, because a 10x improvement is pretty hard to get if the benchmark is not flawed and the code doesn't use VM support.

I wonder what the #pPrimitive:plugin: pragmas stand for in PDictionary's
#at:ifAbsent: and #at:put:.

Where can I find the benchmark code?


An important thing for our data structures is that they don't degrade, since we 
don't have colliding hashes; but we didn't really test that yet (although it 
becomes obvious in some of the benchmark results, such as removing elements 
and testing for the presence of elements).

Is there anything special that one would need to consider when replacing the 
old implementation in Pharo with yours?

What doesn't happen at the moment is shrinking dictionaries after they have 
grown significantly. I wouldn't know at what point to do so either; nor do I 
think Dictionary does that?

The current HashedCollections don't shrink. Remove is a rarely used operation.


There is a parameter indicating how many elements can be added before it 
switches from a SmallDictionary version to a dictionary with buckets. This is  
set to 20 by default, and I have no idea if it makes sense to change it; I 
didn't really profile what the optimal number is. Maybe it makes sense to make 
that a constant in the code rather than an instance variable to save a bit of 
space and time ...
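
To make the switch concrete, a rough sketch of the add path (all selector names 
here are assumptions, not necessarily what the Pinocchio code uses):

add: anObject
  "Sketch: stay in the flat, SmallDictionary-style representation until
   the threshold (20 by default) is crossed, then switch to buckets."
  self size >= self switchThreshold
    ifTrue: [ self switchToBucketRepresentation ].
  ^ self privateAdd: anObject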

We have our own PHashedCollection subclassing Collection, so the biggest 
part of the hierarchy is compatible with what Pharo has. While porting, the 
Pinocchio-specific stuff will have to be removed, mostly annotations related to 
Pinocchio primitives and export directives. This is all straightforward 
though.

The current "ratio" of hashes to elements is set to 500%. So basically when you 
have 32 buckets it will only grow when it contains 160 elements (or key-value pairs), so 
5 elements per bucket on average. This seems to not make it slower which is 
understandable since SmallDictionary is pretty fast for small amounts of elements; but 
this ratio might have to be tweaked depending on the kind of objects and is currently a 
parameter. The advantage of using 500% is that you have A LOT less garbage.
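
For illustration, the grow check implied by that ratio would look something like 
this (a sketch; buckets and ratio are assumed names, only the 500% figure and the 
power-of-two bucket count come from the description above):

shouldGrow
  "With ratio = 5 (500%) and 32 buckets this only answers true
   once more than 32 * 5 = 160 elements are stored."
  ^ self size > (buckets size * ratio)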

Oh, and our dictionaries do generate a lot less garbage, since Pharo's dictionaries 
use an association object for each key-value pair. We don't, so we never generate 
garbage when you remove an object.
Our Sets on the other hand do use a bit more memory because of the bucket 
scheme. Sets don't use associations, so we don't have an edge there :) Only 
performance-wise in that case.

One could implement the current dictionaries without associations; that's just less object-oriented. That would generate even less "garbage".
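
To make the difference concrete, here is a rough sketch of an association-free 
store (not actual Pharo or Pinocchio code; keys, values and the use of #scanFor: 
are assumptions): keys and values live in two parallel arrays, so adding never 
allocates an Association and removing only nils out slots.

at: key put: value
  "Sketch: store key and value side by side instead of wrapping them
   in an Association object."
  | index |
  index := self scanFor: key.
  keys at: index put: key.
  values at: index put: value.
  ^ value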


The hashes are masked with the number of buckets - 1, since the number of buckets 
is always a power of 2.
This implies that the lower bits of the hash are the ones that matter. 
This does not work together with the current identityHash / IdentityDictionary, 
since the 12-bit hash is shifted to the left by 18 (or so). That makes all 
the lower 18 bits 0, so all objects would end up in the first bucket.
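
In code, the bucket selection amounts to a simple mask (a sketch; bucketIndexFor: 
and buckets are assumed names):

bucketIndexFor: hash
  "With a power-of-two number of buckets only the low bits of the hash
   matter. A 12-bit identityHash shifted left by 18 has all of its low
   18 bits set to 0, so every object would land in bucket 1."
  ^ (hash bitAnd: buckets size - 1) + 1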

That is a very important thing to note. However, I also think it's a bit 
strange to just bitshift the identityHash by 18. You don't win any significant 
bits with it ... you have 12-bit hashes; there's no way to improve that (without 
changing the image format). Since we overfill our dictionaries, however, this 
works quite OK with those bad hashes; and we only start wasting a bit of 
space after you have added 20480 elements to the dictionary (you have 4096 
different hashes with 12 bits, * 5).

Let's imagine that we want to put 10000 objects into an IdentitySet. Each object has its identityHash in the 0..4095 range. The hash function is very simple: hash \\ capacity + 1. Therefore the values of the hash function will all be in 1..4096, even if the capacity is ~15000. This causes a lot of collisions (the table will consist of a few very long chains, degrading performance to unacceptable levels). These collisions are avoided by the shift.
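
This is easy to see in a workspace (just an illustration of the start indices, 
not the actual HashedCollection code):

| capacity startIndices |
capacity := 15000.
"Without the shift, every possible 12-bit identityHash maps to a start
 index in 1..4096, so most of a ~15000 slot table is never used as a
 starting position and the occupied slots pile up at the front."
startIndices := (0 to: 4095) collect: [ :h | h \\ capacity + 1 ].
startIndices max  "4096"

With the 18-bit shift the hashes become large multiples of 2^18, so after the 
modulo the start indices are spread across the whole 1..capacity range instead 
of being squeezed into the first 4096 slots.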


Levente




_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

