tx levente

Stef

>>> This mail is quite long as I copied it from a conversation with Toon 
>>> Verwaest. Toon and Camillo re-implemented hashed collections for Pinocchio 
>>> (http://scg.unibe.ch/research/pinocchio). Their benchmarks have shown that 
>>> the new implementation is significantly faster than the one in Pharo 1.1. 
>>> The source code is available from www.squeaksource.com/p.html (they haven't 
>>> set a license, but I'm pretty sure they'll release their code as MIT).
>>> I post this here to find somebody knowledgeable in the area to take a look 
>>> at the code and figure out whether this implementation would be interesting 
>>> for Pharo...
>>> Cheers,
>>> Adrian
>>> citing Toon:
>>> We ran some benchmarks on our data structures (in comparison with, and 
>>> benchmarked in, standard Pharo 1.1). We just add / remove integers within a 
>>> certain range, so the keys are always perfectly distributed, which is ideal 
>>> for Pharo dictionaries.
>>> The numbers show how long each benchmark took on average over 20 runs, 
>>> together with the standard deviation over those runs. A single benchmark 
>>> run generally consists of many individual operations on the collection in 
>>> question. For example:
>>> benchIncludes
>>>     1 to: set size * 2 do: [ :i |
>>>         set includes: i ]
>>> where the set has size 10000. Evaluating this once results in a single run. 
>>> The number at the top of each benchmark group is the sum of all the 
>>> individual averages, together with the averaged stdev.
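>>> To make the harness explicit: a single benchmark is timed roughly like 
>>> this (an illustrative sketch, not the actual benchmark code; the selector 
>>> name is made up):
>>> runBenchmark: aBlock
>>>     "Time 20 runs of aBlock and answer the average duration in seconds;
>>>     the stdev is computed over the same 20 measurements."
>>>     | times |
>>>     times := (1 to: 20) collect: [ :i | Time millisecondsToRun: aBlock ].
>>>     ^ (times inject: 0 into: [ :sum :t | sum + t ]) / times size / 1000.0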
>>> The results:
>>> PDictionary: 15.8 +/-1.7
>>>  Do                0.472 +/-0.079
>>>  RemoveKey         0.010 +/-0.021
>>>  AtPut             0.947 +/-0.02
>>>  AtIfAbsentPut     1.64 +/-0.96
>>>  AtPutExisting     0.198 +/-0.011
>>>  KeysAndValuesDo   0.502 +/-0.09
>>>  IncludesKey       0.365 +/-0.024
>>>  AtPutNew          4.2 +/-1.3
>>>  Includes          7.40 +/-0.33
>>> Dictionary: 138.2 +/-4.7
>>>  Do                0.830 +/-0.098
>>>  RemoveKey         10.292 +/-0.077
>>>  AtPut             0.957 +/-0.044
>>>  AtIfAbsentPut     1.52 +/-0.95
>>>  AtPutExisting     0.203 +/-0.011
>>>  KeysAndValuesDo   0.850 +/-0.096
>>>  IncludesKey       80.25 +/-0.25
>>>  AtPutNew          3.5 +/-1.3
>>>  Includes          39.8 +/-4.4
>>> PSet: 1.767 +/-0.057
>>>  Remove            0.008 +/-0.018
>>>  Add               0.845 +/-0.039
>>>  AddExisting       0.310 +/-0.021
>>>  AddNew            0.240 +/-0.021
>>>  Includes          0.365 +/-0.024
>>> Set: 70.9 +/-1.2
>>>  Remove            7.737 +/-0.051
>>>  Add               0.755 +/-0.022
>>>  AddExisting       0.305 +/-0.015
>>>  AddNew            2.473 +/-0.03
>>>  Includes          59.6 +/-1.2
>>> Obviously the very slight differences have to be taken with a grain of 
>>> salt, but whenever something is more than 10x faster it's quite unlikely 
>>> to be due to some random runtime variation that might totally change in 
>>> another run.
>> 
>> I'm a bit skeptical, because a 10x improvement is pretty hard to get if the 
>> benchmark is not flawed and the code doesn't use VM support.
> 
> Okay, I found the benchmark code, and it's flawed. The distribution of the 
> keys is not uniform. Actually it's far from uniform; it's just 1..dictSize.
> Anyway I ran the benchmarks in Squeak 4.2 alpha and got the following results:
> 
> PBDictionary: 0.1249 +/-0.0067
> RemoveKey 6.7e-5 +/-4.6e-5
> AtPutNew 0.0648 +/-0.0048
> Do 0.00160 +/-0.0001
> AtPutExisting 0.0033 +/-0.00018
> AtIfAbsentPut 0.0192 +/-0.0035
> AtPut 0.01557 +/-0.00028
> IncludesKey 0.0180 +/-0.0029
> KeysAndValuesDo 0.00223 +/-0.00012
> Includes 0.000133 +/-6.3e-5
> 
> PBSTDictionary: 0.2204 +/-0.0063
> RemoveKey 0.07203 +/-0.00032
> AtPutNew 0.0434 +/-0.0055
> Do 0.00 +/-0.0
> AtPutExisting 0.00260 +/-0.00013
> AtIfAbsentPut 0.01193 +/-0.0002
> AtPut 0.0147 +/-0.0031
> IncludesKey 0.01157 +/-0.00024
> KeysAndValuesDo 0.002200 +/-7.4e-5
> Includes 0.0599 +/-0.00019
> 
> The PDictionary is only better in 2 benchmarks:
> Includes (not IncludesKey!) and RemoveKey, which are both rarely used. In all 
> other tests Squeak's Dictionary implementation is faster in this flawed 
> benchmark.
> 
> 
> Levente
> 
>> 
>> I wonder what the #pPrimitive:plugin: pragmas stand for in PDictionary's
>> #at:ifAbsent: and #at:put:.
>> 
>> Where can I find the benchmark code?
>> 
>>> An important thing for our data structures is that they don't degrade, 
>>> since we don't have colliding hashes; but we didn't really test that yet 
>>> (although it becomes obvious in some of the benchmark results, such as 
>>> removing elements and testing for the presence of elements).
>>>> Is there anything special that one would need to consider when replacing 
>>>> the old implementation in Pharo with yours?
>>> What doesn't happen at the moment is shrinking dictionaries after they have 
>>> grown significantly. I wouldn't know at what point to do so either; nor do 
>>> I think Dictionary does that?
>> 
>> The current HashedCollections don't shrink. Remove is a rarely used 
>> operation.
>> 
>>> There is a parameter indicating how many elements can be added before it 
>>> switches from a SmallDictionary version to a dictionary with buckets. This 
>>> is  set to 20 by default, and I have no idea if it makes sense to change 
>>> it; I didn't really profile what the optimal number is. Maybe it makes 
>>> sense to make that a constant in the code rather than an instance variable 
>>> to save a bit of space and time ...
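>>> Conceptually the switch looks something like this (just a sketch with 
>>> made-up selector names, not the real code):
>>> at: key put: value
>>>     "Sketch: use the flat SmallDictionary representation below the
>>>     threshold (20 by default), and switch to buckets above it."
>>>     (self isSmall and: [ self size < self smallSizeLimit ])
>>>         ifTrue: [ ^ self smallAt: key put: value ].
>>>     self isSmall ifTrue: [ self switchToBuckets ].
>>>     ^ self bucketAt: key put: value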
>>> We have our own PHashedCollection subclassing from Collection, so the 
>>> biggest part of the hierarchy is compatible with what Pharo has. While 
>>> porting, the Pinocchio-specific stuff will have to be removed, mostly 
>>> annotations related to Pinocchio primitives and export directives. This 
>>> is all straightforward though.
>>> The current "ratio" of hashes to elements is set to 500%. So basically when 
>>> you have 32 buckets it will only grow when it contains 160 elements (or 
>>> key-value pairs), i.e. 5 elements per bucket on average. This seems not to 
>>> make it slower, which is understandable since SmallDictionary is pretty 
>>> fast for small numbers of elements; but this ratio might have to be tweaked 
>>> depending on the kind of objects and is currently a parameter. The 
>>> advantage of using 500% is that you generate A LOT less garbage.
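>>> In code the growth check is roughly this (a sketch; the names are 
>>> invented):
>>> fullCheck
>>>     "Sketch: with ratio = 5 (i.e. 500%), 32 buckets only grow once the
>>>     collection holds more than 32 * 5 = 160 elements."
>>>     tally > (buckets size * ratio)
>>>         ifTrue: [ self growBuckets ]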
>>> Oh, and our dictionaries do generate a lot less garbage, since Pharo's 
>>> dictionaries use an association object for each key-value pair. We don't, 
>>> so we never generate garbage when you remove an object.
>>> Our Sets on the other hand do use a bit more memory because of the bucket 
>>> scheme. Sets don't use associations so we don't have an edge there :) Only 
>>> performance-wise in that case.
>> 
>> One could implement the current dictionaries without associations; that's 
>> just less object-oriented. That would generate even less "garbage".
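>> For example by keeping the keys and values in two parallel arrays; a rough 
>> sketch (not actual Squeak/Pharo code):
>> 
>> at: key put: value
>>     "Sketch of an association-free dictionary: keys and values live in two
>>     parallel arrays and share the index found by scanning for the key."
>>     | index |
>>     index := self scanFor: key.
>>     (keys at: index) ifNil: [ tally := tally + 1 ].
>>     keys at: index put: key.
>>     values at: index put: value.
>>     ^ value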
>> 
>>> The hashes are masked with the number of buckets - 1, since the number of 
>>> buckets is always a power of 2.
>>> This implies that the lower bits of the hash are the ones that matter. 
>>> This does not work together with the current identityHash / 
>>> IdentityDictionary, since the 12-bit hash is shifted to the left by 18 (or 
>>> so). That would make all the lower 18 bits 0, so all objects would end up 
>>> in the first bucket.
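>>> The bucket selection is essentially (a sketch):
>>> bucketIndexFor: anObject
>>>     "With a power-of-two number of buckets, masking keeps only the low bits
>>>     of the hash. With 32 buckets the mask is 31 (2r11111); a 12-bit
>>>     identityHash shifted left by 18 has all of its low bits zero, so it
>>>     always masks to 0 and everything lands in the first bucket."
>>>     ^ (anObject hash bitAnd: buckets size - 1) + 1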
>>> That is a very important thing to note, indeed. However, I also think it's 
>>> a bit strange to just bitshift the identityHash by 18. You don't win 
>>> anything significant with it ... you have 12-bit hashes; there is no way to 
>>> improve that (without changing the image format). Since we overfill our 
>>> dictionaries, however, this works quite ok with those bad hashes; and we'll 
>>> only start wasting a bit of space after you have added 20480 elements to 
>>> the dictionary (you have 4096 different hashes with 12 bits, times the 5x 
>>> fill ratio).
>> 
>> Let's imagine that we want to put 10000 objects into an IdentitySet. Each 
>> object has its identityHash in the 0..4095 range. The hash function is very 
>> simple: hash \\ capacity + 1. Therefore the values of the hash function will 
>> all be in 1..4096, even if capacity is ~15000. This causes a lot of 
>> collisions (the table will consist of a few very long chains, degrading 
>> performance to unacceptable levels). These collisions are avoided by the 
>> shift.
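>> A quick illustration, assuming a table capacity of 15000:
>> 
>> "Without the shift the 12-bit hashes only ever reach slots 1..4096."
>> (0 to: 4095) collect: [ :h | h \\ 15000 + 1 ].
>> 
>> "With the shift the values spread over the whole 1..15000 range."
>> (0 to: 4095) collect: [ :h | (h bitShift: 18) \\ 15000 + 1 ].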
>> 
>> 
>> Levente
>> 
>> 
>> 
> 

