No, they just compared to Pharo 1.1. But the benchmarks are available in a separate package.
Adrian

On Sep 13, 2010, at 15:52, Stéphane Ducasse wrote:

> do you know if they compare with squeak because levent did a lot of work in
> that area.
>
> Stef
>
> On Sep 13, 2010, at 2:55 PM, Adrian Lienhard wrote:
>
>> This mail is quite long as I copied it from a conversation with Toon
>> Verwaest. Toon and Camillo re-implemented hashed collections for Pinocchio
>> (http://scg.unibe.ch/research/pinocchio). Their benchmarks have shown that
>> the new implementation is significantly faster than the one in Pharo 1.1.
>> The source code is available from www.squeaksource.com/p.html (they haven't
>> set a license, but I'm pretty sure they'll release their code as MIT).
>>
>> I post this here to find somebody knowledgeable in the area to take a look
>> at the code and figure out whether this implementation would be interesting
>> for Pharo...
>>
>> Cheers,
>> Adrian
>>
>>
>> Citing Toon:
>>
>> We ran some benchmarks on our data structures (in comparison with, and
>> benchmarked in, standard Pharo 1.1). We just add/remove integers within a
>> certain range, so the keys are always perfectly distributed, which makes
>> the workload ideal for Pharo's dictionaries.
>>
>> The numbers show how long a certain benchmark took on average over 20 runs
>> and the standard deviation over those runs. A single benchmark run
>> generally consists of many individual operations on the collection in
>> question. For example:
>>
>> benchIncludes
>>     1 to: set size * 2 do: [ :i |
>>         set includes: i ]
>>
>> where the set has size 10000. Evaluating this once counts as a single run.
>> The number at the top of each benchmark is the sum of all averages, with
>> the average stdev.
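The methodology above (time each run, report mean and standard deviation over 20 runs, probe twice as many keys as the set holds) can be sketched as follows. This is an illustrative reconstruction in Python, not the actual Pinocchio harness; all names here are made up for the sketch.

```python
import statistics
import time

def run_benchmark(bench, runs=20):
    """Time `bench` over `runs` runs; return (mean, stdev) in milliseconds.
    A sketch of the reported methodology, not the actual harness."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        bench()
        times.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(times), statistics.stdev(times)

def bench_includes(s):
    # Probe twice as many keys as the set holds, so half the lookups
    # miss -- the same access pattern as benchIncludes above.
    for i in range(1, 2 * len(s) + 1):
        i in s  # result discarded; we only measure lookup cost

small_set = set(range(1, 10001))
mean_ms, stdev_ms = run_benchmark(lambda: bench_includes(small_set))
```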
>>
>> The results:
>>
>> PDictionary: 15.8 +/- 1.7
>>     Do               0.472 +/- 0.079
>>     RemoveKey        0.010 +/- 0.021
>>     AtPut            0.947 +/- 0.02
>>     AtIfAbsentPut    1.64  +/- 0.96
>>     AtPutExisting    0.198 +/- 0.011
>>     KeysAndValuesDo  0.502 +/- 0.09
>>     IncludesKey      0.365 +/- 0.024
>>     AtPutNew         4.2   +/- 1.3
>>     Includes         7.40  +/- 0.33
>>
>> Dictionary: 138.2 +/- 4.7
>>     Do               0.830 +/- 0.098
>>     RemoveKey       10.292 +/- 0.077
>>     AtPut            0.957 +/- 0.044
>>     AtIfAbsentPut    1.52  +/- 0.95
>>     AtPutExisting    0.203 +/- 0.011
>>     KeysAndValuesDo  0.850 +/- 0.096
>>     IncludesKey     80.25  +/- 0.25
>>     AtPutNew         3.5   +/- 1.3
>>     Includes        39.8   +/- 4.4
>>
>> PSet: 1.767 +/- 0.057
>>     Remove           0.008 +/- 0.018
>>     Add              0.845 +/- 0.039
>>     AddExisting      0.310 +/- 0.021
>>     AddNew           0.240 +/- 0.021
>>     Includes         0.365 +/- 0.024
>>
>> Set: 70.9 +/- 1.2
>>     Remove           7.737 +/- 0.051
>>     Add              0.755 +/- 0.022
>>     AddExisting      0.305 +/- 0.015
>>     AddNew           2.473 +/- 0.03
>>     Includes        59.6   +/- 1.2
>>
>>
>> Obviously the very slight differences have to be taken with a grain of
>> salt, but whenever something is more than 10x faster, it is quite unlikely
>> to be a random runtime variation that might change completely in another
>> run.
>>
>> An important property of our data structures is that they don't degrade,
>> since we don't have colliding hashes. We didn't really test that yet,
>> although it shows in some of the benchmark results, such as removing
>> elements and testing for their presence.
>>
>>> Is there anything special that one would need to consider when replacing
>>> the old implementation in Pharo with yours?
>>
>> What doesn't happen at the moment is shrinking dictionaries after they
>> have grown significantly. I wouldn't know at what point to do so either;
>> nor do I think Dictionary does that.
>>
>> There is a parameter indicating how many elements can be added before it
>> switches from a SmallDictionary to a dictionary with buckets.
>> This is set to 20 by default, and I have no idea if it makes sense to
>> change it; I didn't really profile what the optimal number is. Maybe it
>> makes sense to make it a constant in the code rather than an instance
>> variable, to save a bit of space and time.
>>
>> We have our own PHashedCollection subclassing Collection, so the biggest
>> part of the hierarchy is compatible with what Pharo has. While porting,
>> Pinocchio-specific code will have to be removed, mostly annotations
>> related to Pinocchio primitives and exporting directives. This is all
>> straightforward, though.
>>
>> The current "ratio" of hashes to elements is set to 500%. So basically,
>> when you have 32 buckets it will only grow once it contains 160 elements
>> (or key-value pairs), i.e. 5 elements per bucket on average. This seems
>> not to make it slower, which is understandable since SmallDictionary is
>> pretty fast for small numbers of elements; but this ratio might have to
>> be tweaked depending on the kind of objects, and it is currently a
>> parameter. The advantage of using 500% is that you generate A LOT less
>> garbage.
>>
>> And our dictionaries themselves generate a lot less garbage, since
>> Pharo's dictionaries use an association object for each key-value pair.
>> We don't, so we never generate garbage when you remove an object.
>> Our sets, on the other hand, use a bit more memory because of the bucket
>> scheme. Sets don't use associations, so we don't have an edge there :)
>> Only performance-wise in that case.
>>
>> The hashes are masked with the number of buckets - 1, since the bucket
>> count is always a power of 2. This implies that the lower bits of the
>> hash are the most significant ones. This does not work with the current
>> identityHash / IdentityDictionary, since the 12-bit hash is shifted to
>> the left by 18 (or so). This would leave all the lower 18 bits 0, and
>> all objects would thus end up in the first bucket.
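The bucket scheme Toon describes (a power-of-two number of buckets, hashes masked with the bucket count minus one, growth only once the table is 500% full) can be sketched like this. This is an assumption-laden reconstruction for illustration, not the Pinocchio source; the class and method names are invented.

```python
class BucketTable:
    """Sketch of the described bucket scheme: power-of-two bucket count,
    hash masked with nbuckets - 1, growth at 500% load (5 per bucket)."""

    RATIO = 5  # average elements per bucket before growing ("500%")

    def __init__(self, nbuckets=32):
        assert nbuckets & (nbuckets - 1) == 0, "bucket count must be a power of 2"
        self.buckets = [[] for _ in range(nbuckets)]
        self.size = 0

    def bucket_for(self, key):
        # Masking keeps only the low bits of the hash -- which is exactly
        # why a left-shifted identityHash (all low bits zero) is fatal.
        return self.buckets[hash(key) & (len(self.buckets) - 1)]

    def add(self, key):
        bucket = self.bucket_for(key)
        if key not in bucket:
            bucket.append(key)
            self.size += 1
            # With 32 buckets, grow only once we exceed 160 elements.
            if self.size > self.RATIO * len(self.buckets):
                self._grow()

    def includes(self, key):
        return key in self.bucket_for(key)

    def _grow(self):
        # Double the bucket count and rehash; no per-element association
        # objects exist, so growth is the only source of garbage here.
        elements = [k for b in self.buckets for k in b]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        self.size = 0
        for k in elements:
            self.add(k)
```

With the default 32 buckets, the table holds 160 well-distributed elements before its first growth, matching the 500% figure above.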
>>
>> That is a very important thing to note. However, I also think it's a bit
>> strange to just bitshift the identityHash by 18. You don't win anything
>> significant with it: you have 12-bit hashes, and there is no way to
>> improve that (without changing the image format). Since we overfill our
>> dictionaries, however, this works quite ok even with those bad hashes,
>> and we only start wasting a bit of space after you have added 20480
>> elements to the dictionary (you have 4096 different hashes with 12 bits,
>> * 5).
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [email protected]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
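The identityHash problem discussed above is easy to make concrete. The bit arithmetic below is an illustrative sketch (the exact shift amount in the VM is "18 or so" per the mail, and the bucket count is chosen arbitrarily): masking a left-shifted 12-bit hash with a small power-of-two mask collapses every object into bucket 0.

```python
# All 4096 possible 12-bit identityHashes, raw and shifted left by 18.
raw = range(4096)
shifted = [h << 18 for h in raw]

NBUCKETS = 32
mask = NBUCKETS - 1  # masking works only if the low hash bits carry entropy

spread_raw = len({h & mask for h in raw})          # 32: evenly spread
spread_shifted = len({h & mask for h in shifted})  # 1: all in bucket 0
```

This also shows where the 20480 figure comes from: with only 4096 distinct hashes and a 500% load ratio, the table holds 4096 * 5 = 20480 elements before buckets must grow past their average fill.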
