In addition, open addressing with linear probing has superior cache behavior: there is no indirection or pointer chasing, and if the first probe misses, the next slot was likely pulled into the cache by the first one.
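To illustrate that point, here is a minimal linear-probing table in Python (a sketch with invented names, not the Squeak code): a lookup hashes once and then scans consecutive slots, so a second probe usually lands in the same cache line as the first.

```python
def lp_insert(table, key, value):
    """Insert or update in an open-addressing table with linear probing.

    `table` is a fixed-size list whose slots are either None (empty)
    or (key, value) pairs. Assumes the table is never completely full."""
    n = len(table)
    i = hash(key) % n
    for _ in range(n):
        slot = table[i]
        if slot is None or slot[0] == key:
            table[i] = (key, value)
            return
        i = (i + 1) % n            # linear probe: the adjacent slot
    raise RuntimeError("table is full")

def lp_lookup(table, key, absent=None):
    """Probe consecutive slots until the key or an empty slot is found."""
    n = len(table)
    i = hash(key) % n
    for _ in range(n):
        slot = table[i]
        if slot is None:           # empty slot: key is absent
            return absent
        if slot[0] == key:
            return slot[1]
        i = (i + 1) % n            # next adjacent slot, cache-friendly
    return absent
```

Because the probe sequence is a contiguous walk over the slot array, the memory system sees a sequential access pattern rather than a pointer chase through separately allocated nodes.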

On 6/8/18 1:37, Levente Uzonyi wrote:
On Fri, 8 Jun 2018, Clément Bera wrote:

Hi Max,
Theoretically, for a small number of elements, usually somewhere
between 3 and 30 depending on the implementation, a linear search is
faster than a hash search, especially compared to the hash search in
Pharo's dictionary implementation.

Efficient dictionary implementations are usually bucket-based. The
dictionary holds a certain number of buckets, and the key's hash
determines the bucket in which the key-value pair is stored. Small
buckets are linear (arrays or linked lists); large buckets are
typically balanced binary trees (usually red-black trees). Below a
certain number of elements there is a single bucket, which means a linear
search is performed, as in SmallDictionary. As the dictionary grows,
the search becomes a combination of a hash search and a linear or
tree search.

The bucket-based scheme you described is called separate chaining
collision resolution, and it won't perform any better in Squeak
than the currently used open addressing.
However, it is preferable when you can't trust your hash function.
When you do "crazy" things, like using a tree as a bucket, you just
exchange the possible O(n) worst-case time for O(log n).
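As a sketch of the separate-chaining scheme described above (Python, with hypothetical names; a real implementation would also grow the bucket array and might switch large buckets to trees):

```python
class ChainedDict:
    """Separate chaining: each slot of the table holds a small linear
    bucket (here a plain list of (key, value) pairs).

    A lookup hashes to one bucket and then scans it linearly, which is
    exactly the 'hash search, then linear search' combination."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The key's hash determines which bucket to search.
        return self.buckets[hash(key) % len(self.buckets)]

    def at_put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # update existing entry
                return
        bucket.append((key, value))        # new entry in this bucket

    def at_if_absent(self, key, absent=None):
        for k, v in self._bucket(key):     # linear scan of one bucket
            if k == key:
                return v
        return absent
```

With a decent hash function each bucket stays short, so the linear scan is cheap; with a bad one, everything piles into a few buckets, which is why tree buckets cap the worst case at O(log n).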


Pharo's dictionary search is first hash-based, but all the buckets are
laid out next to each other in the same array, and a linear search
is performed there. This leads to many collisions and slower search
times (especially when the element is not found); sometimes the code
searches across multiple buckets because the dictionary is too full or
there are too many near-collisions. Memory consumption is
competitive with the more advanced implementations, though (precise
measurements would need to be made).

This is called open addressing, and if your hash function is good
enough, the probe sequences are short, so those "many collisions and slower
search times" just don't happen. And no, separate chaining needs a lot more
memory than open addressing, unless you manage to get rid of
Associations, but that's a lot harder than it sounds.
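A rough way to see the memory argument, as a back-of-envelope sketch with assumed word counts (illustrative numbers, not measured Pharo or Squeak figures): open addressing pays only for table slots, oversized by the load factor, while chaining pays for a per-entry node (an Association-like object plus a next link) on top of each bucket-head slot.

```python
def open_addressing_words(entries, load_factor=0.75, slots_per_entry=2):
    """Words used by an open-addressing table: a key slot and a value
    slot per table position, with the table oversized by 1/load_factor
    to keep probe sequences short. All numbers are assumptions."""
    return round(entries * slots_per_entry / load_factor)

def chaining_words(entries, load_factor=1.0, words_per_node=4):
    """Words used by separate chaining: one bucket-head slot per table
    position, plus a node per entry (assumed header + key + value +
    next link = 4 words). Again, purely illustrative."""
    table_slots = round(entries / load_factor)
    return table_slots + entries * words_per_node
```

With these assumed numbers, 1000 entries cost about 2667 words under open addressing versus 5000 under chaining, which is the shape of Levente's point: chaining only competes on memory if the per-entry nodes go away.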


Method dictionaries are represented differently to optimize the
look-up logic.

MethodDictionary is optimized to let the VM do lookups more easily.
Memory consumption is worse when the dictionary is sparse, but gets
better as it gets filled up compared to the current Dictionary
implementation (assuming Pharo still uses the same representation as
Squeak does).


If you want to improve things and have one dictionary implementation
instead of two, implement or find a bucket-based dictionary and
put it in the base image in place of Dictionary. This is quite some work,
since there are many APIs to port. You can look at the Pinocchio
implementation; it's quite good, but they haven't implemented large
buckets.

Well, as an experiment, I'm sure it'll be fun. But don't expect better
performance or better memory usage.

Levente




On Fri, Jun 8, 2018 at 8:46 AM, Max Leske <[email protected]> wrote:
      Hi,

      I was messing around with SmallDictionary when I suddenly
realised that I can't find a single reason to use it over a normal
Dictionary. While its name and class comment imply that it is somehow
      an optimised Dictionary, I don't see any measurement where that
actually holds up. The following was run in a Pharo 7 image on a
recent VM (see below):

      | d |
      d := SmallDictionary new.
      d sizeInMemory. "24"
      [ 100000 timesRepeat: [
              1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun.
"0:00:00:05.226"

      [ 100000 timesRepeat: [
              d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.041"



      | d |
      d := Dictionary new.
      d sizeInMemory. "16"
      [ 100000 timesRepeat: [
              1 to: 100 do: [ :i | d at: i put: i ] ] ] timeToRun.
"0:00:00:00.385"

      [ 100000 timesRepeat: [
              d at: 48 ifAbsent: [] ] ] timeToRun. "0:00:00:00.006"


      As you can see, SmallDictionary is 8 bytes larger per instance
and significantly slower at both reading and writing (I know that this
isn't a good benchmark, but it suffices to make my point).


      Is anyone aware of a reason for hanging on to SmallDictionary?
I'm also curious to know how SmallDictionary came to be. There must
have been some advantage over Dictionary at some point in the
      past.


      Cheers,
      Max





      Image version: Pharo 7.0
      Build information:
Pharo-7.0+alpha.build.961.sha.a69e72a97136bc3f93831584b6efa2b1703deb84
(32 Bit)

      VM version: CoInterpreter VMMaker.oscog-nice.2281 uuid:
4beeaee7-567e-1a4b-b0fb-bd95ce302516 Nov 27 2017
      StackToRegisterMappingCogit VMMaker.oscog-nice.2283 uuid:
2d20324d-a2ab-48d6-b0f6-9fc3d66899da Nov 27 2017
      VM: 201711262336
https://github.com/OpenSmalltalk/opensmalltalk-vm.git $ Date: Mon Nov
27 00:36:29 2017 +0100 $ Plugins: 201711262336
https://github.com/OpenSmalltalk/opensmalltalk-vm.git $

      OS: macOS 10.13.5
      Machine: MacBook Pro (13-inch, 2016, Four Thunderbolt 3 Ports)




--
Clément Béra
https://clementbera.github.io/
https://clementbera.wordpress.com/





