Hi,

> It's impossible to write a general purpose hash table that will be
> suitable for every use case, considering all the combinations of
> design trade offs and subsets of functionality someone might want.

Very much agreed.


> There is other work being done in this space:  I'm aware of Andres's
> current thread[1] on Robin Hood-style closed hashing tables with
> macro-ised size, hash and comparator operations à la C++ container
> templates, and I'm following along with interest.

> He vigorously
> rejects chaining hash tables as the natural enemies of memory
> hardware.

I'd not go quite that far. There are good arguments for open addressing,
and there are good arguments for chaining. The latter is a lot more
convincing if you want concurrent access with partition-based locking,
for example...
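
To illustrate (a toy sketch with invented names, not the actual DHT
API): with chaining, a collision chain is confined to one partition,
so an insert only ever takes that partition's lock, whereas open
addressing probe sequences can walk across partition boundaries and
make this kind of locking a lot messier.

#include <stdint.h>
#include <stdlib.h>
#include <pthread.h>

#define NPARTITIONS 16              /* power of two */
#define NBUCKETS    1024            /* per partition, power of two */

typedef struct entry
{
    uint64_t      key;
    uint64_t      value;
    struct entry *next;             /* chain link, stays in one partition */
} entry;

typedef struct partition
{
    pthread_mutex_t lock;
    entry          *buckets[NBUCKETS];
} partition;

static partition parts[NPARTITIONS];

static void
init_partitions(void)
{
    for (int i = 0; i < NPARTITIONS; i++)
        pthread_mutex_init(&parts[i].lock, NULL);
}

static void
insert(uint64_t hash, uint64_t key, uint64_t value)
{
    /* disjoint hash bits pick the partition and the bucket within it */
    partition  *p = &parts[hash & (NPARTITIONS - 1)];
    uint64_t    bucket = (hash >> 4) & (NBUCKETS - 1);
    entry      *e = malloc(sizeof(entry));      /* error handling elided */

    e->key = key;
    e->value = value;

    pthread_mutex_lock(&p->lock);
    /* the whole chain lives in this partition; no other lock is needed */
    e->next = p->buckets[bucket];
    p->buckets[bucket] = e;
    pthread_mutex_unlock(&p->lock);
}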


> I guess Andres's simplehash should already work in DSA memory nicely
> anyway since it doesn't have any internal pointers IIUC (?).

The data access doesn't have pointers, but the "metadata" does have a
pointer to the data; that's a very solvable problem, though. But
incremental resizing and granular locking aren't going to happen for it,
so it'd really only be suitable for presenting a read-only (or possibly
read-mostly) view.
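
For concreteness, roughly what I mean by solvable (invented field
names, not simplehash's actual layout): store a dsa_pointer in the
shared metadata and translate it with dsa_get_address() on access,
since each backend may map the DSA area at a different address.

#include "postgres.h"
#include "utils/dsa.h"

typedef struct hashtable_meta
{
    dsa_pointer data;               /* instead of a raw "entry *data" */
    uint64      nbuckets;
} hashtable_meta;

/*
 * Translate the relative pointer on each access; a raw pointer would
 * only be valid in the process that allocated the data array.
 */
static inline void *
hashtable_data(dsa_area *area, hashtable_meta *meta)
{
    return dsa_get_address(area, meta->data);
}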


> Potential use cases for DHT include caches, in-memory database objects
> and working state for parallel execution.

Is there a more concrete example, i.e. a user we'd convert to this at
the same time as introducing this hashtable?


> There are a couple of things I'm not happy about in this version of
> DHT: the space overhead per item is too high (hash + wasted padding +
> next pointer), and the iterator system is overly complicated.  I have
> another version in development which gets rid of the hash and padding
> and sometimes gets rid of the next pointer to achieve zero overhead,
> and I'm working on a simpler iteration algorithm that keeps the
> current guarantees but doesn't need to reorder bucket chains.  More on
> that, with some notes on performance and a testing module, soon.

FWIW, my experimentation showed that getting rid of the hash itself is
quite often *NOT* a worthwhile tradeoff. Comparing keys and recomputing
hashes is often quite expensive (e.g. for GROUP BY that's where the
majority of time is spent after the simplehash patch).
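
A toy illustration of the effect (invented code, not simplehash
itself): with the hash stored per entry, most non-matching entries are
rejected on a cheap integer comparison, and a resize can rehash using
the stored value instead of calling the hash function again.

#include <stdint.h>
#include <string.h>

typedef struct entry
{
    uint32_t      hash;             /* the per-entry overhead in question */
    const char   *key;
    struct entry *next;
} entry;

static entry *
lookup(entry *chain, uint32_t hash, const char *key)
{
    for (entry *e = chain; e != NULL; e = e->next)
    {
        /*
         * Cheap rejection first; strcmp() runs only on a probable
         * match.  Drop the stored hash and every chain element pays
         * the full key comparison (and a resize has to recompute
         * every hash).
         */
        if (e->hash == hash && strcmp(e->key, key) == 0)
            return e;
    }
    return NULL;
}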

Greetings,

Andres Freund

