Hi,

> It's impossible to write a general purpose hash table that will be
> suitable for every use case, considering all the combinations of
> design trade offs and subsets of functionality someone might want.

Very much agreed.

> There is other work being done in this space: I'm aware of Andres's
> current thread[1] on Robin Hood-style closed hashing tables with
> macro-ised size, hash and comparator operations à la C++ container
> templates, and I'm following along with interest. He vigorously
> rejects chaining hash tables as the natural enemies of memory
> hardware.

I'd not go quite that far. There are good arguments for open
addressing, and there are good arguments for open chaining. The latter
is a lot more convincing if you want concurrent access with
partition-based locking, for example...
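To illustrate what I mean -- this is just a minimal sketch, not code
from either patch, and all the names (pht_*, NPARTITIONS and so on)
are made up: with chaining, every bucket chain lives entirely inside
one partition, so an insert only ever takes that one partition's lock.
An open addressing probe sequence can wander across bucket ranges,
which makes that kind of granular locking a lot harder to get right.

#include <stdint.h>
#include <stdlib.h>
#include <pthread.h>

#define NPARTITIONS 16
#define NBUCKETS_PER_PARTITION 1024

typedef struct pht_entry
{
    uint64_t    hash;           /* cached hash of the key */
    uint64_t    key;
    uint64_t    value;
    struct pht_entry *next;     /* chain within a single bucket */
} pht_entry;

typedef struct pht_partition
{
    pthread_mutex_t lock;       /* protects all chains below */
    pht_entry  *buckets[NBUCKETS_PER_PARTITION];
} pht_partition;

typedef struct pht
{
    pht_partition partitions[NPARTITIONS];
} pht;

/* call once before first use */
static void
pht_init(pht *ht)
{
    for (int i = 0; i < NPARTITIONS; i++)
    {
        pthread_mutex_init(&ht->partitions[i].lock, NULL);
        for (int j = 0; j < NBUCKETS_PER_PARTITION; j++)
            ht->partitions[i].buckets[j] = NULL;
    }
}

/*
 * The chain an entry lands on is fully determined by its partition,
 * so the insert holds exactly one partition lock, and inserts into
 * other partitions run concurrently without any contention.
 * (Error handling omitted.)
 */
static void
pht_insert(pht *ht, uint64_t hash, uint64_t key, uint64_t value)
{
    pht_partition *part = &ht->partitions[hash % NPARTITIONS];
    uint64_t    bucket = (hash / NPARTITIONS) % NBUCKETS_PER_PARTITION;
    pht_entry  *e = malloc(sizeof(pht_entry));

    e->hash = hash;
    e->key = key;
    e->value = value;

    pthread_mutex_lock(&part->lock);
    e->next = part->buckets[bucket];
    part->buckets[bucket] = e;
    pthread_mutex_unlock(&part->lock);
}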
> I guess Andres's simplehash should already work in DSA memory nicely
> anyway since it doesn't have any internal pointers IIUC (?).

The data itself doesn't contain pointers, but the "metadata" does have
a pointer to the data. That's a very solvable problem, though. But
incremental resizing and granular locking aren't going to happen for
it, so it'd really only be suitable for presenting a read-only (or
possibly read-mostly) view.

> Potential use cases for DHT include caches, in-memory database objects
> and working state for parallel execution.

Is there a more concrete example, i.e. a user we'd convert to this at
the same time as introducing this hashtable?

> There are a couple of things I'm not happy about in this version of
> DHT: the space overhead per item is too high (hash + wasted padding +
> next pointer), and the iterator system is overly complicated.  I have
> another version in development which gets rid of the hash and padding
> and sometimes gets rid of the next pointer to achieve zero overhead,
> and I'm working on a simpler iteration algorithm that keeps the
> current guarantees but doesn't need to reorder bucket chains.  More on
> that, with some notes on performance and a testing module, soon.

FWIW, my experimentation showed that getting rid of the hash itself is
quite often *NOT* a worthwhile tradeoff. Comparing keys and recomputing
hashes is often quite expensive (e.g. for GROUP BY, that's where the
majority of the time is spent after the simplehash patch).

Greetings,

Andres Freund
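PS: To make that last point concrete, here's another minimal sketch --
again with made-up names, not simplehash or DHT code -- of what the
cached hash buys you. On lookup, the cheap integer comparison filters
out nearly every non-matching entry before the expensive key comparison
runs; on resize, no key ever has to be rehashed to find its new bucket:

#include <stdint.h>
#include <string.h>

typedef struct entry
{
    uint64_t    hash;           /* cached hash of key */
    char        key[64];        /* an expensive-to-compare key */
    int         value;
    struct entry *next;
} entry;

/*
 * Lookup: strcmp() only runs for entries whose cached hash already
 * matches, i.e. (nearly) only for actual hits.
 */
static entry *
lookup(entry **buckets, uint64_t nbuckets, uint64_t hash, const char *key)
{
    for (entry *e = buckets[hash % nbuckets]; e != NULL; e = e->next)
    {
        if (e->hash == hash && strcmp(e->key, key) == 0)
            return e;
    }
    return NULL;
}

/*
 * Resize: every entry moves to its new bucket using the cached hash;
 * nothing is recomputed.  Drop the cached hash and you pay a full hash
 * computation per entry here, plus a full key comparison per probed
 * entry in lookup() -- which is exactly where GROUP BY-style workloads
 * end up spending their time.
 */
static void
rehash_into(entry **oldtab, uint64_t nold, entry **newtab, uint64_t nnew)
{
    for (uint64_t i = 0; i < nold; i++)
    {
        entry *e = oldtab[i];

        while (e != NULL)
        {
            entry *next = e->next;
            uint64_t b = e->hash % nnew;    /* no rehash needed */

            e->next = newtab[b];
            newtab[b] = e;
            e = next;
        }
    }
}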