On Wed, 31 May 2023 at 18:40, Joel Jacobson <j...@compiler.org> wrote: > > On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote: > > I think this needs a better explanation - what exactly is a hashset in > > this context? Something like an array with a hash for faster lookup of > > unique elements, or what? > > In this context, by "hashset" I am indeed referring to a data structure > similar > to an array, where each element would be unique, and lookups would be faster > than arrays for larger number of elements due to hash-based lookups. > > This data structure would store identifiers (IDs) of the nodes, not the > complete > nodes themselves.
Have you looked at roaring bitmaps? There is a pg_roaringbitmap extension [1] already available that offers very fast unions, intersections and membership tests over integer sets. I used it to get some pretty impressive performance results for faceting search on large document sets. [2] Depending on the graph fan-outs and operations it might make sense in the graph use case. For small sets it's probably not too different from the intarray extension in contrib. But for finding intersections over large sets (i.e. a join) it's very-very fast. If the workload is traversal heavy it might make sense to even cache materialized transitive closures up to some depth (a friend-of-a-friend list). Roaring bitmaps only support int4 right now, but that is easily fixable. And they need a relatively dense ID space to get the performance boost, which seems essential to the approach. The latter issue means that it can't be easily dropped into GIN or B-tree indexes for ctid storage. [1] https://github.com/ChenHuajun/pg_roaringbitmap [2] https://github.com/cybertec-postgresql/pgfaceting -- Ants Aasma www.cybertec-postgresql.com