On Wed, 31 May 2023 at 18:40, Joel Jacobson <j...@compiler.org> wrote:
>
> On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote:
> > I think this needs a better explanation - what exactly is a hashset in
> > this context? Something like an array with a hash for faster lookup of
> > unique elements, or what?
>
> In this context, by "hashset" I am indeed referring to a data structure 
> similar
> to an array, where each element would be unique, and lookups would be faster
> than arrays for larger number of elements due to hash-based lookups.
>
> This data structure would store identifiers (IDs) of the nodes, not the 
> complete
> nodes themselves.

Have you looked at roaring bitmaps? There is a pg_roaringbitmap
extension [1] already available that offers very fast unions,
intersections and membership tests over integer sets. I used it to get
some pretty impressive performance results for faceting search on
large document sets. [2]

Depending on the graph fan-outs and operations it might make sense in
the graph use case. For small sets it's probably not too different
from the intarray extension in contrib. But for finding intersections
over large sets (i.e. a join) it's very-very fast. If the workload is
traversal heavy it might make sense to even cache materialized
transitive closures up to some depth (a friend-of-a-friend list).

Roaring bitmaps only support int4 right now, but that is easily
fixable. And they need a relatively dense ID space to get the
performance boost, which seems essential to the approach. The latter
issue means that it can't be easily dropped into GIN or B-tree indexes
for ctid storage.

[1] https://github.com/ChenHuajun/pg_roaringbitmap
[2] https://github.com/cybertec-postgresql/pgfaceting
-- 
Ants Aasma
www.cybertec-postgresql.com


Reply via email to