On Mon, Jun 5, 2023, at 01:44, Tomas Vondra wrote: > Anyway, I hacked together a trivial set backed by an open addressing > hash table: > > https://github.com/tvondra/hashset > > It's super-basic, providing just some bare minimum of functions, but > hopefully good enough for experiments. > > - hashset data type - hash table in varlena > - hashset_init - create new hashset > - hashset_add / hashset_contains - add value / check membership > - hashset_merge - merge two hashsets > - hashset_count - count elements > - hashset_to_array - return > - hashset(int) aggregate > > This allows rewriting the recursive query like this: > > WITH RECURSIVE friends_of_friends AS ( ...
Nice! I get similar results, 737 ms (hashset) vs 1809 ms (array_agg). I think if you just add one more hashset function, it will be a win against roaringbitmap, which is 400 ms. The missing function is an agg that takes hashset as input and returns hashset, similar to roaringbitmap's rb_or_agg(). With such a function, we could add an adjacent nodes hashset column to the `nodes` table, which would eliminate the need to scan the edges table for graph traversal: We could then benchmark roaringbitmap against hashset querying the same table: CREATE TABLE users AS SELECT from_node AS id, rb_build_agg(to_node) AS friends_roaringbitmap, hashset(to_node) AS friends_hashset FROM edges GROUP BY 1; /Joel