On Mon, Jun 5, 2023, at 01:44, Tomas Vondra wrote:
> Anyway, I hacked together a trivial set backed by an open addressing
> hash table:
>
>   https://github.com/tvondra/hashset
>
> It's super-basic, providing just some bare minimum of functions, but
> hopefully good enough for experiments.
>
> - hashset data type - hash table in varlena
> - hashset_init - create new hashset
> - hashset_add / hashset_contains - add value / check membership
> - hashset_merge - merge two hashsets
> - hashset_count - count elements
> - hashset_to_array - return
> - hashset(int) aggregate
>
> This allows rewriting the recursive query like this:
>
>     WITH RECURSIVE friends_of_friends AS (
...

Nice! I get similar results, 737 ms (hashset) vs 1809 ms (array_agg).

I think if you just add one more hashset function, it will be a win against 
roaringbitmap, which is 400 ms.

The missing function is an agg that takes hashset as input and returns hashset, 
similar to roaringbitmap's rb_or_agg().

With such a function, we could add an adjacent nodes hashset column to the 
`nodes` table, which would eliminate the need to scan the edges table for graph 
traversal:

We could then benchmark roaringbitmap against hashset querying the same table:

CREATE TABLE users AS
SELECT
    from_node AS id,
    rb_build_agg(to_node) AS friends_roaringbitmap,
    hashset(to_node) AS friends_hashset
FROM edges
GROUP BY 1;

/Joel


Reply via email to