Re: Sub-Graphs in Hnsw

2025-06-05 Michael Sokolov
Oh, thanks for pointing that out; I hadn't seen the issue. I think it's roughly the same idea; we were discussing it off-line (Kaival joined our office in Boston recently). Maybe let's move the discussion to that issue and iterate there.
On Thu, Jun 5, 2025 at 2:44 PM Michael Froh wrote:
> I'm wond

Re: Sub-Graphs in Hnsw

2025-06-05 Michael Froh
I'm wondering if this is the same idea that Kaival is proposing in https://github.com/apache/lucene/issues/14758 (Support multiple HNSW graphs backed by the same vectors).
On Thu, Jun 5, 2025 at 11:32 AM Michael Sokolov wrote:
> I do think there could be many interesting use cases for building

Re: Sub-Graphs in Hnsw

2025-06-05 Michael Sokolov
I do think there could be many interesting use cases for building multiple graphs from a single set of vectors. For example, one might want to sometimes search all the docs, sometimes search one subset, and other times another subset; baking the constraint into the graph construction would be l
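
A rough illustration of "many graphs, one set of vectors": the sketch below keeps a single shared vector array and builds one small graph per subset, where each graph stores only neighbor ordinals into that shared array. It is a hypothetical sketch, not Lucene's implementation and not the design proposed in issue 14758; a flat exact k-NN graph with greedy search stands in for HNSW, and the names `SharedVectorSubgraphs`, `buildSubgraph` and `greedySearch` are invented for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: several small graphs that all point into ONE shared
// vector array. A flat exact k-NN graph plus greedy search stands in for HNSW.
final class SharedVectorSubgraphs {

    // Squared Euclidean distance between two vectors.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Builds a graph over `subset` (ordinals into `vectors`): each member is
    // linked to its k nearest fellow members. Only ordinals are stored.
    static Map<Integer, int[]> buildSubgraph(float[][] vectors, int[] subset, int k) {
        Map<Integer, int[]> graph = new HashMap<>();
        for (int node : subset) {
            Integer[] others = Arrays.stream(subset)
                    .filter(o -> o != node)
                    .boxed()
                    .toArray(Integer[]::new);
            // Closest fellow members first.
            Arrays.sort(others, Comparator.comparingDouble(o -> squaredDistance(vectors[node], vectors[o])));
            int[] neighbors = new int[Math.min(k, others.length)];
            for (int i = 0; i < neighbors.length; i++) {
                neighbors[i] = others[i];
            }
            graph.put(node, neighbors);
        }
        return graph;
    }

    // Greedy walk from `entry` (which must belong to the subgraph): move to any
    // closer neighbor until no neighbor improves on the current node.
    static int greedySearch(float[][] vectors, Map<Integer, int[]> graph, int entry, float[] query) {
        int current = entry;
        float best = squaredDistance(vectors[current], query);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int n : graph.get(current)) {
                float d = squaredDistance(vectors[n], query);
                if (d < best) {
                    best = d;
                    current = n;
                    improved = true;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // One shared store of 2-d vectors, ordinals 0..5.
        float[][] vectors = {{0f, 0f}, {1f, 0f}, {2f, 0f}, {10f, 0f}, {11f, 0f}, {12f, 0f}};
        // Two graphs over different subsets of the same vectors.
        Map<Integer, int[]> subsetA = buildSubgraph(vectors, new int[] {0, 2, 4}, 2);
        Map<Integer, int[]> subsetB = buildSubgraph(vectors, new int[] {1, 3, 5}, 2);
        float[] query = {11.5f, 0f};
        System.out.println(greedySearch(vectors, subsetA, 0, query)); // prints 4
        System.out.println(greedySearch(vectors, subsetB, 1, query)); // prints 5
    }
}
```

Because each per-subset graph holds only ints, adding another subset costs graph memory but no extra copy of the vectors. A real HNSW build would add hierarchical layers and bounded neighbor selection on top of this.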

Re: Sub-Graphs in Hnsw

2025-06-04 Ravikumar Govindarajan
> I wonder if you could influence the graph search by incorporating the partition key (customer id?) into the vectors somehow? If this were done well, it should lead to a natural clustering of the graph.
I can explore this further. Thanks for the pointers.
On Mon, Jun 2, 2025 at 11:14 PM

Re: Sub-Graphs in Hnsw

2025-06-02 Michael Sokolov
I wonder if you could influence the graph search by incorporating the partition key (customer id?) into the vectors somehow? If this were done well, it should lead to a natural clustering of the graph.
On Mon, Jun 2, 2025 at 11:32 AM Ravikumar Govindarajan wrote:
> Hi Michael,
> The docs range co
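
One possible reading of this suggestion, sketched under assumptions: append a one-hot block for the partition key to each vector, scaled by a weight. Under squared Euclidean distance, vectors in the same partition keep their original distance, while vectors in different partitions are pushed apart by exactly 2 * weight^2, so a graph built on the augmented vectors would tend to cluster by partition. The one-hot encoding, the `weight` parameter and the class name are hypothetical choices, not something proposed in the thread.

```java
import java.util.Arrays;

// Hypothetical sketch: fold a partition key (e.g. a customer id already mapped
// to 0..numPartitions-1) into a vector as a weighted one-hot block.
final class PartitionAugmentedVectors {

    // Returns the vector followed by weight * e_partition (a one-hot block).
    static float[] augment(float[] vector, int partition, int numPartitions, float weight) {
        if (partition < 0 || partition >= numPartitions) {
            throw new IllegalArgumentException("partition out of range: " + partition);
        }
        float[] out = Arrays.copyOf(vector, vector.length + numPartitions); // new slots are 0
        out[vector.length + partition] = weight;
        return out;
    }

    // Squared Euclidean distance between two vectors of equal length.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f};
        float[] y = {3f, 2f};
        float weight = 2f;
        // Same partition: distance unchanged.
        float same = squaredDistance(augment(x, 0, 3, weight), augment(y, 0, 3, weight));
        // Different partitions: distance grows by 2 * weight^2 = 8.
        float diff = squaredDistance(augment(x, 0, 3, weight), augment(y, 1, 3, weight));
        System.out.println(same + " " + diff); // prints "4.0 12.0"
    }
}
```

With many distinct customer ids the one-hot block gets large; hashing keys into a fixed number of buckets would bound the extra dimensions at the cost of collisions. The weight trades off how strictly partitions separate against how well cross-partition search still works, and cosine or dot-product similarity would behave differently from the Euclidean case shown here.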

Re: Sub-Graphs in Hnsw

2025-06-02 Ravikumar Govindarajan
Hi Michael,
The docs range could vary enormously, from a few tens to tens of thousands, and in very heavy usage cases 100k and above, in a single segment. Filtered HNSW, like you said, uses a single graph; this could work better if designed as sub-graphs.
On Mon, 2 Jun 2025 at 5:42 PM, Michael Sokolo

Re: Sub-Graphs in Hnsw

2025-06-02 Michael Sokolov
How many documents do you anticipate in a typical sub-range? If it's in the hundreds or even low thousands, you would be better off without HNSW; instead, you can use a function score query based on the vector distance. For larger numbers, where HNSW becomes useful, you could try using filtered HNSW,
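
The exact-search alternative for small subsets amounts to scoring every candidate document against the query and keeping the best k. The sketch below shows that computation in plain Java with a bounded min-heap; it does not use Lucene's function score query API, and the dot-product similarity, the `Hit` class and the method names are assumptions for illustration. Its cost is linear in the subset size, which is why it suits subsets in the hundreds or low thousands rather than very large ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of exact (brute-force) top-k search restricted to a
// candidate subset, i.e. scoring each candidate by vector similarity.
final class ExactSubsetSearch {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Dot-product similarity: higher means more similar (assumed metric).
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Scores every candidate, keeping the k best in a min-heap (worst on top).
    static List<Hit> topK(float[][] vectors, int[] candidates, float[] query, int k) {
        PriorityQueue<Hit> heap = new PriorityQueue<Hit>((a, b) -> Float.compare(a.score, b.score));
        for (int doc : candidates) {
            heap.offer(new Hit(doc, dot(vectors[doc], query)));
            if (heap.size() > k) {
                heap.poll(); // evict the current worst hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return result;
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.7f, 0.7f}, {-1f, 0f}};
        int[] candidates = {0, 2, 3}; // e.g. the docs of one customer (hypothetical)
        for (Hit hit : topK(vectors, candidates, new float[] {1f, 0f}, 2)) {
            System.out.println(hit.doc + " " + hit.score);
        }
    }
}
```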