Oh, thanks for pointing that out, I hadn't seen the issue. I think
it's roughly the same idea; we were discussing it offline (Kaival joined
our office in Boston recently). Maybe let's move the discussion to
that issue and iterate there.
On Thu, Jun 5, 2025 at 2:44 PM Michael Froh wrote:
I'm wondering if this is the same idea that Kaival is proposing in
https://github.com/apache/lucene/issues/14758 (Support multiple HNSW graphs
backed by the same vectors).
On Thu, Jun 5, 2025 at 11:32 AM Michael Sokolov wrote:
I do think there could be many interesting use cases for building
multiple graphs from a single set of vectors. For example, one might
want to sometimes search all the docs, sometimes search the one subset
and other times another subset; baking the constraint into the graph
construction would be l
>
> I wonder if you could influence the graph search by incorporating the
> partition key (customer id?) into the vectors somehow? If this was done
> well it should lead to a natural clustering of the graph.
>
I can explore this further. Thanks for the pointers.
On Mon, Jun 2, 2025 at 11:14 PM
I wonder if you could influence the graph search by incorporating the
partition key (customer id?) into the vectors somehow? If this was done
well it should lead to a natural clustering of the graph.
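
One possible reading of this suggestion, sketched in plain Python rather
than against any Lucene API: append a weighted one-hot code for the
partition to each vector, so that vectors from different partitions sit
farther apart under Euclidean distance. The helper names, the one-hot
encoding, and the WEIGHT value are all assumptions made for illustration.
A one-hot code would not scale to a large number of customers, and the
query vector would need the same augmentation.

```python
def augment(vector, partition_index, num_partitions, weight):
    """Append a weighted one-hot partition code to a vector.

    Hypothetical helper for illustration; not a Lucene API.
    """
    code = [0.0] * num_partitions
    code[partition_index] = weight
    return list(vector) + code


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


WEIGHT = 3.0  # assumed tuning knob: larger values separate partitions more

# a and b are near-duplicates owned by different customers;
# c belongs to the same customer as a but is farther away in the original space.
a = augment([1.0, 0.0], partition_index=0, num_partitions=2, weight=WEIGHT)
b = augment([1.0, 0.1], partition_index=1, num_partitions=2, weight=WEIGHT)
c = augment([0.0, 1.0], partition_index=0, num_partitions=2, weight=WEIGHT)

# Crossing a partition boundary adds 2 * WEIGHT**2 to the squared distance,
# so the same-customer vector c ends up closer to a than the near-duplicate b.
print(squared_l2(a, c) < squared_l2(a, b))
```

With a large enough weight, nearest neighbors tend to come from the same
partition, which is the kind of clustering the suggestion is after. How
well this works with a real graph and a real similarity function would
need to be measured.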
On Mon, Jun 2, 2025 at 11:32 AM Ravikumar Govindarajan
wrote:
Hi Michael,
The doc range could vary widely, from a few tens to tens of thousands
and, in very heavy usage cases, 100k and above, all in a single segment.
Filtered HNSW, like you said, uses a single graph, which could be better if
designed as sub-graphs.
On Mon, 2 Jun 2025 at 5:42 PM, Michael Sokolo
How many documents do you anticipate in a typical sub range? If it's in the
hundreds or even low thousands you would be better off without hnsw.
Instead you can use a function score query based on the vector distance.
For larger numbers where hnsw becomes useful, you could try using filtered
hnsw,
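
For the small sub-range case mentioned above, scoring every candidate by
vector distance amounts to an exact brute-force search over the allowed
docs. A minimal sketch in plain Python follows; the in-memory dict, the
dot-product similarity, and the function names are assumptions for
illustration and are not Lucene's API.

```python
import heapq


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query, docs, allowed_ids, k):
    """Score only the allowed docs against the query and return the k best.

    docs is a dict of doc_id -> vector, a hypothetical stand-in for a segment.
    Returns (score, doc_id) pairs, best score first.
    """
    scored = ((dot(query, docs[doc_id]), doc_id) for doc_id in allowed_ids)
    return heapq.nlargest(k, scored)


docs = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [0.9, 0.1],
    4: [1.0, 0.0],  # good match, but outside the allowed sub-range
}
allowed_ids = [1, 2, 3]  # e.g. the docs of one customer

top = exact_top_k([1.0, 0.0], docs, allowed_ids, k=2)
print([doc_id for _, doc_id in top])
```

The cost is linear in the size of the sub-range, which is why it only
makes sense for hundreds or low thousands of docs.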