Hi,
I was recently poking around in the createWeight implementation for
MultiTermQueryConstantScoreWrapper to get to the bottom of some slow
queries, and I realized that the worst-case performance could be pretty
bad, but (maybe) possible to optimize for.
Imagine if we have a segment with N docs
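To make the worst case concrete, here's a toy cost model (a sketch with made-up names, not Lucene's actual createWeight code): if the multi-term query rewrites to T terms and each term's postings list is iterated in full, the wrapper can touch on the order of T × N postings in a segment of N docs.

```java
import java.util.Collections;
import java.util.List;

public class WorstCaseCost {
    // Count the postings visited when every expanded term's postings
    // list is iterated once. This is the illustrative cost model, not
    // a Lucene API.
    static long postingsVisited(List<int[]> postingsPerTerm) {
        long visited = 0;
        for (int[] postings : postingsPerTerm) {
            visited += postings.length; // one full pass per term
        }
        return visited;
    }

    public static void main(String[] args) {
        int n = 1000; // docs in the segment
        int t = 500;  // terms the query expands to
        int[] dense = new int[n];
        for (int i = 0; i < n; i++) dense[i] = i;
        // Worst case: every term matches every doc -> T * N postings.
        System.out.println(postingsVisited(Collections.nCopies(t, dense)));
    }
}
```

With t = 500 terms each matching all n = 1000 docs, half a million postings are visited for a single segment, which is why the worst case can hurt.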
The way I think of this is that segmenting the graph will generally
lead to higher recall and higher costs (at query time) for a given set
of HNSW parameters. Indexing costs will tend to be lower for multiple
segmented graphs. I don't think an increase in irrelevant docs should
be a concern since
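The query-time tradeoff above can be sketched as follows (a hedged illustration, not Lucene's collector code; the `Hit` type is made up): with S segments, the graph search runs once per segment, so more candidates are explored overall (higher cost), and the per-segment top-k lists are then merged into a global top-k (the extra entry points are what tends to lift recall).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SegmentedTopK {
    // Illustrative hit type: a doc id and its similarity score.
    record Hit(int doc, float score) {}

    // Merge per-segment top-k lists into a single global top-k.
    static List<Hit> mergeTopK(List<List<Hit>> perSegment, int k) {
        // Min-heap on score: the root is the weakest hit kept so far.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (List<Hit> segmentHits : perSegment) {
            for (Hit h : segmentHits) {
                heap.offer(h);
                if (heap.size() > k) heap.poll(); // evict weakest
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble(Hit::score).reversed());
        return out;
    }
}
```

Note that the merge itself is cheap (S × k candidates through a heap); the dominant extra cost is running the HNSW search S times instead of once.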
Hi, Lucene Developers:
I'm studying the HNSW source code and have some questions about how
Lucene's multi-segment design interacts with HNSW.
First, some of my understanding:
1. While building the index, when two segments are merged, Lucene may
rebuild the HNSW graph from the docs and vectors in the two segments.
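If that understanding is right, a naive rebuild-on-merge looks roughly like the sketch below (illustrative only; the `Graph` interface is made up, not Lucene's `HnswGraph`): every live vector from every input segment is re-inserted into a fresh graph, so merge cost scales with the total vector count, and each insert is itself an HNSW graph search.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveHnswMerge {
    // Illustrative stand-in for an HNSW graph under construction.
    interface Graph {
        void insert(float[] vector);
        int size();
    }

    // Trivial list-backed implementation, just to make the sketch runnable.
    static class ListGraph implements Graph {
        final List<float[]> nodes = new ArrayList<>();
        public void insert(float[] v) { nodes.add(v); }
        public int size() { return nodes.size(); }
    }

    // Rebuild by inserting every vector from every input segment.
    static Graph rebuild(Graph fresh, List<List<float[]>> segments) {
        for (List<float[]> segment : segments) {
            for (float[] v : segment) {
                fresh.insert(v); // in real HNSW, each insert searches the graph
            }
        }
        return fresh;
    }
}
```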