Dense union of doc IDs

2022-11-03 Thread Michael Froh
Hi, I was recently poking around in the createWeight implementation for MultiTermQueryConstantScoreWrapper to get to the bottom of some slow queries, and I realized that its worst-case performance can be pretty bad, though it may be possible to optimize. Imagine we have a segment with N docs

Re: HNSW and Multi-segments

2022-11-03 Thread Michael Sokolov
The way I think of this is that segmenting the graph will generally lead to higher recall and higher query-time cost for a given set of HNSW parameters. Indexing costs will tend to be lower when the graph is split across multiple segments. I don't think that increased irrelevant docs should be a concern since

HNSW and Multi-segments

2022-11-03 Thread MyCoy Z
Hi, Lucene Developers: I'm studying the HNSW source code and have some questions regarding Lucene's multi-segments and HNSW. First, some of my understanding: 1. While creating the index, when two segments are being merged, it could rebuild the HNSW graph based on the docs and vectors in the two