jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1382689973
For reference, there seems to be a 6-7% QPS drop on nightly benchmarks
associated with this change.
https://people.apache.org/~mikemccand/lucenebench/VectorSearch.html
I think it's fine
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315270376
No objections, I just wanted to make sure we agreed on what doing the same
thing as postings meant.
--
This is an automated message from the Apache Git Service.
To respond to the messa
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315266717
So vint-delta like short postings?
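
For readers unfamiliar with the term, here is a minimal sketch of what "vint-delta" encoding of a neighbor list could look like. It is not the code from the PR: the class and method names (`VIntDeltaSketch`, `encode`, `decode`) are hypothetical, and it assumes neighbor IDs are non-negative and that their order does not need to be preserved, so they can be sorted before delta-coding.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's implementation.
public class VIntDeltaSketch {

  // Writes a non-negative int as 7 bits per byte; the high bit marks "more bytes follow".
  static void writeVInt(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  // Reads one variable-length int starting at pos[0] and advances pos[0] past it.
  static int readVInt(byte[] bytes, int[] pos) {
    int value = 0;
    int shift = 0;
    byte b;
    do {
      b = bytes[pos[0]++];
      value |= (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return value;
  }

  // Sorts the IDs (assumption: order is not significant), then writes the count
  // followed by the gap between each ID and the previous one.
  static byte[] encode(int[] neighbors) {
    int[] sorted = neighbors.clone();
    Arrays.sort(sorted);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVInt(out, sorted.length);
    int previous = 0;
    for (int id : sorted) {
      writeVInt(out, id - previous);
      previous = id;
    }
    return out.toByteArray();
  }

  // Reverses encode(): reads the count, then rebuilds each ID by summing the gaps.
  static int[] decode(byte[] bytes) {
    int[] pos = {0};
    int count = readVInt(bytes, pos);
    int[] result = new int[count];
    int previous = 0;
    for (int i = 0; i < count; i++) {
      previous += readVInt(bytes, pos);
      result[i] = previous;
    }
    return result;
  }
}
```

Small gaps fit in one byte and larger ones take two or three, so the encoded size depends on how close together the sorted IDs are.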
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315253148
I don't know much about HNSW but I would guess that this could be due to the
fact that nodes that map to vectors that are similar to one another will have a
similar set of neighbors, and
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315146472
In case it helps this discussion, I ran the following code to get a sense of
the savings we could get assuming `2^24` vectors that are not clustered and 32
neighbors per vector.
`
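
The snippet above is cut off in this digest. As a separate illustration, and not the code that was actually run, the following sketch shows one way such an estimate could be made. All names (`NeighborBitsEstimate`, `averageDeltaBits`) are hypothetical. It assumes neighbor IDs are drawn uniformly at random (the "not clustered" case, with duplicates allowed for simplicity), sorted, delta-coded, and bit-packed using the width of the largest delta in each list.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical estimate; assumes uniformly random (unclustered) neighbor IDs.
public class NeighborBitsEstimate {

  // Number of bits needed to represent a non-negative value (at least 1).
  static int bitsRequired(int value) {
    return Math.max(1, 32 - Integer.numberOfLeadingZeros(value));
  }

  // Average bits per neighbor when each list is sorted, delta-coded, and packed
  // with a single width equal to the bits required by its largest delta.
  static double averageDeltaBits(int numVectors, int numNeighbors, int samples, long seed) {
    Random random = new Random(seed);
    long totalBits = 0;
    int[] neighbors = new int[numNeighbors];
    for (int s = 0; s < samples; s++) {
      for (int i = 0; i < numNeighbors; i++) {
        neighbors[i] = random.nextInt(numVectors); // duplicates possible: a simplification
      }
      Arrays.sort(neighbors);
      int maxDelta = neighbors[0]; // the first ID is stored as a delta from 0
      for (int i = 1; i < numNeighbors; i++) {
        maxDelta = Math.max(maxDelta, neighbors[i] - neighbors[i - 1]);
      }
      totalBits += bitsRequired(maxDelta);
    }
    return (double) totalBits / samples;
  }

  public static void main(String[] args) {
    int numVectors = 1 << 24;
    int fixedBits = bitsRequired(numVectors - 1); // 24 bits for IDs below 2^24
    double deltaBits = averageDeltaBits(numVectors, 32, 10_000, 42L);
    System.out.println("fixed width bits per neighbor: " + fixedBits);
    System.out.println("delta-coded bits per neighbor (estimate): " + deltaBits);
  }
}
```

Comparing the two printed numbers gives a rough upper bound on the savings from delta-coding alone in the unclustered case; clustered IDs would be expected to produce smaller gaps.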
jpountz commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1305835975
I guess that encoding each block with a different number of bits per value
would mostly help if node IDs are somewhat clustered so that the set of
neighbors to a given node would be clos