[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2023-01-14 Thread GitBox
jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1382689973 For reference, there seems to be a 6-7% QPS drop on nightly benchmarks associated with this change. https://people.apache.org/~mikemccand/lucenebench/VectorSearch.html I think it's fine

[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-15 Thread GitBox
jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315270376 No objections, I just wanted to make sure we agreed on what doing the same thing as postings meant.

[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-15 Thread GitBox
jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315266717 So vint-delta like short postings?
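For readers unfamiliar with the term, "vint-delta" can be illustrated with a minimal sketch: sort the neighbor IDs, store each as the difference from the previous one, and write every difference as a variable-length integer (7 payload bits per byte, high bit set when more bytes follow). The class and method names below are hypothetical and are not taken from the PR or from Lucene; this only shows the general idea, not the format the change actually writes.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of vint-delta encoding; not code from the PR.
public class VIntDeltaSketch {

  // Writes a non-negative int using 7 bits per byte; the high bit marks continuation.
  static void writeVInt(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  // Reads one vint starting at pos[0] and advances pos[0] past it.
  static int readVInt(byte[] bytes, int[] pos) {
    int value = 0;
    int shift = 0;
    byte b;
    do {
      b = bytes[pos[0]++];
      value |= (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return value;
  }

  // Encodes a sorted array of IDs as: count, then the gap to the previous ID.
  static byte[] encode(int[] sortedIds) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVInt(out, sortedIds.length);
    int previous = 0;
    for (int id : sortedIds) {
      writeVInt(out, id - previous);
      previous = id;
    }
    return out.toByteArray();
  }

  // Reverses encode(): accumulates the gaps back into absolute IDs.
  static int[] decode(byte[] bytes) {
    int[] pos = {0};
    int count = readVInt(bytes, pos);
    int[] ids = new int[count];
    int previous = 0;
    for (int i = 0; i < count; i++) {
      previous += readVInt(bytes, pos);
      ids[i] = previous;
    }
    return ids;
  }
}
```

Small gaps cost one byte each, so the scheme pays off most when a node's neighbors have IDs close to one another.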

[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-15 Thread GitBox
jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315253148 I don't know much about HNSW but I would guess that this could be due to the fact that nodes that map to vectors that are similar to one another will have a similar set of neighbors, and

[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-15 Thread GitBox
jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1315146472 In case it helps this discussion, I ran the following code to get a sense of the savings we could get assuming `2^24` vectors that are not clustered and 32 neighbors per vector. `
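The code from the comment is cut off in this archive snippet and is not reproduced here. As a stand-in, the following is a rough sketch of how such an estimate could be made under the same stated assumptions (`2^24` vectors, 32 neighbors per vector, no clustering, modeled as uniformly random neighbor IDs). The class name, the sampling approach, and the choice of vint-delta as the encoding being measured are all assumptions for illustration, not the author's method.

```java
import java.util.Random;
import java.util.TreeSet;

// Hypothetical estimate of neighbor-list size; not the code from the comment.
public class NeighborSizeEstimate {

  // Number of bytes a non-negative int takes as a vint (7 payload bits per byte).
  static int vIntBytes(int value) {
    int bytes = 1;
    while ((value & ~0x7F) != 0) {
      value >>>= 7;
      bytes++;
    }
    return bytes;
  }

  // Average bytes per node when sorted neighbor IDs are stored as vint gaps.
  // Neighbors are drawn uniformly at random, i.e. with no clustering at all.
  static double averageDeltaVIntBytes(int numVectors, int numNeighbors, int samples, long seed) {
    Random random = new Random(seed);
    long totalBytes = 0;
    for (int s = 0; s < samples; s++) {
      TreeSet<Integer> neighbors = new TreeSet<>(); // keeps IDs distinct and sorted
      while (neighbors.size() < numNeighbors) {
        neighbors.add(random.nextInt(numVectors));
      }
      int previous = 0;
      for (int id : neighbors) {
        totalBytes += vIntBytes(id - previous);
        previous = id;
      }
    }
    return (double) totalBytes / samples;
  }

  public static void main(String[] args) {
    int numVectors = 1 << 24;
    int numNeighbors = 32;
    double deltaVInt = averageDeltaVIntBytes(numVectors, numNeighbors, 10_000, 42L);
    int fixedInt = numNeighbors * Integer.BYTES; // 4 bytes per neighbor
    System.out.printf("fixed int: %d bytes/node, vint-delta: %.1f bytes/node%n",
        fixedInt, deltaVInt);
  }
}
```

Running `main` prints the fixed-width baseline next to the sampled vint-delta average, which gives a feel for the savings in the unclustered case; clustered IDs would produce smaller gaps and therefore fewer bytes.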

[GitHub] [lucene] jpountz commented on pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-07 Thread GitBox
jpountz commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1305835975 I guess that encoding each block with a different number of bits per value would mostly help if node IDs are somewhat clustered so that the set of neighbors to a given node would be clos