Concurrent HNSW index, take two

2023-07-06 Thread Jonathan Ellis
Hi all, I first published a concurrent HNSW PR in April, which turned out to be a bit premature. There was a lot of code churn as I fixed bugs and improved performance. Sorry about that! This code has been available as part of DataStax Astra’s public vector search preview for almost a month

Re: Concurrent HNSW index

2023-04-28 Thread Jonathan Ellis
Draft PR is posted here: https://github.com/apache/lucene/pull/12254 This depends on my PR to use HashMap in the non-concurrent OnHeapHnswGraph (because that PR updates the tests to not assume sorted order of nodes in a given level): https://github.com/apache/lucene/pull/12248 On Fri, Apr 28,

Re: Concurrent HNSW index

2023-04-28 Thread Jonathan Ellis
Great, I will work on squashing to get a clean PR. One thing I am struggling with is the RamUsageTester. Here is the stacktrace: https://gist.github.com/jbellis/20676b0e23f43751cbe8834a8def0d12 Apparently RamUsageTester tries to flip private fields to public so it can introspect them, but the

Re: Concurrent HNSW index

2023-04-28 Thread Alessandro Benedetti
That's great! And we were talking about this exactly here: https://github.com/apache/lucene/pull/12169 It would also help with the new token filter :)

Re: Concurrent HNSW index

2023-04-27 Thread Michael Wechner
+1 for a pull request. Thanks, Michael. On 27.04.23 at 20:53, Ishan Chattopadhyaya wrote: +1, please contribute to Lucene. Thanks! On Thu, 27 Apr, 2023, 10:59 pm Jonathan Ellis, wrote: Hi all, I've created an HNSW index implementation that allows for concurrent build and

Re: Concurrent HNSW index

2023-04-27 Thread Ishan Chattopadhyaya
+1, please contribute to Lucene. Thanks! On Thu, 27 Apr, 2023, 10:59 pm Jonathan Ellis, wrote: Hi all, I've created an HNSW index implementation that allows for concurrent build and querying. On my i9-12900 (8 performance cores and 8 efficiency) I get a bit less than 10x speedup of

Concurrent HNSW index

2023-04-27 Thread Jonathan Ellis
Hi all, I've created an HNSW index implementation that allows for concurrent build and querying. On my i9-12900 (8 performance cores and 8 efficiency) I get a bit less than 10x speedup of wall clock time for building and querying the "siftsmall" and "sift" datasets from