HNSW questions

2023-04-18 Thread Jonathan Ellis
adding identical vectors to an HNSW? It looks like when I supply 10 identical vectors, they all get added to the graph, but when I search for the nearest neighbors, I only get one of them in the result set. -- Jonathan Ellis co-founder, http://www.datastax.com @spyced

Re: HNSW questions

2023-04-20 Thread Jonathan Ellis
pare two vectors we need to use two independent sources so > that one doesn't overwrite this internal state when fetching the second > vector. > > Sorry I forgot the second question and can't see it on my phone. Brb > > On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote: >
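The point about needing two independent sources can be illustrated with a small sketch. The classes below are hypothetical and are not Lucene's API. They only assume that a vector reader hands back one internal buffer that it overwrites on every read, which is the "internal state" the reply describes.

```java
// Hypothetical illustration; these classes are NOT Lucene's API.
class ReusingVectorSource {
    private final float[][] stored;  // backing data shared by all readers
    private final float[] scratch;   // internal state, overwritten by every read

    ReusingVectorSource(float[][] stored) {
        this.stored = stored;
        this.scratch = new float[stored[0].length];
    }

    // Returns the shared scratch buffer, so an earlier result is clobbered by the next call.
    float[] vectorValue(int ord) {
        System.arraycopy(stored[ord], 0, scratch, 0, scratch.length);
        return scratch;
    }

    // An independent reader over the same data with its own scratch buffer.
    ReusingVectorSource copy() {
        return new ReusingVectorSource(stored);
    }
}

class TwoSourcesDemo {
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[][] data = {{0f, 0f}, {3f, 4f}};
        ReusingVectorSource one = new ReusingVectorSource(data);

        // One source: both arguments alias the same buffer, so the distance is 0.0.
        float wrong = squaredDistance(one.vectorValue(0), one.vectorValue(1));

        // Two sources: each read lands in its own buffer, so the distance is 25.0.
        ReusingVectorSource two = one.copy();
        float right = squaredDistance(one.vectorValue(0), two.vectorValue(1));

        System.out.println("single source: " + wrong + ", two sources: " + right);
    }
}
```

With a single reader the second fetch silently replaces the first vector, so every comparison looks like a vector compared with itself. A second reader costs one extra scratch buffer, not a second copy of the data.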

Re: HNSW questions

2023-04-20 Thread Jonathan Ellis
, 2023 at 1:37 PM Michael Sokolov wrote: > That class is intended for use by the Lucene index writer - it's not > designed as a general purpose class for re-use outside that context. > And IndexWriter writes documents to disk in bulk. > > On Wed, Apr 19, 2023 at 3:54 PM Jonath

Concurrent HNSW index

2023-04-27 Thread Jonathan Ellis
ther nodes normally," you don't really benefit from having computed the others previously. I am currently adding this to Cassandra as code in our repo, but my preference would be to upstream it. Is Lucene open to a pull request?

Re: HNSW questions

2023-04-19 Thread Jonathan Ellis
can't see it on my phone. Brb > > On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote: > >> Hi all, a couple questions on how HNSW works: >> >> 1. What is driving the requirement for two copies of the input vectors? >> It looks like the RAVV implementations do

Re: HNSW questions

2023-04-23 Thread Jonathan Ellis
rface work differently; see > OffHeapByteVectorValues, which is representing vectors in the index > and implemented using I/O calls. > > If you shared some context about your interest here, we might be able > to help you better. > > On Thu, Apr 20, 2023 at 1:22 PM Jonathan Ellis w

Re: Concurrent HNSW index

2023-04-28 Thread Jonathan Ellis
, 2023 at 8:14 AM Jonathan Ellis wrote: > Great, I will work on squashing to get a clean PR. > > One thing I am struggling with is the RamUsageTester. Here is the > stacktrace: > https://gist.github.com/jbellis/20676b0e23f43751cbe8834a8def0d12 > > Apparently RamUsageTester

Re: Concurrent HNSW index

2023-04-28 Thread Jonathan Ellis
On Thu, 27 Apr 2023 at 19:29, Jonathan Ellis wrote: > >> Hi all, >> >> I've created an HNSW index implementation that allows for concurrent >> build and querying. On my i9-

Concurrent HNSW index, take two

2023-07-06 Thread Jonathan Ellis
checking correctness of my estimates. * Added HnswGraph.addNode (with a default that throws UnsupportedOperationException) to document the shared expectations in one place.

Fix for vector math precision

2023-06-09 Thread Jonathan Ellis
/12281

Mixed results with FINGER for HNSW search

2023-08-02 Thread Jonathan Ellis
ll@100. I did not see a massive difference in FINGER's performance advantage either way. [3] See Algorithm 2 line 19 in the paper for when exact similarity is performed, corresponding to line 257 in HnswSearcher.

Re: Vamana greedy search variant

2023-08-05 Thread Jonathan Ellis
n extracting code for external use, > potentially disrupting the indexing chain. > > Best regards, > Jim > > On Sat, 5 Aug 2023 at 22:47, Jonathan Ellis wrote: > >> Hi all, >> >> I put FINGER on pause to try out different graph construction methods. >> Vaman

Re: Fix for vector math precision

2023-06-20 Thread Jonathan Ellis
Instead of doing the math in doubles, I opened a new PR that limits vector components to values smaller than 1E17 to prevent overflow: https://github.com/apache/lucene/pull/12373 On Fri, Jun 9, 2023 at 4:38 PM Jonathan Ellis wrote: > Hi all, > > I ran into a bug where the cosine of a large vector taken wi
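The overflow in question can be reproduced in a few lines. The sketch below is an assumption about the failure mode and is not code from either PR. It accumulates the cosine in `float`, shows the same computation in `double`, and adds a hypothetical `checkComponents` guard that uses the 1E17 limit mentioned in the message. All class and method names here are made up for illustration.

```java
// Illustrative sketch only; not taken from Lucene or from the PRs linked above.
class CosineOverflowDemo {
    static final float LIMIT = 1e17f; // bound quoted in the message

    // Float accumulators: squaring a component near 1e20 exceeds Float.MAX_VALUE (about 3.4e38).
    static float cosineFloat(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // If the sums overflowed, this is Infinity / Infinity, which is NaN.
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    // Double accumulators: the "double math" alternative, with far more exponent range.
    static double cosineDouble(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / Math.sqrt(normA * normB);
    }

    // Hypothetical guard: reject components that are NaN, infinite, or not below the limit.
    static void checkComponents(float[] v) {
        for (float x : v) {
            if (!(Math.abs(x) < LIMIT)) {
                throw new IllegalArgumentException("component out of range: " + x);
            }
        }
    }

    public static void main(String[] args) {
        float[] big = {1e20f, 1e20f};
        System.out.println("float math:  " + cosineFloat(big, big)); // NaN
        System.out.println("double math: " + cosineDouble(big, big));
        try {
            checkComponents(big);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Bounding components below 1E17 keeps each squared term under roughly 1E34, which leaves room in a `float` accumulator for a sum across thousands of dimensions. The trade-off against double math is that the check rejects some inputs that double math would have handled.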

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-10 Thread Jonathan Ellis
I did track down a weird bug I was seeing to our cosine similarity returning NaN with high-dimension vectors. The fix is here: https://github.com/apache/lucene/pull/12281 On Tue, May 9, 2023 at 12:15 PM Jonathan Ellis wrote: > I'm adding Lucene HNSW to Cassandra for vector search. One of my t

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Jonathan Ellis

Allowing tests to use multiple cores

2023-05-16 Thread Jonathan Ellis
that limitation [2]. Is there a best-practice way to opt into multi-core tests without this blunt hammer? [1] https://github.com/apache/lucene/pull/12254 [2] https://github.com/apache/lucene/pull/12254/commits/e6fbf0afb7da7af49a7a4fdbc578fde0da10d162

Re: How to create a local build that targets Java 11, when building with 17?

2023-05-05 Thread Jonathan Ellis
Actually, my hack doesn't work: the manifest file changes but the .class files do not. On Fri, May 5, 2023 at 12:38 PM Jonathan Ellis wrote: > `./gradlew publishToMavenLocal` gives me Java 17 class files by default, > which surprises me since AFAIK 11 is still the minimum to run Lucene.

How to create a local build that targets Java 11, when building with 17?

2023-05-05 Thread Jonathan Ellis
Is there a cleaner way to do this?

Re: How to create a local build that targets Java 11, when building with 17?

2023-05-05 Thread Jonathan Ellis
ION_17 > > Also, don't use the default gradle task created by convention; use this > one: > > ./gradlew mavenToLocal > > it's an alias but it publishes only a subset of relevant projects, not all > of them. > > Dawid > > On Fri, May 5, 2023 at 8:03 PM Jonath

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Jonathan Ellis
cessfully and 1024 >> > seems to have been chosen as max dimension quite arbitrarily in the >> > first place, I think it should not be a problem to increase the max >> > dimension by a factor 1.5 or 2. >> > >> > WDYT? >> > >> > Thanks

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-09 Thread Jonathan Ellis
have been chosen as max dimension quite arbitrarily in the >>> > first place, I think it should not be a problem to increase the max >>> > dimension by a factor 1.5 or 2. >>> > >>> > WDYT? >>> > >>> > Thanks >>> > >>> > Michael

Re: HNSW questions

2023-05-09 Thread Jonathan Ellis
Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote: > >> Hi all, a couple questions on how HNSW works: >> >> 1. What is driving the requirement for two copies of the input vectors? >> It looks like the RAVV implementations do shallow copies, so the vector >> from A