Adding identical vectors to an HNSW?
It looks like when I supply 10 identical vectors, they all get added to the
graph, but when I search for the nearest neighbors, I only get one of them
in the result set.
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
pare two vectors we need to use two independent sources so
> that one doesn't overwrite this internal state when fetching the second
> vector.
>
> Sorry I forgot the second question and can't see it on my phone. Brb
>
> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote:
>
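The point above about needing two independent sources can be shown with a minimal sketch. It assumes a vector source that returns its own reusable internal buffer; the class and method names (SharedBufferVectors, vectorValue, copy, dot) are hypothetical stand-ins for illustration, not Lucene's actual classes.

```java
// Hypothetical stand-in for a random-access vector source that reuses one
// internal buffer. This is an illustration only, not Lucene code.
final class SharedBufferVectors {
    private final float[][] stored; // pretend these vectors live on disk
    private final float[] scratch;  // internal state reused by every call

    SharedBufferVectors(float[][] stored) {
        this.stored = stored;
        this.scratch = new float[stored[0].length];
    }

    // Returns the internal buffer, so the result is valid only until the next call.
    float[] vectorValue(int ord) {
        System.arraycopy(stored[ord], 0, scratch, 0, scratch.length);
        return scratch;
    }

    // A second view over the same data with its own scratch buffer.
    SharedBufferVectors copy() {
        return new SharedBufferVectors(stored);
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}

class TwoSourcesDemo {
    public static void main(String[] args) {
        float[][] data = {{1f, 0f}, {0f, 1f}};
        SharedBufferVectors a = new SharedBufferVectors(data);

        // One source: the second fetch overwrites the first, so both
        // arguments are the same array and the comparison is wrong.
        float aliased = SharedBufferVectors.dot(a.vectorValue(0), a.vectorValue(1));

        // Two sources: each fetch lands in its own buffer.
        SharedBufferVectors b = a.copy();
        float independent = SharedBufferVectors.dot(a.vectorValue(0), b.vectorValue(1));

        System.out.println("one source:  " + aliased);
        System.out.println("two sources: " + independent);
    }
}
```

With a single source the two orthogonal vectors appear identical, because both references point at the same scratch array; the copy avoids that without duplicating the stored data.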
, 2023 at 1:37 PM Michael Sokolov wrote:
> That class is intended for use by the Lucene index writer - it's not
> designed as a general purpose class for re-use outside that context.
> And IndexWriter writes documents to disk in bulk.
>
> On Wed, Apr 19, 2023 at 3:54 PM Jonath
ther nodes normally," you
don't really benefit from having computed the others previously.
I am currently adding this to Cassandra as code in our repo, but my
preference would be to upstream it. Is Lucene open to a pull request?
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
rface work differently; see
> OffHeapByteVectorValues, which represents vectors in the index
> and is implemented using I/O calls.
>
> If you shared some context about your interest here, we might be able
> to help you better.
>
> On Thu, Apr 20, 2023 at 1:22 PM Jonathan Ellis w
, 2023 at 8:14 AM Jonathan Ellis wrote:
> Great, I will work on squashing to get a clean PR.
>
> One thing I am struggling with is the RamUsageTester. Here is the
> stacktrace:
> https://gist.github.com/jbellis/20676b0e23f43751cbe8834a8def0d12
>
> Apparently RamUsageTester
> On Thu, 27 Apr 2023 at 19:29, Jonathan Ellis wrote:
>
>> Hi all,
>>
>> I've created an HNSW index implementation that allows for concurrent
>> build and querying. On my i9-
checking correctness of my estimates.
* Added HnswGraph.addNode (with a default that throws UnsupportedOperationException) to document
the shared expectations in one place.
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
/12281
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
ll@100. I did not see a massive difference in Finger's
performance advantage either way.
[3] See Algorithm 2 line 19 in the paper for when exact similarity is
performed, corresponding to line 257 in HnswSearcher.
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
n extracting code for external use,
> potentially disrupting the indexing chain.
>
> Best regards,
> Jim
>
> On Sat, 5 Aug 2023 at 22:47, Jonathan Ellis wrote:
>
>> Hi all,
>>
>> I put FINGER on pause to try out different graph construction methods.
>> Vaman
Instead of doing the math in doubles, here is a new PR that limits vector components to values
smaller than 1E17 to prevent overflow: https://github.com/apache/lucene/pull/12373
On Fri, Jun 9, 2023 at 4:38 PM Jonathan Ellis wrote:
> Hi all,
>
> I ran into a bug where the cosine of a large vector taken wi
I did track down a weird bug I was seeing to our cosine similarity
returning NaN with high-dimension vectors. The fix is here:
https://github.com/apache/lucene/pull/12281
On Tue, May 9, 2023 at 12:15 PM Jonathan Ellis wrote:
> I'm adding Lucene HNSW to Cassandra for vector search. One of my t
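For readers following the NaN and overflow messages above, here is a minimal sketch of one way a cosine computed with float accumulators can return NaN, and how double accumulation or a bound on component size avoids it. The class and method names are hypothetical and this is not the code from either PR; it only illustrates the arithmetic.

```java
import java.util.Arrays;

// Illustrative only: a naive cosine, not Lucene's implementation.
final class CosineOverflowDemo {
    // Float accumulators: squares of large components overflow to Infinity,
    // and Infinity / Infinity is NaN.
    static float cosineFloat(float[] a, float[] b) {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    // Double accumulators: the same inputs stay finite.
    static float cosineDouble(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    public static void main(String[] args) {
        float[] big = {1e20f, 1e20f};
        System.out.println("float accumulators:  " + cosineFloat(big, big));
        System.out.println("double accumulators: " + cosineDouble(big, big));

        // Components below 1E17 have squares below 1E34, so a sum over 1024
        // dimensions stays well under Float.MAX_VALUE (about 3.4E38).
        float[] bounded = new float[1024];
        Arrays.fill(bounded, 9e16f);
        System.out.println("bounded components:  " + cosineFloat(bounded, bounded));
    }
}
```

The bound trades generality for speed: float math stays safe as long as inputs respect the limit, while double accumulation accepts any float input at some cost per operation.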
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
that limitation
[2]. Is there a best-practice way to opt into multi-core tests without
this blunt hammer?
[1] https://github.com/apache/lucene/pull/12254
[2]
https://github.com/apache/lucene/pull/12254/commits/e6fbf0afb7da7af49a7a4fdbc578fde0da10d162
--
Jonathan Ellis
co-founder, http://www.datastax.com
Actually, my hack doesn't work: the manifest file changes but the .class
files do not.
On Fri, May 5, 2023 at 12:38 PM Jonathan Ellis wrote:
> `./gradlew publishToMavenLocal` gives me Java 17 class files by default,
> which surprises me since AFAIK 11 is still the minimum to run Lucene.
Is there a cleaner way to do this?
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
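One way to confirm which Java release a published .class file targets is to read its header, which stores the major version right after the 0xCAFEBABE magic number (61 corresponds to Java 17, 55 to Java 11). The sketch below is a hypothetical helper, not part of the Lucene build; the class name and the path argument are assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for inspecting a compiled class; not part of Lucene.
final class ClassFileVersion {
    // Header layout: u4 magic (0xCAFEBABE), u2 minor version, u2 major version.
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort(); // skip the minor version
        return buf.getShort() & 0xFFFF;
    }

    // For major versions 49 and above, Java N uses major version N + 44.
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a path to a .class file, e.g. one extracted from the published jar.
        int major = majorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("major version " + major + " (Java " + javaRelease(major) + ")");
    }
}
```

Checking the bytecode directly sidesteps the manifest, which, as noted above, can change without the .class files changing.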
ION_17
>
> Also, don't use the default gradle task created by convention; use this
> one:
>
> ./gradlew mavenToLocal
>
> it's an alias but it publishes only a subset of relevant projects, not all
> of them.
>
> Dawid
>
> On Fri, May 5, 2023 at 8:03 PM Jonath
cessfully and 1024
>> > seems to have been chosen as the max dimension quite arbitrarily in the
>> > first place, I think it should not be a problem to increase the max
>> > dimension by a factor of 1.5 or 2.
>> >
>> > WDYT?
>> >
>> > Thanks
>>> >
>>> > Michael
>>> >
>>> >
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>>> >
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis wrote:
>
>> Hi all, a couple questions on how HNSW works:
>>
>> 1. What is driving the requirement for two copies of the input vectors?
>> It looks like the RAVV implementations do shallow copies, so the vector
>> from A