Re: HNSW questions

2023-05-11 Thread Michael Sokolov
Yes, it's up to the application. And it is definitely a pathological case when it happens; see https://github.com/apache/lucene/issues/11626 On Tue, May 9, 2023 at 1:30 PM Jonathan Ellis wrote: > > I don't see anything to make sure vectors are unique in IndexingChain down to > FieldWriter, is that h

Re: Index ordinal data in the taxonomy

2023-05-11 Thread Shai Erera
Hi Stefan, This sounds interesting and useful. It's like static scores for Lucene documents, only that we will apply them to ordinals. Since I assume it's not a very common use case though, do you know if this new functionality affects existing use cases? For example, will it change the API in non

Re: HNSW questions

2023-05-11 Thread Michael McCandless
I think the concurrency is across segments or slices (= multiple small segments)? I.e. "thread per slice" model, not multiple threads in one slice. But your cool PR would fix that limitation! https://github.com/apache/lucene/pull/12254 Mike McCandless http://blog.mikemccandless.com On Sun, Ap

Fix version / milestones

2023-05-11 Thread Michael McCandless
Hi Team, [spinoff from this java-user thread: https://markmail.org/thread/mmx22s7lysxqh6wm] In the good old Jira days, when we resolved an issue, the workflow encouraged us to also mark the Fix Version that this fix will be released in. This metadata is very helpful to future users wanting to kn

Re: Dimensions Limit for KNN vectors - Next Steps

2023-05-11 Thread Uwe Schindler
That's actually a good idea. +1 On 10.05.2023 at 09:22, Bruno Roustant wrote: *Proposed option:* Move the max dimension limit down to a lower-level, HNSW-specific implementation. Once there, this limit would not bind any other potential vector engine alternative/evolution. *Motivation:* There see

Index ordinal data in the taxonomy

2023-05-11 Thread Stefan Vodita
Hi everyone, I work on the Lucene product search team at Amazon. We’ve been considering indexing scoring signals for ordinals into the taxonomy, which could reduce index size for some use-cases. Example Let's consider a library of research papers, where each paper is represented by a Lucene docu