Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-23 Thread Alessandro Benedetti
Closing the poll after one week, these are the results:
Option 2-4 (9 votes): make the limit configurable, potentially moving the limit to the appropriate place
Option 3 (5 votes): keep it as it is (1024) but move it lower level in HNSW-specific implementation
Option 1 (0 votes): keep it as it is

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-22 Thread Bruno Roustant
I vote for option 3. Then with a follow up work to have a simple extension codec in the "codecs" package which is 1- not backward compatible, and 2- has a higher or configurable limit. That way users can directly use this codec without any additional code.

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-19 Thread Alessandro Benedetti
Thanks to everyone involved so far! I confirm that a proper subject should have been [POLL] rather than [VOTE], apologies for the confusion. We are in the middle of the poll and this is the summary so far (ordered by preference): Option 2-4: 9 votes make the limit configurable, potentially

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Nicholas Knize
Difficult to keep up with this topic when it's spread across issues, PRs, and email lists. My poll response is option 3. -1 to option 2, I think the configuration should be moved to the HNSW specific implementation. At this point of technical maturity, it doesn't make sense (to me) to have the

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
On 18.05.23 at 12:22, Michael McCandless wrote: I love all the energy and passion going into debating all the ways to poke at this limit, but please let's also spend some of this passion on actually improving the scalability of our aKNN implementation! E.g. Robert opened an exciting

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
It is basically the code which Michael Sokolov posted at https://markmail.org/message/kf4nzoqyhwacb7ri except
 - that I have replaced KnnVectorField by KnnFloatVectorField, because KnnVectorField is deprecated.
 - that I don't hard code the dimension as 2048 and the metric as EUCLIDEAN, but

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael McCandless
This isn't really a VOTE (no specific code change is being proposed), but rather a poll? Anyway, I would prefer Option 3: put the limit check into the HNSW algorithm itself. This is the right place for the limit check, since HNSW has its own scaling behaviour. It might have other limits, like
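A minimal sketch of what "put the limit check into the HNSW algorithm itself" could look like. All names here (VectorFormatSketch, HnswFormatSketch, OtherFormatSketch, checkDimensions) are hypothetical and are not Lucene classes or methods; the sketch only illustrates the idea that each format owns its limit, so the HNSW one can keep 1024 while another format picks something else.

```java
/** Hypothetical stand-in for a vectors format; this is not Lucene's API. */
interface VectorFormatSketch {
    /** Largest vector dimension this format accepts when indexing. */
    int maxDimensions();

    /** Rejects vectors that exceed the limit of this particular format. */
    default void checkDimensions(String field, float[] vector) {
        if (vector.length > maxDimensions()) {
            throw new IllegalArgumentException(
                "field \"" + field + "\": vector has " + vector.length
                    + " dimensions, limit for this format is " + maxDimensions());
        }
    }
}

/** HNSW-style format that keeps the current limit of 1024 unchanged. */
final class HnswFormatSketch implements VectorFormatSketch {
    static final int MAX_DIMENSIONS = 1024;

    @Override
    public int maxDimensions() {
        return MAX_DIMENSIONS;
    }
}

/** Some other format that chooses its own limit, lower or higher. */
final class OtherFormatSketch implements VectorFormatSketch {
    private final int maxDimensions;

    OtherFormatSketch(int maxDimensions) {
        this.maxDimensions = maxDimensions;
    }

    @Override
    public int maxDimensions() {
        return maxDimensions;
    }
}

final class FormatLimitDemo {
    public static void main(String[] args) {
        VectorFormatSketch other = new OtherFormatSketch(2048);
        // A 1536-dimensional vector passes the check of the 2048-limit format.
        other.checkDimensions("embedding", new float[1536]);
        System.out.println("accepted by format with limit " + other.maxDimensions());
    }
}
```

With this shape, the generic field and document classes would not need to know any limit at all; whether that matches the eventual implementation is left open by the thread.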

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Alessandro Benedetti
That's great and a good plan B, but let's try to focus this thread on collecting votes for a week (let's keep discussions on the nice PR opened by David or the discussion thread we have in the mailing list already :) On Thu, 18 May 2023, 10:10 Ishan Chattopadhyaya, wrote: > That sounds

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Ishan Chattopadhyaya
That sounds promising, Michael. Can you share scripts/steps/code to reproduce this? On Thu, 18 May, 2023, 1:16 pm Michael Wechner, wrote: > I just implemented it and tested it with OpenAI's text-embedding-ada-002, > which is using 1536 dimensions and it works very fine :-) > > Thanks > >

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-18 Thread Michael Wechner
I just implemented it and tested it with OpenAI's text-embedding-ada-002, which is using 1536 dimensions and it works very fine :-) Thanks Michael On 18.05.23 at 00:29, Michael Wechner wrote: IIUC KnnVectorField is deprecated and one is supposed to use KnnFloatVectorField when using

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Michael Wechner
IIUC KnnVectorField is deprecated and one is supposed to use KnnFloatVectorField when using float as vector values, right? On 17.05.23 at 16:41, Michael Sokolov wrote: see https://markmail.org/message/kf4nzoqyhwacb7ri On Wed, May 17, 2023 at 10:09 AM David Smiley wrote: > easily be

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread David Smiley
Thanks Michael for sharing your code snippet on how to circumvent the limit. My reaction to this is the same as Alessandro. I just created a PR to make the limit configurable: https://github.com/apache/lucene/pull/12306 If there is to be a veto presented to the PR, it should include technical

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Michael Wechner
I try to better understand the code, so IIUC vector MAX_DIMENSIONS is currently used inside lucene/core/src/java/org/apache/lucene/document/FieldType.java lucene/core/src/java/org/apache/lucene/document/KnnFloatVectorField.java

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Mayya Sharipova
Alessandro, Thanks for raising the code of conduct; it is very discouraging and intimidating to participate in discussions where such language is used especially by senior members. Michael S., thanks for your suggestion and that's what we used in Elasticsearch to raise dims limit, and Alessandro,

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Alessandro Benedetti
Thanks, Michael, that example backs even more strongly the need of cleaning it up and making the limit configurable without the need for custom field types I guess (I was taking a look at the code again, and it seems the limit is also checked twice: in

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Michael Sokolov
see https://markmail.org/message/kf4nzoqyhwacb7ri On Wed, May 17, 2023 at 10:09 AM David Smiley wrote: > > easily be circumvented by a user > > This is a revelation to me and others, if true. Michael, please then > point to a test or code snippet that shows the Lucene user community what >

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread David Smiley
> easily be circumvented by a user This is a revelation to me and others, if true. Michael, please then point to a test or code snippet that shows the Lucene user community what they want to see so they are unblocked from their explorations of vector search. ~ David Smiley Apache Lucene/Solr

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Michael Sokolov
I think I've said before on this list we don't actually enforce the limit in any way that can't easily be circumvented by a user. The codec already supports any size vector - it doesn't impose any limit. The way the API is written you can *already today* create an index with max-int sized vectors

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Robert Muir
As a reminder this isn't the Disney Plus channel and I'll use strong language if I fucking want to. On Wed, May 17, 2023, 4:45 AM Alessandro Benedetti wrote: > Robert, > A gentle reminder of the > https://www.apache.org/foundation/policies/conduct.html. > I've read many e-mails about this

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-17 Thread Alessandro Benedetti
Robert, A gentle reminder of the https://www.apache.org/foundation/policies/conduct.html. I've read many e-mails about this topic that ended up in a tone that is not up to the standard of a healthy community. To be specific and pragmatic how you addressed Gus here, how you addressed the rest of

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread David Smiley
We agree backwards compatibility with the index should be maintained and that checkIndex should work. And we agree on a number of other things, but I want to focus on configurability. As long as the index contains the number of dimensions actually used in a specific segment & field, why couldn't

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Hi Robert, If you read the issue I opened more carefully you'll see I had all the service loading stuff sorted just fine. It's the silent eating of the security exceptions by URLClassPath that I think is a useful thing to point out. If anything, that ticket is more about being surprised by

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
My problem is that it impacts the default codec which is supported by our backwards compatibility policy for many years. We can't just let the user determine backwards compatibility with a sysprop. how will checkindex work? We have to have bounds and also allow for more performant implementations

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread David Smiley
Robert, I have not heard from you (or anyone) an argument against System property based configurability (as I described in Option 4 via a System property). Uwe notes wisely some care must be taken to ensure it actually works. Sure, of course. What concerns do you have with this? ~ David Smiley

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
by the way, i agree with the idea to MOVE THE LIMIT UNCHANGED to the HNSW-specific code. This way, someone can write an alternative codec with vectors using some other completely different approach that incorporates a different more appropriate limit (maybe lower, maybe higher) depending upon their

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
Gus, I think i explained myself multiple times on issues and in this thread. the performance is unacceptable, everyone knows it, but nobody is talking about. I don't need to explain myself time and time again here. You don't seem to understand the technical issues (at least you sure as fuck don't

RE: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Pandu Kerr
Hi all, Great to have this discussion! My votes are for 2 and 4! Best, Pandu On 2023/05/16 08:50:24 Alessandro Benedetti wrote: > Hi all, > we have finalized all the options proposed by the community and we are > ready to vote for the preferred one and then proceed with the > implementation.

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Jonathan Ellis
My non-binding vote: Option 2 = Option 4 > Option 1 > Option 3 Explanation: Lucene's somewhat arbitrary limit of 1024 does not currently affect the raw, low-level HNSW, which is what I am plugging into Cassandra. The only option that would break this code is option 3. P.S. I mentioned this in

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
Even if the options can be basically summarised in two groups: make it configurable VS not making it configurable and leave it be, when I collected the options from people I ended up with these four and I didn't want to collapse any of them (potentially making the proposer feel diminished).

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Actually, I had wondered if this is a proper vote thread or not, normally those are yes/no on a single option. On Tue, May 16, 2023 at 10:47 AM Alessandro Benedetti wrote: > Hi Marcus, > I am afraid at this stage Robert's opinion counts just as any other > opinion, a single vote for option 1. >

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
Hi Marcus, I am afraid at this stage Robert's opinion counts just as any other opinion, a single vote for option 1. We are collecting a community's feedback here, we are not changing any code nor voting for a yes/no. Once the voting is finished, we'll operate an action depending on the community's

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Marcus Eagan
Given that Robert has put in his veto, aren’t we clear on what we need to do for him to change his mind? He’s been pretty clear and the rules of veto are cut and dry. Most of the people that have contributed to kNN vectors recently are not even on the thread. I think improving the feature should

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Houston Putman
+1 on the combination of #3 and #4. Also good things to make sure of Uwe, thanks for calling those out. (Especially about the limit only being used on write, not on read). - Houston On Tue, May 16, 2023 at 9:57 AM Uwe Schindler wrote: > I agree with Dawid, > > I am +1 for those two options in

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Uwe Schindler
I agree with Dawid, I am +1 for those two options in combination: * option 3 (make limit an HNSW specific thing). New formats may use other limits (lower or higher). * option 4 (make a system property with HNSW prefix). Adding the system property must be done in same way like new
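A rough sketch of the system-property idea, under stated assumptions: the property name "hnsw.maxDimensions" and the class HnswLimitConfigSketch are invented for illustration and do not come from Lucene or from this thread. The default stays at 1024, and the check is only applied on the write path, in line with the point made elsewhere in the thread that the limit should be used on write, not on read.

```java
/** Hypothetical helper; the property name below is an assumption. */
final class HnswLimitConfigSketch {
    /** Assumed property name with an HNSW prefix. */
    static final String PROPERTY = "hnsw.maxDimensions";

    /** Default that matches the current hard-coded limit. */
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    private HnswLimitConfigSketch() {}

    /** Reads the limit from the system property, falling back to 1024. */
    static int resolveMaxDimensions() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MAX_DIMENSIONS;
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                PROPERTY + " must be an integer, got: " + raw, e);
        }
        if (value <= 0) {
            throw new IllegalArgumentException(
                PROPERTY + " must be positive, got: " + value);
        }
        return value;
    }

    /** Applied when indexing only; reading existing segments does not call this. */
    static void checkOnWrite(int dimensions) {
        int max = resolveMaxDimensions();
        if (dimensions > max) {
            throw new IllegalArgumentException(
                "vector has " + dimensions + " dimensions, configured limit is " + max);
        }
    }

    public static void main(String[] args) {
        // Run with, for example, -Dhnsw.maxDimensions=2048 to raise the limit.
        System.out.println("effective limit: " + resolveMaxDimensions());
    }
}
```

Because nothing is checked on read, an index written with a raised limit would still open in a JVM that does not set the property; how that interacts with the backwards compatibility policy is exactly what is debated in this thread.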

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Dawid Weiss
I'm for option 3 (limit at algorithm level), with the default there tunable via property (option 4). I understand Robert's concerns and I'd love to contribute a faster implementation but the reality is - I can't do it at the moment. I feel like experiments are good though and we shouldn't just

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Benjamin Trent
My vote is for option 3. Prevents Lucene from having the limit increased. Allows others who implement a different codec to set a limit of their choosing. Though I don't know the historical reasons for putting specific configuration items at the codec level. This limit is performance related and

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
+1 to Gus' reply. I think that Robert's veto or anyone else's veto is fair enough, but I also think that anyone who is vetoing should be very clear about the objectives / goals to be achieved, in order to get a +1. If no clear objectives / goals can be defined and agreed on, then the whole

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Robert, Can you explain in clear technical terms the standard that must be met for performance? A benchmark that must run in X time on Y hardware for example (and why that test is suitable)? Or some other reproducible criteria? So far I've heard you give an *opinion* that it's unusable, but

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
my non-binding vote goes to Option 2 resp. Option 4 Thanks Michael Wechner On 16.05.23 at 10:51, Alessandro Benedetti wrote: My vote goes to *Option 4*. -- *Alessandro Benedetti* Director @ Sease Ltd. /Apache Lucene/Solr Committer/ /Apache Solr PMC Member/ e-mail:

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Robert Muir
i still feel -1 (veto) on increasing this limit. sending more emails does not change the technical facts or make the veto go away. On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti wrote: > Hi all, > we have finalized all the options proposed by the community and we are > ready to vote for

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
For simplicity's sake, let's consider Option 2 and 4 as equivalent as they are not mutually exclusive and just differ on a minor implementation detail. On Tue, 16 May 2023, 10:24 Alessandro Benedetti, wrote: > Option 4 also aim to refactor the limit in an appropriate place for the > code (short

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
Option 4 also aims to refactor the limit in an appropriate place for the code (short answer is Yes, implementation details) Cheers On Tue, 16 May 2023, 10:04 Michael Wechner, wrote: > Hi Alessandro > > Thank you very much for summarizing and starting the vote. > > I am not sure whether I really

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Michael Wechner
Hi Alessandro Thank you very much for summarizing and starting the vote. I am not sure whether I really understand the difference between Option 2 and Option 4, or is it just about implementation details? Thanks Michael On 16.05.23 at 10:50, Alessandro Benedetti wrote: Hi all, we have

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Alessandro Benedetti
My vote goes to *Option 4*. -- *Alessandro Benedetti* Director @ Sease Ltd. *Apache Lucene/Solr Committer* *Apache Solr PMC Member* e-mail: a.benede...@sease.io *Sease* - Information Retrieval Applied Consulting | Training | Open Source Website: Sease.io