Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-07 Thread Glen Newton
Thank you.

Glen

On Sat, 6 Aug 2022 at 23:46, Tomoko Uchida 
wrote:

> Hi Glen,
> I verified your Jira/GitHub usernames and added a mapping.
>
> https://github.com/apache/lucene-jira-archive/commit/ae78d583b40f5bafa1f8ee09854294732dbf530b
>
> Tomoko
>
>
> On Sun, 7 Aug 2022 at 3:37, Glen Newton wrote:
>
> > jira: gnewton
> > github: gnewton  (github.com/gnewton)
> >
> > Thanks,
> > Glen
> >
> >
> >
> > On Sat, 6 Aug 2022 at 14:11, Tomoko Uchida
> > wrote:
> >
> > > Hi everyone.
> > >
> > > I wanted to let you know that we'll extend the deadline until the date the
> > > migration is started (the date is not fixed yet).
> > > Please let us know your Jira/GitHub usernames if you don't see mapping(s)
> > > for your account in this file:
> > >
> > > https://github.com/apache/lucene-jira-archive/blob/main/migration/mappings-data/account-map.csv.20220722.verified
> > >
> > > Tomoko
> > >
> > >
> > > On Sun, 7 Aug 2022 at 1:36, Baris Kazar wrote:
> > >
> > > > Thank You Thank You
> > > > Best regards
> > > > 
> > > > From: Michael McCandless 
> > > > Sent: Saturday, August 6, 2022 11:29:25 AM
> > > > To: Baris Kazar 
> > > > Cc: java-user@lucene.apache.org 
> > > > Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account ids
> > > > before Thursday August 4 midnight (in your local time)
> > > >
> > > > OK done:
> > > >
> > > > https://github.com/apache/lucene-jira-archive/commit/13fa4cb46a1a6d609448240e4f66c263da8b3fd1
> > > >
> > > > Mike McCandless
> > > >
> > > > http://blog.mikemccandless.com
> > > >
> > > >
> > > > On Sat, Aug 6, 2022 at 10:29 AM Baris Kazar wrote:
> > > > I think so.
> > > > Best regards
> > > > 
> > > > From: Michael McCandless <luc...@mikemccandless.com>
> > > > Sent: Saturday, August 6, 2022 10:12 AM
> > > > To: java-user@lucene.apache.org
> > > > Cc: Baris Kazar <baris.ka...@oracle.com>
> > > > Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account ids
> > > > before Thursday August 4 midnight (in your local time)
> > > >
> > > > Thanks Baris,
> > > >
> > > > And your Jira ID is bkazar right?
> > > >
> > > > Mike
> > > >
> > > > On Sat, Aug 6, 2022 at 10:05 AM Baris Kazar wrote:
> > > > My GitHub username is bmkazar.
> > > > Can you please register me?
> > > > Best regards
> > > > 
> > > > From: Michael McCandless <luc...@mikemccandless.com>
> > > > Sent: Saturday, August 6, 2022 6:05:51 AM
> > > > To: d...@lucene.apache.org
> > > > Cc: Lucene Users <java-user@lucene.apache.org>; java-dev
> > > > Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account ids
> > > > before Thursday August 4 midnight (in your local time)
> > > >
> > > > Hi Adam, I added your linked accounts here:
> > > >
> > > > https://github.com/apache/lucene-jira-archive/commit/c228cb184c073f4b96cd68d45a000cf390455b7c
> > > >
> > > > And Tomoko added Rushabh's linked accounts here:
> > > >
> > > > https://github.com/apache/lucene-jira-archive/commit/6f9501ec68792c1b287e93770f7a9dfd351b86fb
> > > >
> > > > Keep the linked accounts coming!
> > > >
> > > > Mike
> > > >
> > > > On Thu, Aug 4, 2022 at 7:02 PM Rushabh Shah
> > > > <rushabh.s...@salesforce.com.invalid> wrote:
> > > >
> > > > > Hi,
> > > > > My mapping is:
> > > > > JiraName,GitHubAccount,JiraDispName
> > > > > shahrs87, shahrs87, Rushabh Shah
> > > > >
> > > > > Thank you Tomoko and Mike for all of your hard work.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jul 31, 2022 at 3:08 AM Michael McCandless
> > > > > <luc...@mikemccandless.com> wrote:
> > > > >
> > > > >> Hello Lucene users, contributors and developers,
> > > > >>
> > > > > >> If you have used Lucene's Jira and you have a GitHub account as well,
> > > > > >> please check whether your user id m

Raising the Value of MAX_DIMENSIONS of Vector Values

2022-08-07 Thread Marcus Eagan
Hi Lucene Team,

In general, I have advised very strongly against our team at MongoDB
modifying the Lucene source, except in scenarios where we have strong needs
for a particular customization. Ultimately, people can do what they would
like to do.

That being said, we have a number of customers preparing to use Lucene for
dense vector search. There are many language models that are optimized for
more than 1024 dimensions. I remember Michael Wechner's email about one
instance with OpenAI.

> I just tried to test the OpenAI model
> "text-similarity-davinci-001" with 12288 dimension


It seems that customers who attempt to use these models should not be
turned away. It could be sufficient to explain the issues. The only ones I
have identified are two expected ones, very slow indexing throughput and
high CPU usage, and a perhaps less well-defined risk of more numerical errors.

I opened an issue and a PR for the discussion as well. I would appreciate
guidance on where we think the warning should go. I feel like burying it in
a Javadoc is a less than ideal experience. It would be better to have a
warning on startup. In the PR, I increased the max limit by a factor of
twenty. We should let users use the system based on their needs, even if it
was not designed or optimized for the models they bring, because we need the
feedback and the data from the world.
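
To make the "warn rather than turn away" idea concrete, here is a minimal
sketch in Python. It is not Lucene code and not what the PR does. The function
names and both thresholds are assumptions for illustration: 1024 is the figure
mentioned above, and 20480 is simply that figure times twenty.

```python
import warnings

# Assumed values for illustration only: 1024 is the figure mentioned in this
# email, and the raised limit is that figure multiplied by twenty.
RECOMMENDED_MAX_DIMENSIONS = 1024
HARD_MAX_DIMENSIONS = 20 * RECOMMENDED_MAX_DIMENSIONS  # 20480


def check_vector_dimension(dimension: int) -> int:
    """Hypothetical check: warn above the recommended size, reject only above the hard limit."""
    if dimension <= 0:
        raise ValueError(f"vector dimension must be positive; got {dimension}")
    if dimension > HARD_MAX_DIMENSIONS:
        raise ValueError(
            f"vector dimension {dimension} exceeds the hard limit {HARD_MAX_DIMENSIONS}"
        )
    if dimension > RECOMMENDED_MAX_DIMENSIONS:
        # Accept the vector, but tell the user about the known costs.
        warnings.warn(
            f"vector dimension {dimension} is above {RECOMMENDED_MAX_DIMENSIONS}; "
            "expect slow indexing throughput and high CPU usage",
            RuntimeWarning,
        )
    return dimension


def raw_bytes_per_vector(dimension: int, bytes_per_value: int = 4) -> int:
    """Raw storage for one vector, assuming 32-bit float components."""
    return dimension * bytes_per_value
```

Under these assumed values, a 12288-dimension vector would be accepted with a
warning, and its raw storage alone would be 12288 * 4 = 49152 bytes per
document if components are stored as 32-bit floats.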

Is there something I'm overlooking from a risk standpoint?

Best,
-- 
Marcus Eagan