Re: Welcome Zhang Chao as Lucene committer

2024-02-21 Thread Gus Heck
Welcome :)

On Wed, Feb 21, 2024 at 12:03 PM Dawid Weiss  wrote:

>
> Congratulations and welcome!
>
> On Tue, Feb 20, 2024 at 6:28 PM Adrien Grand  wrote:
>
>> I'm pleased to announce that Zhang Chao has accepted the PMC's
>> invitation to become a committer.
>>
>> Chao, the tradition is that new committers introduce themselves with a
>> brief bio.
>>
>> Congratulations and welcome!
>>
>> --
>> Adrien
>>
>

-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)


Re: Welcome Stefan Vodita as Lucene committer

2024-01-19 Thread Gus Heck
Welcome! Congratulations.

On Fri, Jan 19, 2024 at 12:56 PM Greg Miller  wrote:

> Welcome Stefan! Glad to have you!
>
> On Fri, Jan 19, 2024 at 08:00 Michael Sokolov  wrote:
>
>> Hello Stefan, welcome!
>>
>> On Fri, Jan 19, 2024 at 10:41 AM Martin Gainty 
>> wrote:
>>
>>> Congratulations Stefan!
>>>
>>> I look forward to reading your posts
>>>
>>> ~martin
>>> --
>>> *From:* Michael McCandless 
>>> *Sent:* Thursday, January 18, 2024 10:53 AM
>>> *To:* dev@lucene.apache.org 
>>> *Subject:* Welcome Stefan Vodita as Lucene committer
>>>
>>> Hi Team,
>>>
>>> I'm pleased to announce that Stefan Vodita has accepted the Lucene PMC's
>>> invitation to become a committer!
>>>
>>> Stefan, the tradition is that new committers introduce themselves with
>>> a brief bio.
>>>
>>> Congratulations, welcome, and thank you for all your improvements to
>>> Lucene and our community,
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>



Re: Welcome Patrick Zhai to the Lucene PMC

2023-11-13 Thread Gus Heck
Welcome :)

On Mon, Nov 13, 2023 at 1:15 PM Anshum Gupta  wrote:

> Congratulations and welcome, Patrick!
>
> On Fri, Nov 10, 2023 at 12:05 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I'm happy to announce that Patrick Zhai has accepted an invitation to
>> join the Lucene Project Management Committee (PMC)!
>>
>> Congratulations Patrick, thank you for all your hard work improving
>> Lucene's community and source code, and welcome aboard!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>
>
> --
> Anshum Gupta
>




Re: Bump minimum Java version requirement to 21

2023-11-06 Thread Gus Heck
For perspective, I'm still seeing java 11 as the norm for clients... 17 is
uncommon. Anything requiring 21 is likely to be difficult to sell. I am
however a small shop, and "migrating off of solr 6" and "trying out solr
cloud" is still a thing for some clients.

Just a datapoint/anecdote, possibly skewed.

On Mon, Nov 6, 2023 at 7:41 AM Chris Hegarty
 wrote:

> Hi Robert,
>
> > On 6 Nov 2023, at 12:24, Robert Muir  wrote:
> >
> >> …
> >> The only concern I have with no.2 is that it could be considered an
> “aggressive” adoption of Java 21 - adoption sooner than the ecosystem can
> handle, e.g. are environments in which Lucene is deployed, and their
> transitive dependencies, ready to run on Java 21? By the time we’re ready
> to release 10.0.0, say March 2024, then I expect no issue with this.
> >
> > The problem is worse, historically jdk version X isn't adopted as a
> > minimum until it is already EOL. And the lucene major versions take an
> > eternity to get out there, code just sits in "main" branch for years
> > unreleased to nobody. It is really discouraging as a contributor to
> > contribute code that literally sits on the shelf for years, for no
> > good reason at all.
>
> Agreed. I feel discouraged by this too, and I also want to avoid
> “backport the world”, since it’s counterproductive.
>
> > So why delay?
> >
> > The argument of "moving sooner than ecosystem can handle" is also
> > bogus in the same way. You mean versus the code sitting on the shelf
> > and being released to nobody?
>
> Yes - sitting on the shelf is no good to anyone.
>
> Ok, what I’m hearing are good arguments for releasing 10.0.0 *now*, with
> a Java 17 minimum - this is what is in _main_ today.
>
> If we do that, then we can follow up with _main_ later (after the 10.x
> branch is created). That is, 1) bump _main_ to Java 21, and 2) decide
> when a Lucene 11 is to be released (I would like to see Lucene 11 ~1yr after
> Lucene 10).
>
> This is Uwe’s proposal, earlier in this thread.
>
> -Chris.
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>



Re: Squash vs merge of PRs

2023-11-04 Thread Gus Heck
For what it's worth I basically agree with Michael Sokolov, with the caveat
that I think it's sometimes useful to create a clean branch and re-pick the
changes if the merging has become complex, just to make sure you are not
accidentally reverting anything relative to what is on head.
Squashing has always worried me, but not severely, so I do it when requested.

Part of it hinges on commit style too. If you regularly have lots of
frequent, small commits including broken, non-compiling state, then
squashing might be a good idea.

-Gus


On Sat, Nov 4, 2023 at 11:59 AM Michael Sokolov  wrote:

> Personally for me it's about how meaningful the commit messages (and
> contents) are vs whether we use merge commits or not. If it's a long series
> of "fixed bug" "reformatted" "did stuff" "more stuff" "it finally works"
> and so on ... that doesn't smell good to me, but you know we all have done
> that from time to time too, either by accident or because we're in a rush
> and didn't practice perfect hygiene. I guess the commit branching/linear
> purity debate is mostly a matter of taste; we can try to have some
> standards, but we should be forgiving and not try to dictate with
> automation. Honestly I didn't look at whatever Robert's commits were that
> started this discussion since it seems to have metastasized into a general
> commit history health discussion so just throwing another opinion into the
> mix here, maybe getting off topic sorry.
>
> On Sat, Nov 4, 2023 at 11:18 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I didn't realize the community had decided squashing (rewriting history)
>> was our standard.
>>
>> > Comparing histories between branches with git-bisect to find bugs is
>> just one example.
>>
>> But if the bug was introduced in one of the N local commits the developer
>> had done, wouldn't that be helpful?  You could see that one commit instead
>> of all N squashed, and get better context on how/why the bug was introduced?
>>
>> I would prefer history-preserving commits.  It can reveal/preserve
>> important information -- like we tried one approach, and discovered some
>> issue, tweaked it to a better approach.  This can be useful in the future
>> if someone is working on that part of the code and is trying to understand
>> why it was done a certain way.  It preserves the natural and healthy
>> iterations we all experience when working closely together.  Why discard
>> such possibly helpful history?
>>
>> Also, one can always wear hazy glasses in the future to "summarize" the
>> full history down to a view that's more palatable to them personally, if
>> you don't like seeing merge commit branching.  But we cannot do the
>> reverse.  Discarding the actual development history is a one-way door.
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Sat, Nov 4, 2023 at 11:03 AM Gus Heck  wrote:
>>
>>> Also, since (as noted) this is a previously decided issue, not sure why
>>> this is a list email instead of a simple direct query to Robert seeking to
>>> understand the specific case? No need to make a public discussion unless
>>> it's a long term pattern, actually breaking something, or we want to change
>>> something?
>>>
>>> On Sat, Nov 4, 2023 at 9:37 AM Benjamin Trent 
>>> wrote:
>>>
>>>> TL;DR, forcing non-committers to squash things is a good idea.
>>>> Enforcing through some measure for committers is a bad idea.
>>>>
>>>> Since this thread is now in Robert's spam, I am guessing it won't have
>>>> any impact :). I do not think Robert is actively trying to hurt the project in
>>>> any way. It seems to me that he doesn't think a clean git history is worth
>>>> the effort.
>>>>
>>>> Having a clean git history makes things easier for everyone. Comparing
>>>> histories between branches with git-bisect to find bugs is just one
>>>> example. Another is simply reading commits to see when
>>>> features/bug fixes/etc. were added.
>>>>
>>>> I do NOT think we should add procedures or branch protections to
>>>> actively enforce this.
>>>>
>>>> Small personal sacrifices (like dealing with commit conflicts) are
>>>> necessary for a community. Being part of a community is about buying into
>>>> what the community is about and working towards a common goal. Many times
>>>> we do things we don't agree with, or make things slightly more difficult
>>>> for us, for the community as a who

Re: Squash vs merge of PRs

2023-11-04 Thread Gus Heck
Also, since (as noted) this is a previously decided issue, not sure why
this is a list email instead of a simple direct query to Robert seeking to
understand the specific case? No need to make a public discussion unless
it's a long term pattern, actually breaking something, or we want to change
something?

On Sat, Nov 4, 2023 at 9:37 AM Benjamin Trent  wrote:

> TL;DR, forcing non-committers to squash things is a good idea. Enforcing
> through some measure for committers is a bad idea.
>
> Since this thread is now in Robert's spam, I am guessing it won't have any
> impact :). I do not think Robert is actively trying to hurt the project in any
> way. It seems to me that he doesn't think a clean git history is worth the
> effort.
>
> Having a clean git history makes things easier for everyone. Comparing
> histories between branches with git-bisect to find bugs is just one
> example. Another is simply reading commits to see when
> features/bug fixes/etc. were added.
>
> I do NOT think we should add procedures or branch protections to actively
> enforce this.
>
> Small personal sacrifices (like dealing with commit conflicts) are
> necessary for a community. Being part of a community is about buying into
> what the community is about and working towards a common goal. Many times
> we do things we don't agree with, or make things slightly more difficult
> for us, for the community as a whole. This thing being OSS shows that we
> all buy into its importance and are willing to put work into the project.
>
> Having a cultural default of "make things nice for others" is good.
> Enforcing this ideology on others is antithetical to its definition.
>
>
>
> On Sat, Nov 4, 2023 at 9:02 AM Robert Muir  wrote:
>
>> This isn't a community issue, it is me avoiding useless unnecessary
>> merge conflicts. Word "community" is invoked here to try to make it
>> out, like you can hold a vote about what git commands i should type on
>> my computer? You know that isn't gonna work. have some humility.
>>
>> thread moved to spam.
>>
>> On Sat, Nov 4, 2023 at 8:36 AM Mike Drob  wrote:
>> >
>> > We all agree on using Java though, and using a specific version, and
>> > even the style output from gradle tidy. Is that nanny state or community
>> > consensus?
>> >
>> > On Sat, Nov 4, 2023 at 7:29 AM Robert Muir  wrote:
>> >>
>> >> example of a nanny state IMO, trying to dictate what git commands to
>> >> use, or what editor to use. Maybe this works for you in your corporate
>> >> hellholes, but I think some folks have a bit of a power issue, are
>> >> accustomed to dictating this stuff to their employees and so on, but
>> >> this is open-source. I don't report to you, i dont use the editor you
>> >> tell me, or the git commands you tell me.
>> >>
>> >> On Sat, Nov 4, 2023 at 8:21 AM Uwe Schindler  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I just wanted to give your attention to the following discussion:
>> >> > https://github.com/apache/lucene/pull/12737#issuecomment-1793426911
>> >> >
>> >> > From my knowledge the Lucene (and Solr) community decided a while back
>> >> > to disable merging and only allow squashing of PRs. Robert always did
>> >> > this, but because of a one-time problem with two branches he was working
>> >> > on in parallel, he suddenly changed his mind and did merges on his own,
>> >> > not squashing the branch and pushing to ASF Git.
>> >> >
>> >> > I am also not a fan of removing all history, but especially for heavy
>> >> > committing branches like the given PR, I think we should invite our
>> >> > committers to also adhere to community standards everyone else
>> >> > practices. I would agree with merging those branches if all commit
>> >> > messages in the branch would be well-formed with issue ID or PR number,
>> >> > but in the above case you get a history of random commits which is no
>> >> > longer linear and not easily readable.
>> >> >
>> >> > What do others think?
>> >> >
>> >> > Uwe
>> >> >
>> >> > --
>> >> > Uwe Schindler
>> >> > Achterdiek 19, D-28357 Bremen
>> >> > https://www.thetaphi.de
>> >> > eMail: u...@thetaphi.de
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>>
>>
>>



Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Ok pushed an attempt at a clearer message. LMK what you think.

On Tue, May 16, 2023 at 11:30 PM Gus Heck  wrote:

> Ok reading my last message I realize it still might not be clear. Here's
> what I observed:
>
> The class Codec clearly loaded (from the lucene-core jar). When
> Codec$Holder tried to load, the class initializer code went looking for the
> service definitions. It failed to find any of the
> META-INF/services/org.apache.lucene.codecs.Codec files in any of the jars
> residing in ~/.m2/repository (where the class evidently loads from),
> including the one in **the same jar that Codec had loaded from**, and I
> observed it throwing a SecurityException by debugging at a level underneath
> the quoted method in my longer message above.
>
> So what it seems to imply is that the ability of java to load a class from
> a jar on the class path is not related to the FilePermission needed to load
> the services file.
>
> So what I want to communicate to the user is 2 things:
>
> 1. They are running a security manager, so the policy file is relevant.
> 2. The fact they are not seeing a SecurityException does not eliminate the
> possibility that they are missing permissions.
>
> We should craft a message that is clearer and communicates those two
> points.
>
> Adding a permission to the policy for the tests fixed everything (once I
> found the right policy file).
>
> -Gus
>
> On Tue, May 16, 2023 at 11:10 PM Gus Heck  wrote:
>
> Oh hmm the google UI hid the quoted bit. If you don't like the message let's
> improve it. (actually, it should probably say the "file in the jar"... or
> something a little more specific... not the jar entirely. The class loads,
> but the service loader can't access the file in the same jar without the
> FilePermission to (re?) access the jar it seems)
>>
>> On Tue, May 16, 2023 at 11:05 PM Gus Heck  wrote:
>>
>>> I propose to improve the message on an exception already thrown.
>>>
>>> On Tue, May 16, 2023 at 11:04 PM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
>>>> You propose to throw an exception containing this, right?
>>>>
>>>> > Java does not throw SecurityException if this
>>>> is the case, it just ignores the jar!
>>>>
>>>> Are you serious?
>>>>
>>>> On Wed, 17 May, 2023, 8:02 am Gus Heck,  wrote:
>>>>
>>>>> Blaming?
>>>>>
>>>>> On Tue, May 16, 2023 at 10:05 PM Ishan Chattopadhyaya <
>>>>> ichattopadhy...@gmail.com> wrote:
>>>>>
>>>>>> > Having that explicitly called out would have been SUPER helpful.
>>>>>>
>>>>>> Blaming Java in an exception thrown by Lucene is a ridiculous idea.
>>>>>>
>>>>>> On Wed, 17 May, 2023, 3:33 am Gus Heck,  wrote:
>>>>>>
>>>>>>> Found it.
>>>>>>>
>>>>>>> It's a solr thing made worse by the interaction of lucene testutils
>>>>>>> and
>>>>>>> jdk.internal.loader.URLClassPath's decision to hide anything gone
>>>>>>> wrong
>>>>>>> when checking a URL
>>>>>>> /*
>>>>>>>  * Checks whether the resource URL should be returned.
>>>>>>>  * Returns null on security check failure.
>>>>>>>  * Called by java.net.URLClassLoader.
>>>>>>>  */
>>>>>>> public static URL checkURL(URL url) {
>>>>>>>     if (url != null) {
>>>>>>>         try {
>>>>>>>             check(url);
>>>>>>>         } catch (Exception e) {
>>>>>>>             return null;
>>>>>>>         }
>>>>>>>     }
>>>>>>>     return url;
>>>>>>> }
>>>>>>>
>>>>>>> Yay. Fun. JDK classes swallowing exceptions silently.
>>>>>>>
>>>>>>> At the start of this it only took me a little while to discover that
>>>>>>> there
>>>>>>> was a security manager in play via debugging. Remembering that I saw
>>>>>>> emails
>>>>>>> about that, I went to jira, found the ticket enabling it by default
>>>>>>> in 9.x
>>>>>>> and eventually tracked down the name of the security policy f

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Ok reading my last message I realize it still might not be clear. Here's
what I observed:

The class Codec clearly loaded (from the lucene-core jar). When
Codec$Holder tried to load, the class initializer code went looking for the
service definitions. It failed to find any of the
META-INF/services/org.apache.lucene.codecs.Codec files in any of the jars
residing in ~/.m2/repository (where the class evidently loads from),
including the one in **the same jar that Codec had loaded from**, and I
observed it throwing a SecurityException by debugging at a level underneath
the quoted method in my longer message above.

So what it seems to imply is that the ability of java to load a class from
a jar on the class path is not related to the FilePermission needed to load
the services file.
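To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the JDK code: `check`, `checkResource`, and `describe` are hypothetical names, and a boolean stands in for the real policy decision. It only illustrates how a catch-all that returns null makes a denied resource look the same as a missing one.

```java
final class SwallowedCheckDemo {
    /** Hypothetical stand-in for a permission check that may fail. */
    static void check(String resource, boolean permitted) {
        if (!permitted) {
            throw new SecurityException("access denied: " + resource);
        }
    }

    /** Same shape as the quoted checkURL: any failure is turned into null. */
    static String checkResource(String resource, boolean permitted) {
        if (resource != null) {
            try {
                check(resource, permitted);
            } catch (Exception e) {
                return null; // the SecurityException is dropped here
            }
        }
        return resource;
    }

    /** What a caller can observe: denied and absent are indistinguishable. */
    static String describe(String resource, boolean permitted) {
        String visible = checkResource(resource, permitted);
        return visible == null ? "not found" : "found " + visible;
    }

    public static void main(String[] args) {
        // prints "found META-INF/services/example"
        System.out.println(describe("META-INF/services/example", true));
        // prints "not found", with no exception reaching the caller
        System.out.println(describe("META-INF/services/example", false));
    }
}
```

With this shape the caller cannot tell a permission failure from a genuinely missing file, which is consistent with the behavior described above: the class loads, but the services file simply appears not to exist.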

So what I want to communicate to the user is 2 things:

1. They are running a security manager, so the policy file is relevant.
2. The fact they are not seeing a SecurityException does not eliminate the
possibility that they are missing permissions.

We should craft a message that is clearer and communicates those two points.
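As a starting point for discussion, here is a rough sketch of a message that covers both points. The class and method names are hypothetical, the base wording is the existing "does not exist" message from this thread, and the caller is assumed to pass in whether a security manager is installed. This is not what Lucene actually does.

```java
final class SpiErrorMessage {
    /**
     * Builds the "not found" message. The extra hint is appended only when
     * the caller reports that a security manager is installed.
     */
    static String notFound(String type, String name, boolean securityManagerInstalled) {
        String base = "An SPI class of type " + type
                + " with name '" + name + "' does not exist.";
        if (!securityManagerInstalled) {
            return base;
        }
        // Point 1: the policy file is relevant.
        // Point 2: the absence of a SecurityException proves nothing.
        return base
                + " Note: a security manager is installed. A missing FilePermission"
                + " in the policy file can produce this error without any"
                + " SecurityException being reported, so check the policy in effect.";
    }

    public static void main(String[] args) {
        System.out.println(notFound("org.apache.lucene.codecs.Codec", "Lucene95", true));
    }
}
```

Passing the flag in, rather than querying the runtime inside the method, keeps the sketch easy to test and leaves the choice of detection mechanism open.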

Adding a permission to the policy for the tests fixed everything (once I
found the right policy file).

-Gus

On Tue, May 16, 2023 at 11:10 PM Gus Heck  wrote:

> Oh hmm the google UI hid the quoted bit. If you don't like the message let's
> improve it. (actually, it should probably say the "file in the jar"... or
> something a little more specific... not the jar entirely. The class loads,
> but the service loader can't access the file in the same jar without the
> FilePermission to (re?) access the jar it seems)
>
> On Tue, May 16, 2023 at 11:05 PM Gus Heck  wrote:
>
>> I propose to improve the message on an exception already thrown.
>>
>> On Tue, May 16, 2023 at 11:04 PM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> You propose to throw an exception containing this, right?
>>>
>>> > Java does not throw SecurityException if this
>>> is the case, it just ignores the jar!
>>>
>>> Are you serious?
>>>
>>> On Wed, 17 May, 2023, 8:02 am Gus Heck,  wrote:
>>>
>>>> Blaming?
>>>>
>>>> On Tue, May 16, 2023 at 10:05 PM Ishan Chattopadhyaya <
>>>> ichattopadhy...@gmail.com> wrote:
>>>>
>>>>> > Having that explicitly called out would have been SUPER helpful.
>>>>>
>>>>> Blaming Java in an exception thrown by Lucene is a ridiculous idea.
>>>>>
>>>>> On Wed, 17 May, 2023, 3:33 am Gus Heck,  wrote:
>>>>>
>>>>>> Found it.
>>>>>>
>>>>>> It's a solr thing made worse by the interaction of lucene testutils
>>>>>> and
>>>>>> jdk.internal.loader.URLClassPath's decision to hide anything gone
>>>>>> wrong
>>>>>> when checking a URL
>>>>>> /*
>>>>>>  * Checks whether the resource URL should be returned.
>>>>>>  * Returns null on security check failure.
>>>>>>  * Called by java.net.URLClassLoader.
>>>>>>  */
>>>>>> public static URL checkURL(URL url) {
>>>>>>     if (url != null) {
>>>>>>         try {
>>>>>>             check(url);
>>>>>>         } catch (Exception e) {
>>>>>>             return null;
>>>>>>         }
>>>>>>     }
>>>>>>     return url;
>>>>>> }
>>>>>>
>>>>>> Yay. Fun. JDK classes swallowing exceptions silently.
>>>>>>
>>>>>> At the start of this it only took me a little while to discover that
>>>>>> there
>>>>>> was a security manager in play via debugging. Remembering that I saw
>>>>>> emails
>>>>>> about that, I went to jira, found the ticket enabling it by default
>>>>>> in 9.x
>>>>>> and eventually tracked down the name of the security policy file by
>>>>>> reading
>>>>>> solr.in.sh and /bin/solr...  The key issue that tripped me up is
>>>>>> that the
>>>>>> tests have a *separate* security policy file, and there was pretty
>>>>>> much no
>>>>>> way to know this without extensive reading of the build. Thus I got
>>>>>> thrown
>>>>>> off track when
>>>>>>
>>>>>>   permission java.io.F

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Oh hmm the google UI hid the quoted bit. If you don't like the message let's
improve it. (actually, it should probably say the "file in the jar"... or
something a little more specific... not the jar entirely. The class loads,
but the service loader can't access the file in the same jar without the
FilePermission to (re?) access the jar it seems)
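One way to check this kind of situation from the outside is to ask the class loader directly which services files it can see. The following is a small diagnostic sketch, not part of Lucene or Solr; `ServicesFileProbe` and its methods are hypothetical names, and it relies only on the standard `ClassLoader.getResources` call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ServicesFileProbe {
    /** Resource path of the provider-configuration file for a service type. */
    static String servicesPath(String serviceType) {
        return "META-INF/services/" + serviceType;
    }

    /** Lists every services file for the given type that the loader exposes. */
    static List<URL> visibleServiceFiles(ClassLoader loader, String serviceType) {
        try {
            return Collections.list(loader.getResources(servicesPath(serviceType)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default type is only an example; pass another name as the first argument.
        String type = args.length > 0 ? args[0] : "org.apache.lucene.codecs.Codec";
        List<URL> found = visibleServiceFiles(ClassLoader.getSystemClassLoader(), type);
        if (found.isEmpty()) {
            System.out.println("No services file visible for " + type);
        } else {
            found.forEach(url -> System.out.println("Visible: " + url));
        }
    }
}
```

If the class itself loads but this prints nothing for its services file, that would be consistent with the access problem described here rather than with a missing jar.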

On Tue, May 16, 2023 at 11:05 PM Gus Heck  wrote:

> I propose to improve the message on an exception already thrown.
>
> On Tue, May 16, 2023 at 11:04 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
>> You propose to throw an exception containing this, right?
>>
>> > Java does not throw SecurityException if this
>> is the case, it just ignores the jar!
>>
>> Are you serious?
>>
>> On Wed, 17 May, 2023, 8:02 am Gus Heck,  wrote:
>>
>>> Blaming?
>>>
>>> On Tue, May 16, 2023 at 10:05 PM Ishan Chattopadhyaya <
>>> ichattopadhy...@gmail.com> wrote:
>>>
>>>> > Having that explicitly called out would have been SUPER helpful.
>>>>
>>>> Blaming Java in an exception thrown by Lucene is a ridiculous idea.
>>>>
>>>> On Wed, 17 May, 2023, 3:33 am Gus Heck,  wrote:
>>>>
>>>>> Found it.
>>>>>
>>>>> It's a solr thing made worse by the interaction of lucene testutils and
>>>>> jdk.internal.loader.URLClassPath's decision to hide anything gone wrong
>>>>> when checking a URL
>>>>> /*
>>>>>  * Checks whether the resource URL should be returned.
>>>>>  * Returns null on security check failure.
>>>>>  * Called by java.net.URLClassLoader.
>>>>>  */
>>>>> public static URL checkURL(URL url) {
>>>>>     if (url != null) {
>>>>>         try {
>>>>>             check(url);
>>>>>         } catch (Exception e) {
>>>>>             return null;
>>>>>         }
>>>>>     }
>>>>>     return url;
>>>>> }
>>>>>
>>>>> Yay. Fun. JDK classes swallowing exceptions silently.
>>>>>
>>>>> At the start of this it only took me a little while to discover that
>>>>> there
>>>>> was a security manager in play via debugging. Remembering that I saw
>>>>> emails
>>>>> about that, I went to jira, found the ticket enabling it by default in
>>>>> 9.x
>>>>> and eventually tracked down the name of the security policy file by
>>>>> reading
>>>>> solr.in.sh and /bin/solr...  The key issue that tripped me up is that
>>>>> the
>>>>> tests have a *separate* security policy file, and there was pretty
>>>>> much no
>>>>> way to know this without extensive reading of the build. Thus I got
>>>>> thrown off track when adding
>>>>>
>>>>>   permission java.io.FilePermission
>>>>> "${user.home}${/}.m2${/}repository${/}-", "read";
>>>>>
>>>>> to solr/server/etc/security.policy had no effect. That and the fact
>>>>> that
>>>>> no security exception was reported, led me to start chasing
>>>>> increasingly
>>>>> improbable hypotheses. Many hours later when I went back to debugging
>>>>> deeply into class loading, I found that the code was actually reading
>>>>> the
>>>>> jar files in question, and then I finally caught it throwing a security
>>>>> exception during my debugging.
>>>>>
>>>>> It turns out that adding the above permission to
>>>>> gradle/testing/randomization/policies/solr-tests.policy allows the
>>>>> test to
>>>>> pass. [1]
>>>>>
>>>>> I think we need to document this somewhere (or someone needs to point
>>>>> me to
>>>>> the doc I missed, FWIW I hit this basically following the process in
>>>>> dev-docs/dependency-upgrades.adoc treating lucene like a dependency,
>>>>> and
>>>>> unaware that there is a "shortcut" mode for lucene specifically in
>>>>> gradle/lucene-dev/lucene-dev-repo-composite.gradle and I find reading
>>>>> that
>>>>> file none too clear anyway)
>>>>>
>>>>> That's the solr part, the lucene pa

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
I propose to improve the message on an exception already thrown.

On Tue, May 16, 2023 at 11:04 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> You propose to throw an exception containing this, right?
>
> > Java does not throw SecurityException if this
> is the case, it just ignores the jar!
>
> Are you serious?
>
> On Wed, 17 May, 2023, 8:02 am Gus Heck,  wrote:
>
>> Blaming?
>>
>> On Tue, May 16, 2023 at 10:05 PM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> > Having that explicitly called out would have been SUPER helpful.
>>>
>>> Blaming Java in an exception thrown by Lucene is a ridiculous idea.
>>>
>>> On Wed, 17 May, 2023, 3:33 am Gus Heck,  wrote:
>>>
>>>> Found it.
>>>>
>>>> It's a solr thing made worse by the interaction of lucene testutils and
>>>> jdk.internal.loader.URLClassPath's decision to hide anything gone wrong
>>>> when checking a URL
>>>> /*
>>>>  * Checks whether the resource URL should be returned.
>>>>  * Returns null on security check failure.
>>>>  * Called by java.net.URLClassLoader.
>>>>  */
>>>> public static URL checkURL(URL url) {
>>>>     if (url != null) {
>>>>         try {
>>>>             check(url);
>>>>         } catch (Exception e) {
>>>>             return null;
>>>>         }
>>>>     }
>>>>     return url;
>>>> }
>>>>
>>>> Yay. Fun. JDK classes swallowing exceptions silently.
>>>>
>>>> At the start of this it only took me a little while to discover that
>>>> there
>>>> was a security manager in play via debugging. Remembering that I saw
>>>> emails
>>>> about that, I went to jira, found the ticket enabling it by default in
>>>> 9.x
>>>> and eventually tracked down the name of the security policy file by
>>>> reading
>>>> solr.in.sh and /bin/solr...  The key issue that tripped me up is that
>>>> the
>>>> tests have a *separate* security policy file, and there was pretty much
>>>> no
>>>> way to know this without extensive reading of the build. Thus I got
>>>> thrown off track when adding
>>>>
>>>>   permission java.io.FilePermission
>>>> "${user.home}${/}.m2${/}repository${/}-", "read";
>>>>
>>>> to solr/server/etc/security.policy had no effect. That and the fact
>>>> that
>>>> no security exception was reported, led me to start chasing increasingly
>>>> improbable hypotheses. Many hours later when I went back to debugging
>>>> deeply into class loading, I found that the code was actually reading
>>>> the
>>>> jar files in question, and then I finally caught it throwing a security
>>>> exception during my debugging.
>>>>
>>>> It turns out that adding the above permission to
>>>> gradle/testing/randomization/policies/solr-tests.policy allows the test
>>>> to
>>>> pass. [1]
>>>>
>>>> I think we need to document this somewhere (or someone needs to point
>>>> me to
>>>> the doc I missed, FWIW I hit this basically following the process in
>>>> dev-docs/dependency-upgrades.adoc treating lucene like a dependency, and
>>>> unaware that there is a "shortcut" mode for lucene specifically in
>>>> gradle/lucene-dev/lucene-dev-repo-composite.gradle and I find reading
>>>> that
>>>> file none too clear anyway)
>>>>
>>>> That's the solr part, the lucene part is that the security exception is
>>>> hit
>>>> when in org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:58)
>>>> when org.apache.lucene.tests.util.TestRuleSetupAndRestoreClassEnv#before
>>>> does
>>>>
>>>>  savedCodec = Codec.getDefault();
>>>>
>>>> The error message "An SPI class of type org.apache.lucene.codecs.Codec
>>>> with
>>>> name 'Lucene95' does not exist." was moderately misleading because the
>>>> file
>>>> and the services files in the jar definitely did exist. This message
>>>> should
>>>> vary if there is an installed security manager, maybe saying something
>>>> like:

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Hi Robert,

If you read the issue I opened more carefully you'll see I had all the
service loading stuff sorted just fine. It's the silent eating of the
security exceptions by URLClassPath that I think is a useful thing to point
out. If anything, that ticket is more about being surprised by Security
manager behavior than service loading. I thought it would be good if anyone
else who doesn't know that bit of (IMHO obscure) trivia didn't have to
spend a long time hunting down the same thing I did if they encounter a
misconfigured security policy. If you think it could be worded better I'm
all ears.

Also, I didn't question a single thing you said or ask you to repeat
anything. I asked for a very specific detail you have not yet provided, and
that's what your goal post is. When is it good enough?

I do disagree with you at a "how open source works best" level, favoring
enablement. I don't think I've disagreed with a single one of your
technical claims or reported experiences.

Awesome if you've found an improvement. :) If it works as well as you
expect, is it enough to change your mind?

-Gus

On Tue, May 16, 2023 at 8:54 PM Robert Muir  wrote:

> Gus, I think i explained myself multiple times on issues and in this
> thread. the performance is unacceptable, everyone knows it, but nobody is
> talking about it.
> I don't need to explain myself time and time again here.
> You don't seem to understand the technical issues (at least you sure as
> fuck don't know how service loading works or you wouldnt have opened
> https://github.com/apache/lucene/issues/12300 )
>
> I'm just the only one here completely unconstrained by any of silicon
> valley's influences to speak my true mind, without any repercussions, so I
> do it. Don't give any fucks about ChatGPT.
>
> I'm standing by my technical veto. If you bypass it, I'll revert the
> offending commit.
>
> As far as fixing the technical performance, I just opened an issue with
> some ideas to at least improve cpu usage by a factor of N. It does not help
> with the crazy heap memory usage or other issues of KNN implementation
> causing shit like OOM on merge. But it is one step:
> https://github.com/apache/lucene/issues/12302
>
>
>
> On Tue, May 16, 2023 at 7:45 AM Gus Heck  wrote:
>
>> Robert,
>>
>> Can you explain in clear technical terms the standard that must be met
>> for performance? A benchmark that must run in X time on Y hardware for
>> example (and why that test is suitable)? Or some other reproducible
>> criteria? So far I've heard you give an *opinion* that it's unusable, but
>> that's not a technical criterion; others may have a different concept of
>> what is usable to them.
>>
>> Forgive me if I misunderstand, but the essence of your argument has
>> seemed to be
>>
>> "Performance isn't good enough, therefore we should force anyone who
>> wants to experiment with something bigger to fork the code base to do it"
>>
>> Thus, it is necessary to have a clear unambiguous standard that anyone
>> can verify for "good enough". A clear standard would also focus efforts at
>> improvement.
>>
>> Where are the goal posts?
>>
>> FWIW I'm +1 on any of 2-4 since I believe the existence of a hard limit
>> is fundamentally counterproductive in an open source setting, as it will
>> lead to *fewer people* pushing the limits. Extremely few people are
>> going to get into the nitty-gritty of optimizing things unless they are
>> staring at code that they can prove does something interesting, but doesn't
>> run fast enough for their purposes. If people hit a hard limit, more of
>> them give up and never develop the code that will motivate them to look for
>> optimizations.
>>
>> -Gus
>>
>> On Tue, May 16, 2023 at 6:04 AM Robert Muir  wrote:
>>
>>> i still feel -1 (veto) on increasing this limit. sending more emails
>>> does not change the technical facts or make the veto go away.
>>>
>>> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti <
>>> a.benede...@sease.io> wrote:
>>>
>>>> Hi all,
>>>> we have finalized all the options proposed by the community and we are
>>>> ready to vote for the preferred one and then proceed with the
>>>> implementation.
>>>>
>>>> *Option 1*
>>>> Keep it as it is (dimension limit hardcoded to 1024)
>>>> *Motivation*:
>>>> We are close to improving on many fronts. Given the criticality of
>>>> Lucene in computing infrastructure and the concerns raised by one of the
>>>> most active stewards of the project, I think we should keep working toward

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Blaming?

On Tue, May 16, 2023 at 10:05 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> > Having that explicitly called out would have been SUPER helpful.
>
> Blaming Java in an exception thrown by Lucene is a ridiculous idea.
>
> On Wed, 17 May, 2023, 3:33 am Gus Heck,  wrote:
>
>> Found it.
>>
>> It's a solr thing made worse by the interaction of lucene testutils and
>> jdk.internal.loader.URLClassPath's decision to hide anything gone wrong
>> when checking a URL
>> /*
>>  * Checks whether the resource URL should be returned.
>>  * Returns null on security check failure.
>>  * Called by java.net.URLClassLoader.
>>  */
>> public static URL checkURL(URL url) {
>>     if (url != null) {
>>         try {
>>             check(url);
>>         } catch (Exception e) {
>>             return null;
>>         }
>>     }
>>     return url;
>> }
>>
>> Yay. Fun. JDK classes swallowing exceptions silently.
>>
>> At the start of this it only took me a little while to discover that there
>> was a security manager in play via debugging. Remembering that I saw
>> emails
>> about that, I went to jira, found the ticket enabling it by default in 9.x
>> and eventually tracked down the name of the security policy file by
>> reading
>> solr.in.sh and /bin/solr...  The key issue that tripped me up is that the
>> tests have a *separate* security policy file, and there was pretty much no
>> way to know this without extensive reading of the build. Thus I got thrown
>> off track when adding
>>
>>   permission java.io.FilePermission
>> "${user.home}${/}.m2${/}repository${/}-", "read";
>>
>> to solr/server/etc/security.policy had no effect. That and the fact that
>> no security exception was reported, led me to start chasing increasingly
>> improbable hypotheses. Many hours later when I went back to debugging
>> deeply into class loading, I found that the code was actually reading the
>> jar files in question, and then I finally caught it throwing a security
>> exception during my debugging.
>>
>> It turns out that adding the above permission to
>> gradle/testing/randomization/policies/solr-tests.policy allows the test to
>> pass. [1]
>>
>> I think we need to document this somewhere (or someone needs to point me
>> to
>> the doc I missed, FWIW I hit this basically following the process in
>> dev-docs/dependency-upgrades.adoc treating lucene like a dependency, and
>> unaware that there is a "shortcut" mode for lucene specifically in
>> gradle/lucene-dev/lucene-dev-repo-composite.gradle and I find reading that
>> file none too clear anyway)
>>
>> That's the solr part, the lucene part is that the security exception is
>> hit
>> when in org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:58)
>> when org.apache.lucene.tests.util.TestRuleSetupAndRestoreClassEnv#before
>> does
>>
>>  savedCodec = Codec.getDefault();
>>
>> The error message "An SPI class of type org.apache.lucene.codecs.Codec
>> with
>> name 'Lucene95' does not exist." was moderately misleading because the
>> file
>> and the services files in the jar definitely did exist. This message
>> should
>> vary if there is an installed security manager, maybe saying something
>> like:
>>
>> "An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene95'
>> does not exist. We have detected that a security manager is installed so
>> it
>> is also possible that the jar containing the codec is inaccessible under
>> the current security policy. (Java does not throw SecurityException if
>> this
>> is the case, it just ignores the jar!)" [2]
>>
>> Having that explicitly called out would have been SUPER helpful.
>>
>> -Gus
>>
>> [1]: https://issues.apache.org/jira/browse/SOLR-16804
>> [2]: https://github.com/apache/lucene/issues/12300
>>
>>
>> On Mon, May 15, 2023 at 3:17 PM Michael Sokolov 
>> wrote:
>>
>> > random guess - does it have something to do with modules?
>> >
>> > On Mon, May 15, 2023 at 11:14 AM Gus Heck  wrote:
>> > >
>> > > I hadn't seen that one. Thanks, I'll look at it. It already looks a
>> bit
>> > confusing though since it seems to have options for pointing to a repo,
>> but
>> > I appear to be pulling the jars successfully from .m2/repository
>> already...
>> > 

Re: Running 10.0 build with a custom lucene 9.5

2023-05-16 Thread Gus Heck
Found it.

It's a solr thing made worse by the interaction of lucene testutils and
jdk.internal.loader.URLClassPath's decision to hide anything gone wrong
when checking a URL
/*
 * Checks whether the resource URL should be returned.
 * Returns null on security check failure.
 * Called by java.net.URLClassLoader.
 */
public static URL checkURL(URL url) {
    if (url != null) {
        try {
            check(url);
        } catch (Exception e) {
            return null;
        }
    }
    return url;
}

Yay. Fun. JDK classes swallowing exceptions silently.
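
To make the effect concrete, here is a minimal sketch of the same "return null on failure" shape. It is illustrative only and does not use the JDK's real classes; `check`, `checkLocation`, `visible`, and the paths are made-up names for this example. The point is that a caller that skips nulls simply ends up with fewer entries and no exception to report.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics the "return null on failure" shape, not the JDK's real classes. */
final class SwallowedCheckDemo {

    /** Hypothetical stand-in for a security check that rejects some locations. */
    static void check(String location) {
        if (location.contains(".m2")) {
            throw new SecurityException("access denied: " + location);
        }
    }

    /** Same shape as the JDK snippet above: any failure becomes a null result. */
    static String checkLocation(String location) {
        if (location != null) {
            try {
                check(location);
            } catch (Exception e) {
                return null; // the SecurityException is dropped here
            }
        }
        return location;
    }

    /** A caller that skips nulls gets a shorter list and never sees an error. */
    static List<String> visible(List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            String checked = checkLocation(candidate);
            if (checked != null) {
                result.add(checked);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(
            visible(List.of("/opt/app/lib/a.jar", "/home/me/.m2/repository/b.jar")));
    }
}
```

From the caller's side the rejected jar is indistinguishable from a jar that was never on the classpath, which matches the symptom described here.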

At the start of this it only took me a little while to discover that there
was a security manager in play via debugging. Remembering that I saw emails
about that, I went to jira, found the ticket enabling it by default in 9.x
and eventually tracked down the name of the security policy file by reading
solr.in.sh and /bin/solr...  The key issue that tripped me up is that the
tests have a *separate* security policy file, and there was pretty much no
way to know this without extensive reading of the build. Thus I got thrown
off track when adding

  permission java.io.FilePermission
"${user.home}${/}.m2${/}repository${/}-", "read";

to solr/server/etc/security.policy had no effect. That and the fact that
no security exception was reported, led me to start chasing increasingly
improbable hypotheses. Many hours later when I went back to debugging
deeply into class loading, I found that the code was actually reading the
jar files in question, and then I finally caught it throwing a security
exception during my debugging.

It turns out that adding the above permission to
gradle/testing/randomization/policies/solr-tests.policy allows the test to
pass. [1]

I think we need to document this somewhere (or someone needs to point me to
the doc I missed, FWIW I hit this basically following the process in
dev-docs/dependency-upgrades.adoc treating lucene like a dependency, and
unaware that there is a "shortcut" mode for lucene specifically in
gradle/lucene-dev/lucene-dev-repo-composite.gradle and I find reading that
file none too clear anyway)

That's the solr part, the lucene part is that the security exception is hit
when in org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:58)
when org.apache.lucene.tests.util.TestRuleSetupAndRestoreClassEnv#before
does

 savedCodec = Codec.getDefault();

The error message "An SPI class of type org.apache.lucene.codecs.Codec with
name 'Lucene95' does not exist." was moderately misleading because the file
and the services files in the jar definitely did exist. This message should
vary if there is an installed security manager, maybe saying something like:

"An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene95'
does not exist. We have detected that a security manager is installed so it
is also possible that the jar containing the codec is inaccessible under
the current security policy. (Java does not throw SecurityException if this
is the case, it just ignores the jar!)" [2]

Having that explicitly called out would have been SUPER helpful.
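
A rough sketch of what such a conditional message could look like follows. This is not Lucene's NamedSPILoader code; the class and method names are hypothetical, and whether a security manager is installed is passed in as a boolean so the sketch does not rely on the deprecated `System.getSecurityManager()` call.

```java
/** Illustrative sketch only; not Lucene's NamedSPILoader code. */
final class SpiLookupMessages {

    /**
     * Builds the "does not exist" message and appends a hint when a security
     * manager is present. All names here are hypothetical.
     */
    static String notFound(String spiType, String name, boolean securityManagerInstalled) {
        StringBuilder sb = new StringBuilder();
        sb.append("An SPI class of type ").append(spiType)
          .append(" with name '").append(name).append("' does not exist.");
        if (securityManagerInstalled) {
            // Extra hint: the jar may exist but be unreadable under the policy.
            sb.append(" A security manager is installed, so the jar containing it may be")
              .append(" inaccessible under the current security policy.");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(notFound("org.apache.lucene.codecs.Codec", "Lucene95", true));
    }
}
```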

-Gus

[1]: https://issues.apache.org/jira/browse/SOLR-16804
[2]: https://github.com/apache/lucene/issues/12300


On Mon, May 15, 2023 at 3:17 PM Michael Sokolov  wrote:

> random guess - does it have something to do with modules?
>
> On Mon, May 15, 2023 at 11:14 AM Gus Heck  wrote:
> >
> > I hadn't seen that one. Thanks, I'll look at it. It already looks a bit
> confusing though since it seems to have options for pointing to a repo, but
> I appear to be pulling the jars successfully from .m2/repository already...
> (except then they don't work, so successful means I see them in the
> classpath of the relevant classloader). And if we can't deploy a valid jar
> to mavenLocal for some reason (tweaked the solr build so it sees
> mavenLocal()), (or solr can't consume such a jar) that seems like an issue
> for whichever one is breaking that.
> >
> > Debugging: The JDK appears to be attempting to load the services file
> from modules, but not seeing the lucene module. (just the jdk ones) Also it
> passes through a block that says:
> >
> > // not in a package of a module defined to this loader
> > for (URL url : findMiscResource(name)) {
> >
> > (but then iterates
> jdk.internal.loader.BuiltinClassLoader#nameToModule.values() to load things
> anyway)
> >
> > -Gus
> >
> > On Mon, May 15, 2023 at 10:54 AM Houston Putman 
> wrote:
> >>
> >> Gus, I haven't done this myself, but are you using the instructions
> provided in Solr's "gradle/lucene-dev/lucene-dev-repo-composite.gradle"?
> >>
> >> It looks like you need to specify the development lucene version
> differently than other dep

Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Actually, I had wondered if this is a proper vote thread or not, normally
those are yes/no on a single option.

On Tue, May 16, 2023 at 10:47 AM Alessandro Benedetti 
wrote:

> Hi Marcus,
> I am afraid at this stage Robert's opinion counts just as any other
> opinion, a single vote for option 1.
> We are collecting a community's feedback here, we are not changing any
> code nor voting for a yes/no.
> Once the voting is finished, we'll operate an action depending on the
> community's choice.
> If the action involves making a change and someone (Robert or whoever)
> wants to veto it, he/she will need to motivate the veto with technical
> merit.
>
> In response to Uwe's point:
>
>>
>>> On Tue, May 16, 2023 at 9:57 AM Uwe Schindler  wrote:
>>>
 I agree with Dawid,

 I am +1 for those two options in combination:

- option 3 (make limit an HNSW specific thing). New formats may use
other limits (lower or higher).
- option 4 (make a system property with HNSW prefix). Adding the
system property must be done in the same way as the new properties for MMAP
directory (including access controller) so that the system admin can deny
setting it in code (see
https://github.com/apache/lucene/blob/f53eb28af053d7612f7e4d1b2de05d33dc410645/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java#L327-L346
for example). Care has to be taken that the static initializers won't fail
if system properties cannot be read/set (system administrator enforces
default -> see mmap code). It also has to be made sure that an index
written with raised limit can still be read without the limit, so the limit
should not be glued into the file format. Otherwise I disagree with option
4.

 In short: I am fine with making it configurable only for HNSW if the
 limit is not glued into index format. The default should only be there to
 by default prevent people from doing wrong things, but changing default
 should not break reading/modifying those indexes.

 Uwe

 Thanks Uwe, that's very useful!
> Just to fully understand it, right now the limit is not written in any
> file format, so you just want this behavior to be maintained right?
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [VOTE] Dimension Limit for KNN Vectors

2023-05-16 Thread Gus Heck
Robert,

Can you explain in clear technical terms the standard that must be met for
performance? A benchmark that must run in X time on Y hardware for example
(and why that test is suitable)? Or some other reproducible criterion? So
far I've heard you give an *opinion* that it's unusable, but that's not a
technical criterion; others may have a different concept of what is usable
to them.

Forgive me if I misunderstand, but the essence of your argument has seemed
to be

"Performance isn't good enough, therefore we should force anyone who wants
to experiment with something bigger to fork the code base to do it"

Thus, it is necessary to have a clear unambiguous standard that anyone can
verify for "good enough". A clear standard would also focus efforts at
improvement.

Where are the goal posts?

FWIW I'm +1 on any of 2-4 since I believe the existence of a hard limit is
fundamentally counterproductive in an open source setting, as it will lead
to *fewer people* pushing the limits. Extremely few people are going to get
into the nitty-gritty of optimizing things unless they are staring at code
that they can prove does something interesting, but doesn't run fast enough
for their purposes. If people hit a hard limit, more of them give up and
never develop the code that will motivate them to look for optimizations.

-Gus

On Tue, May 16, 2023 at 6:04 AM Robert Muir  wrote:

> i still feel -1 (veto) on increasing this limit. sending more emails does
> not change the technical facts or make the veto go away.
>
> On Tue, May 16, 2023 at 4:50 AM Alessandro Benedetti 
> wrote:
>
>> Hi all,
>> we have finalized all the options proposed by the community and we are
>> ready to vote for the preferred one and then proceed with the
>> implementation.
>>
>> *Option 1*
>> Keep it as it is (dimension limit hardcoded to 1024)
>> *Motivation*:
>> We are close to improving on many fronts. Given the criticality of Lucene
>> in computing infrastructure and the concerns raised by one of the most
>> active stewards of the project, I think we should keep working toward
>> improving the feature as is and move to up the limit after we can
>> demonstrate improvement unambiguously.
>>
>> *Option 2*
>> make the limit configurable, for example through a system property
>> *Motivation*:
>> The system administrator can enforce a limit that its users need to
>> respect, in line with whatever the admin has decided is acceptable for
>> them.
>> The default can stay the current one.
>> This should open the doors for Apache Solr, Elasticsearch, OpenSearch,
>> and any sort of plugin development
>>
>> *Option 3*
>> Move the max dimension limit lower level to a HNSW specific
>> implementation. Once there, this limit would not bind any other potential
>> vector engine alternative/evolution.
>> *Motivation:* There seem to be contradictory performance interpretations
>> about the current HNSW implementation. Some consider its performance ok,
>> some not, and it depends on the target data set and use case. Increasing
>> the max dimension limit where it is currently (in top level
>> FloatVectorValues) would not allow potential alternatives (e.g. for other
>> use-cases) to be based on a lower limit.
>>
>> *Option 4*
>> Make it configurable and move it to an appropriate place.
>> In particular, a simple Integer.getInteger("lucene.hnsw.maxDimensions",
>> 1024) should be enough.
>> *Motivation*:
>> Both are good and not mutually exclusive and could happen in any order.
>> Someone suggested to perfect what the _default_ limit should be, but I've
>> not seen an argument _against_ configurability.  Especially in this way --
>> a toggle that doesn't bind Lucene's APIs in any way.
>>
>> I'll keep this [VOTE] open for a week and then proceed to the
>> implementation.
>> --
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io
>> LinkedIn | Twitter | Youtube | Github
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Running 10.0 build with a custom lucene 9.5

2023-05-15 Thread Gus Heck
I hadn't seen that one. Thanks, I'll look at it. It already looks a bit
confusing though since it seems to have options for pointing to a repo, but
I appear to be pulling the jars successfully from .m2/repository already...
(except then they don't work, so successful means I see them in the
classpath of the relevant classloader). And if we can't deploy a valid jar
to mavenLocal for some reason (tweaked the solr build so it sees
mavenLocal()), (or solr can't consume such a jar) that seems like an issue
for whichever one is breaking that.

Debugging: The JDK appears to be attempting to load the services file from
modules, but not seeing the lucene module. (just the jdk ones) Also it
passes through a block that says:

// not in a package of a module defined to this loader
for (URL url : findMiscResource(name)) {

(but then
iterates jdk.internal.loader.BuiltinClassLoader#nameToModule.values() to
load things anyway)
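
For anyone wanting to check what a class loader reports without stepping through the JDK, a small probe like the following may help. It is a hypothetical diagnostic helper, not part of Lucene or Solr, and it only covers classpath-style `META-INF/services` files (providers declared in module descriptors will not show up). An empty result says the loader cannot see the file, not why.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Lucene or Solr. */
final class ServicesFileProbe {

    /** Lists every META-INF/services/<serviceName> resource the loader reports. */
    static List<URL> visibleServiceFiles(ClassLoader loader, String serviceName) {
        try {
            return Collections.list(loader.getResources("META-INF/services/" + serviceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Service name defaults to the Codec SPI discussed in this thread.
        String service = args.length > 0 ? args[0] : "org.apache.lucene.codecs.Codec";
        List<URL> files = visibleServiceFiles(ClassLoader.getSystemClassLoader(), service);
        System.out.println(service + " -> " + files);
    }
}
```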

-Gus

On Mon, May 15, 2023 at 10:54 AM Houston Putman 
wrote:

> Gus, I haven't done this myself, but are you using the instructions
> provided in Solr's "gradle/lucene-dev/lucene-dev-repo-composite.gradle"?
>
> It looks like you need to specify the development lucene version
> differently than other dependencies...
>
> - Houston
>
> On Sat, May 13, 2023 at 10:14 AM Michael Sokolov 
> wrote:
>
>> doh I actually read your email and you said you already checked that -
>> I'm going to send out one of those "sokolov would like to retract the
>> previous email" emails. Does GMail even pretend to do that? I don't
>> know what's going on there! sorry
>>
>> On Sat, May 13, 2023 at 10:13 AM Michael Sokolov 
>> wrote:
>> >
>> > sorry - META-INF not WEB-INF
>> >
>> > On Sat, May 13, 2023 at 10:12 AM Michael Sokolov 
>> wrote:
>> > >
>> > > You are probably missing the contents of WEB-INF in your custom jar?
>> > > Roughly speaking the files in there define run-time-bound "services"
>> > > that are looked up by name by the JDK's service-loader API.
>> > >
>> > > On Sat, May 13, 2023 at 9:33 AM Gus Heck  wrote:
>> > > >
>> > > > Cross posting to lucene on the possibility that folks here are more
>> likely to add customized lucene to Solr and recognize what I'm stumbling
>> on? (zero responses on solr list)
>> > > >
>> > > > Note that the specific test that I happened to copy is not the
>> issue, all tests are doing this (or at least so many tests are failing I
>> can't see the ones that are passing easily).
>> > > >
>> > > > -- Forwarded message -
>> > > > From: Gus Heck 
>> > > > Date: Wed, May 10, 2023 at 6:50 PM
>> > > > Subject: Running 10.0 build with a custom lucene 9.5
>> > > > To: 
>> > > >
>> > > >
>> > > > Lucene:
>> > > >
>> > > > I made a tweak to lucene for something I'm investigating, gave it a
>> new version, deployed to mavenLocal()
>> > > > I have verified that the jars are built with correct
>> META-INF/services files
>> > > >
>> > > > Solr:
>> > > >
>> > > > I added mavenLocal() in gradle/globals.gradle
>> > > > I removed the license file sha1 sigs for the default lucene &
>> created signatures for my test version
>> > > > I updated versions.props
>> > > > I updated versions.lock
>> > > >
>> > > > Now when I run individual solr tests via my ide they seem to pass,
>> but virtually every test run via gradle fails with something like:
>> > > >
>> > > > org.apache.solr.embedded.TestJettySolrRunner > classMethod FAILED
>> > > > java.lang.ExceptionInInitializerError
>> > > > at org.apache.lucene.codecs.Codec.getDefault(Codec.java:141)
>> > > > at
>> org.apache.lucene.tests.util.TestRuleSetupAndRestoreClassEnv.before(TestRuleSetupAndRestoreClassEnv.java:137)
>> > > > at
>> org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:42)
>> > > > at
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> > > > at
>> org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>> > > > at
>> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverrides

Fwd: Running 10.0 build with a custom lucene 9.5

2023-05-13 Thread Gus Heck
Cross posting to lucene on the possibility that folks here are more likely
to add customized lucene to Solr and recognize what I'm stumbling on? (zero
responses on solr list)

Note that the specific test that I happened to copy is not the issue, all
tests are doing this (or at least so many tests are failing I can't see the
ones that are passing easily).

-- Forwarded message -
From: Gus Heck 
Date: Wed, May 10, 2023 at 6:50 PM
Subject: Running 10.0 build with a custom lucene 9.5
To: 


Lucene:

   - I made a tweak to lucene for something I'm investigating, gave it a
   new version, deployed to mavenLocal()
   - I have verified that the jars are built with correct META-INF/services
   files

Solr:

   - I added mavenLocal() in gradle/globals.gradle
   - I removed the license file sha1 sigs for the default lucene & created
   signatures for my test version
   - I updated versions.props
   - I updated versions.lock

Now when I run individual solr tests via my ide they seem to pass, but
virtually every test run via gradle fails with something like:

org.apache.solr.embedded.TestJettySolrRunner > classMethod FAILED
java.lang.ExceptionInInitializerError
at org.apache.lucene.codecs.Codec.getDefault(Codec.java:141)
at
org.apache.lucene.tests.util.TestRuleSetupAndRestoreClassEnv.before(TestRuleSetupAndRestoreClassEnv.java:137)
at
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:42)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at
org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at
org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at
org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
at java.base/java.lang.Thread.run(Thread.java:829)

Caused by:
java.lang.IllegalArgumentException: An SPI class of type
org.apache.lucene.codecs.Codec with name 'Lucene95' does not exist.  You
need to add the corresponding JAR file supporting this SPI to your
classpath.  The current classpath supports the following names: []
at
org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:113)
at org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:58)
... 19 more

org.apache.solr.embedded.TestJettySolrRunner > classMethod FAILED
java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:221)
at org.apache.lucene.codecs.Codec.setDefault(Codec.java:151)
at
org.apache.lucene.tests.util.TestRuleSetupAndRestoreClassEnv.after(TestRuleSetupAndRestoreClassEnv.java:292)
at
org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:49)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequir

Re: Conneting Lucene with ChatGPT Retrieval Plugin

2023-05-10 Thread Gus Heck
Do you anticipate that the vector engine would be changed in a way that
fundamentally precluded larger vectors (intentionally)? I would think that
the ability to support larger vectors should be a key criterion for any
changes to be made. Certainly if there are optimizations to be had at
specific sizes (due to power of 2 size or some other numerical coincidence)
found in the future we should have ways of picking that up if people use
the beneficial size, but I don't understand the idea that we would support
a change to the engine that would preclude larger vectors in the long run.
It makes great sense to have a default limit because it's important to
communicate that "beyond this point we haven't tested, we don't know what
happens and you are on your own" but forcing a code fork for folks to do
that testing only creates a barrier if they find something useful that they
want to contribute back...

On the proposal's thread I like the configurability option fwiw.
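
As a minimal sketch of what the configurability option could look like: the property name `lucene.hnsw.maxDimensions` and the default of 1024 are the ones floated in the vote thread, while the class and method names are hypothetical. It falls back to the default if the property is unset, unparsable, non-positive, or unreadable.

```java
/** Hypothetical sketch of a configurable limit; not actual Lucene code. */
final class VectorLimitConfig {

    static final int DEFAULT_MAX_DIMENSIONS = 1024;
    static final String PROPERTY = "lucene.hnsw.maxDimensions";

    /** Reads the limit from a system property, falling back to the default. */
    static int maxDimensions() {
        try {
            // Integer.getInteger returns the default when the property is
            // missing or is not a valid integer.
            int configured = Integer.getInteger(PROPERTY, DEFAULT_MAX_DIMENSIONS);
            return configured > 0 ? configured : DEFAULT_MAX_DIMENSIONS;
        } catch (SecurityException e) {
            // Reading properties may be denied; keep the default in that case.
            return DEFAULT_MAX_DIMENSIONS;
        }
    }

    public static void main(String[] args) {
        System.out.println("max dimensions: " + maxDimensions());
    }
}
```

Run with `-Dlucene.hnsw.maxDimensions=2048` to raise the limit; without the flag the default stays in place.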

On Tue, May 9, 2023 at 12:49 PM Bruno Roustant 
wrote:

> I agree with Robert Muir that an increase of the 1024 limit as it is
> currently in FloatVectorValues or ByteVectorValues would bind the API, we
> could not decrease it after, even if we needed to change the vector engine.
>
> Would it be possible to move the limit definition to a HNSW specific
> implementation, where it would only bind HNSW?
> I don't know this area of code well. It seems to me the FloatVectorValues
> implementation is unfortunately not HNSW specific. Is this on purpose? We
> should be able to replace the vector engine, no?
>
> Le sam. 6 mai 2023 à 22:44, Michael Wechner  a
> écrit :
>
>> there is already a pull request for Elasticsearch which is also
>> mentioning the max size 1024
>>
>> https://github.com/openai/chatgpt-retrieval-plugin/pull/83
>>
>>
>>
>> Am 06.05.23 um 19:00 schrieb Michael Wechner:
>> > Hi Together
>> >
>> > I recently setup ChatGPT retrieval plugin locally
>> >
>> > https://github.com/openai/chatgpt-retrieval-plugin
>> >
>> > I think it would be nice to consider to submit a Lucene implementation
>> > for this plugin
>> >
>> > https://github.com/openai/chatgpt-retrieval-plugin#future-directions
>> >
>> > The plugin is using by default OpenAI's model "text-embedding-ada-002"
>> > with 1536 dimensions
>> >
>> > https://openai.com/blog/new-and-improved-embedding-model
>> >
>> > but which means one won't be able to use it out-of-the-box with Lucene.
>> >
>> > Similar request here
>> >
>> >
>> https://learn.microsoft.com/en-us/answers/questions/1192796/open-ai-text-embedding-dimensions
>> >
>> >
>> > I understand we just recently had a lengthy discussion about
>> > increasing the max dimension and whatever one thinks of OpenAI, fact
>> > is, that it has a huge impact and I think it would be nice that Lucene
>> > could be part of this "revolution". All we have to do is increase the
>> > limit from 1024 to 1536 or even 2048 for example.
>> >
>> > Since the performance seems to be linear with the vector dimension and
>> > several members have done performance tests successfully and 1024
>> > seems to have been chosen as max dimension quite arbitrarily in the
>> > first place, I think it should not be a problem to increase the max
>> > dimension by a factor 1.5 or 2.
>> >
>> > WDYT?
>> >
>> > Thanks
>> >
>> > Michael
>> >
>> >
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Why two TermAndBoost classes

2023-05-02 Thread Gus Heck
Was fishing around in parsers in solr and discovered that we have two
different term and boost classes in Lucene. Is this really desirable? They
are quite similar except one implements a notion of equality, and doesn't
copy the BytesRef when created whereas the other relies on object equality
and does copy the BytesRef in the constructor.

The difference in copying BytesRef seems suspicious, and I wonder if
there's a good reason not to have a different notion of equality among the
two. Also one is public and the other is private to SynonymQuery but both
are static and don't seem to leverage their privileges of being within the
containing class, so maybe they don't need to be inner classes?
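
To illustrate why the copy-versus-no-copy difference matters, here is a small sketch using a plain `byte[]` as a stand-in for BytesRef. These are not the Lucene classes; `SharedTerm` and `CopiedTerm` are hypothetical names that only mirror the two behaviours described above (value equality without a copy, and identity equality with a defensive copy).

```java
import java.util.Arrays;

/** Illustrative stand-ins only; these are not the Lucene TermAndBoost classes. */
final class TermCopyDemo {

    /** Keeps the caller's array and defines value equality. */
    static final class SharedTerm {
        final byte[] bytes;
        final float boost;

        SharedTerm(byte[] bytes, float boost) {
            this.bytes = bytes; // no copy: later changes to the array are visible here
            this.boost = boost;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SharedTerm)) return false;
            SharedTerm other = (SharedTerm) o;
            return Float.compare(boost, other.boost) == 0 && Arrays.equals(bytes, other.bytes);
        }

        @Override
        public int hashCode() {
            return 31 * Arrays.hashCode(bytes) + Float.hashCode(boost);
        }
    }

    /** Copies the array on construction and relies on identity equality. */
    static final class CopiedTerm {
        final byte[] bytes;
        final float boost;

        CopiedTerm(byte[] bytes, float boost) {
            this.bytes = bytes.clone(); // defensive copy
            this.boost = boost;
        }
    }

    public static void main(String[] args) {
        byte[] buffer = {'f', 'o', 'o'};
        SharedTerm shared = new SharedTerm(buffer, 1.0f);
        CopiedTerm copied = new CopiedTerm(buffer, 1.0f);
        buffer[0] = 'b'; // simulate the caller reusing its buffer
        System.out.println("shared first byte: " + (char) shared.bytes[0]);
        System.out.println("copied first byte: " + (char) copied.bytes[0]);
    }
}
```

If callers reuse their buffers, the non-copying variant can silently change after construction, which would also change its hash code; that is the kind of interaction that makes the difference look suspicious.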

-Gus

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-11 Thread Gus Heck
> His point is that we, as a dev community, are not paying enough attention
> to the indexing performance of our KNN algo (HNSW) and implementation, and
> that it is reckless to increase / remove limits in that state.
>

If the argument were... "Please hold off while I'm actively improving this,
it will be ready soon and then we can adjust the limit" that might have
technical merit. As it was presented it came across more like "I'm going to
hold this feature lots of folk want hostage until *someone else* does
something I think should be done"... I doubt that was actually what he
consciously thought (I don't think anyone on this project would have that
specific intention), but the context and manner have made it seem that way,
and the net effect seems to be trending in that direction.

If there's a way that raising the limit *prevents* working on performance
that of course would be a key thing to understand. It seems to me that the
exact person who's going to go on a performance crusade is the person who
has a technique that they can prove works, but it's just too darn slow.
Maybe not the first person, maybe not the fifth, but it's going to be
*someone* who needs it...

100% the user should know that they are "off the edge of the map" and "here
there be monsters." Document it well, issue a warning, whatever. Once
they've been told, and they set sail for the unknown, let them develop an
itch so that they can scratch it.
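
A minimal sketch of the "warn, don't block" idea, assuming 1024 as the tested range; the class, method, and message are hypothetical and not a proposal for specific Lucene code:

```java
import java.util.logging.Logger;

/** Hypothetical sketch: warn when leaving the tested range instead of failing. */
final class DimensionCheck {

    static final int TESTED_MAX_DIMENSIONS = 1024;
    private static final Logger LOG = Logger.getLogger(DimensionCheck.class.getName());

    /**
     * Returns true if the dimension is within the tested range. Larger values
     * are allowed but produce a warning and a false return value.
     */
    static boolean checkDimension(int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive: " + dimension);
        }
        if (dimension > TESTED_MAX_DIMENSIONS) {
            LOG.warning("Vector dimension " + dimension + " exceeds the tested maximum of "
                + TESTED_MAX_DIMENSIONS + "; performance beyond this point is untested.");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkDimension(768));
        System.out.println(checkDimension(1536));
    }
}
```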


Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-09 Thread Gus Heck
Also technically, it's just the threat of a veto since we are not actually
in a vote thread

On Sun, Apr 9, 2023 at 12:46 PM Gus Heck  wrote:

> What I see so far:
>
>1. Much positive support for raising the limit
>2. Slightly less support for removing it or making it configurable
>3. A single veto which argues that an (as yet undefined) performance
>standard must be met before raising the limit
>4. Hot tempers (various) making this discussion difficult
>
> As I understand it, vetoes must have technical merit. I'm not sure that
> this veto rises to "technical merit" on 2 counts:
>
>1. No standard for the performance is given so it cannot be
>technically met. Without hard criteria it's a moving target.
>2. It appears to encode a valuation of the user's time, and that
>valuation is really up to the user. Some users may consider 2 hours useless
>and not worth it, and others might happily wait 2 hours. This is not a
>technical decision, it's a business decision regarding the relative value
>of the time invested vs the value of the result. If I can cure cancer by
>indexing for a year, that might be worth it... (hyperbole of course).
>
> Things I would consider to have technical merit that I don't hear:
>
>1. Impact on the speed of **other** indexing operations. (devaluation
>of other functionality)
>2. Actual scenarios that work when the limit is low and fail when the
>limit is high (new failure on the same data with the limit raised).
>
> One thing that might or might not have technical merit
>
>1. If someone feels there is a lack of documentation of the
>costs/performance implications of using large vectors, possibly including
>reproducible benchmarks establishing the scaling behavior (there seems to
>be disagreement on O(n) vs O(n^2)).
>
> The users *should* know what they are getting into, but if the cost is
> worth it to them, they should be able to pay it without forking the
> project. If this veto causes a fork that's not good.
>
> On Sun, Apr 9, 2023 at 7:55 AM Michael Sokolov  wrote:
>
>> We do have a dataset built from Wikipedia in luceneutil. It comes in 100
>> and 300 dimensional varieties and can easily enough generate large numbers
>> of vector documents from the articles data. To go higher we could
>> concatenate vectors from that and I believe the performance numbers would
>> be plausible.
>>
>> On Sun, Apr 9, 2023, 1:32 AM Dawid Weiss  wrote:
>>
>>> Can we set up a branch in which the limit is bumped to 2048, then have
>>> a realistic, free data set (wikipedia sample or something) that has,
>>> say, 5 million docs and vectors created using public data (glove
>>> pre-trained embeddings or the like)? We then could run indexing on the
>>> same hardware with 512, 1024 and 2048 and see what the numbers, limits
>>> and behavior actually are.
>>>
>>> I can help in writing this but not until after Easter.
>>>
>>>
>>> Dawid
>>>
>>> On Sat, Apr 8, 2023 at 11:29 PM Adrien Grand  wrote:
>>> >
>>> > As Dawid pointed out earlier on this thread, this is the rule for
>>> > Apache projects: a single -1 vote on a code change is a veto and
>>> > cannot be overridden. Furthermore, Robert is one of the people on this
>>> > project who worked the most on debugging subtle bugs, making Lucene
>>> > more robust and improving our test framework, so I'm listening when he
>>> > voices quality concerns.
>>> >
>>> > The argument against removing/raising the limit that resonates with me
>>> > the most is that it is a one-way door. As MikeS highlighted earlier on
>>> > this thread, implementations may want to take advantage of the fact
>>> > that there is a limit at some point too. This is why I don't want to
>>> > remove the limit and would prefer a slight increase, such as 2048 as
>>> > suggested in the original issue, which would enable most of the things
>>> > that users who have been asking about raising the limit would like to
>>> > do.
>>> >
>>> > I agree that the merge-time memory usage and slow indexing rate are
>>> > not great. But it's still possible to index multi-million vector
>>> > datasets with a 4GB heap without hitting OOMEs regardless of the
>>> > number of dimensions, and the feedback I'm seeing is that many users
>>> > are still interested in indexing multi-million vector datasets despite
>>> > the slow indexing rate. I wish we could do better, and vector inde

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-09 Thread Gus Heck
What I see so far:

   1. Much positive support for raising the limit
   2. Slightly less support for removing it or making it configurable
   3. A single veto which argues that an (as yet undefined) performance
   standard must be met before raising the limit
   4. Hot tempers (various) making this discussion difficult

As I understand it, vetoes must have technical merit. I'm not sure that
this veto rises to "technical merit" on 2 counts:

   1. No standard for the performance is given so it cannot be technically
   met. Without hard criteria it's a moving target.
   2. It appears to encode a valuation of the user's time, and that
   valuation is really up to the user. Some users may consider 2 hours useless
   and not worth it, and others might happily wait 2 hours. This is not a
   technical decision, it's a business decision regarding the relative value
   of the time invested vs the value of the result. If I can cure cancer by
   indexing for a year, that might be worth it... (hyperbole of course).

Things I would consider to have technical merit that I don't hear:

   1. Impact on the speed of **other** indexing operations. (devaluation of
   other functionality)
   2. Actual scenarios that work when the limit is low and fail when the
   limit is high (new failure on the same data with the limit raised).

One thing that might or might not have technical merit

   1. If someone feels there is a lack of documentation of the
   costs/performance implications of using large vectors, possibly including
   reproducible benchmarks establishing the scaling behavior (there seems to
   be disagreement on O(n) vs O(n^2)).

The users *should* know what they are getting into, but if the cost is
worth it to them, they should be able to pay it without forking the
project. If this veto causes a fork that's not good.

On Sun, Apr 9, 2023 at 7:55 AM Michael Sokolov  wrote:

> We do have a dataset built from Wikipedia in luceneutil. It comes in 100
> and 300 dimensional varieties and can easily enough generate large numbers
> of vector documents from the articles data. To go higher we could
> concatenate vectors from that and I believe the performance numbers would
> be plausible.
>
> On Sun, Apr 9, 2023, 1:32 AM Dawid Weiss  wrote:
>
>> Can we set up a branch in which the limit is bumped to 2048, then have
>> a realistic, free data set (wikipedia sample or something) that has,
>> say, 5 million docs and vectors created using public data (glove
>> pre-trained embeddings or the like)? We then could run indexing on the
>> same hardware with 512, 1024 and 2048 and see what the numbers, limits
>> and behavior actually are.
>>
>> I can help in writing this but not until after Easter.
>>
>>
>> Dawid
>>
>> On Sat, Apr 8, 2023 at 11:29 PM Adrien Grand  wrote:
>> >
>> > As Dawid pointed out earlier on this thread, this is the rule for
>> > Apache projects: a single -1 vote on a code change is a veto and
>> > cannot be overridden. Furthermore, Robert is one of the people on this
>> > project who worked the most on debugging subtle bugs, making Lucene
>> > more robust and improving our test framework, so I'm listening when he
>> > voices quality concerns.
>> >
>> > The argument against removing/raising the limit that resonates with me
>> > the most is that it is a one-way door. As MikeS highlighted earlier on
>> > this thread, implementations may want to take advantage of the fact
>> > that there is a limit at some point too. This is why I don't want to
>> > remove the limit and would prefer a slight increase, such as 2048 as
>> > suggested in the original issue, which would enable most of the things
>> > that users who have been asking about raising the limit would like to
>> > do.
>> >
>> > I agree that the merge-time memory usage and slow indexing rate are
>> > not great. But it's still possible to index multi-million vector
>> > datasets with a 4GB heap without hitting OOMEs regardless of the
>> > number of dimensions, and the feedback I'm seeing is that many users
>> > are still interested in indexing multi-million vector datasets despite
>> > the slow indexing rate. I wish we could do better, and vector indexing
>> > is certainly more expert than text indexing, but it still is usable in
>> > my opinion. I understand how giving Lucene more information about
>> > vectors prior to indexing (e.g. clustering information as Jim pointed
>> > out) could help make merging faster and more memory-efficient, but I
>> > would really like to avoid making it a requirement for indexing
>> > vectors as it also makes this feature much harder to use.
>> >
>> > On Sat, Apr 8, 2023 at 9:28 PM Alessandro Benedetti
>> >  wrote:
>> > >
>> > > I am very attentive to listen to opinions but I am un-convinced here and
>> I am not sure that a single person's opinion should be allowed to be
>> detrimental for such an important project.
>> > >
>> > > The limit as far as I know is literally just raising an exception.
>> > > Removing it won't alter in any way 

Re: [Proposal] Remove max number of dimensions for KNN vectors

2023-04-05 Thread Gus Heck
10 MB hard drive, wow I'll never need another floppy disk ever...
Neural nets... nice idea, but there will never be enough CPU power to run
them...

etc.

Is it possible to make it a configurable limit?
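
A minimal sketch of what a configurable limit could look like. Everything
here is an assumption for illustration: the `VectorLimits` class and the
`lucene.knn.maxDimensions` property name are hypothetical and do not exist in
Lucene, and the default of 1024 is only taken from the value mentioned as
"allowed today" in the quoted thread below.

```java
/** Hypothetical helper (not part of Lucene): a max-dimension limit read from a system property. */
final class VectorLimits {
  /** Assumed default, based on the 1024 figure mentioned in this thread. */
  static final int DEFAULT_MAX_DIMENSIONS = 1024;

  /** Hypothetical property name, chosen only for this sketch. */
  static final String PROPERTY = "lucene.knn.maxDimensions";

  /** Returns the configured limit, or the default when the property is unset or empty. */
  static int maxDimensions() {
    String raw = System.getProperty(PROPERTY);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_MAX_DIMENSIONS;
    }
    int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(PROPERTY + " must be an integer, got: " + raw, e);
    }
    if (value <= 0) {
      throw new IllegalArgumentException(PROPERTY + " must be positive, got: " + value);
    }
    return value;
  }

  /** Rejects vectors whose dimension count is non-positive or above the configured limit. */
  static void checkDimensions(int dimensions) {
    int max = maxDimensions();
    if (dimensions <= 0 || dimensions > max) {
      throw new IllegalArgumentException(
          "vector has " + dimensions + " dimensions; allowed range is 1.." + max);
    }
  }

  public static void main(String[] args) {
    System.setProperty(PROPERTY, "2048");
    checkDimensions(1536); // passes once the limit is raised via the property
    System.out.println("limit = " + maxDimensions());
  }
}
```

With something along these lines the default stays conservative, and a user
who wants larger vectors has to opt in explicitly.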

On Wed, Apr 5, 2023 at 4:51 PM Jack Conradson  wrote:

> I don't want to get too far off topic, but I think one of the problems
> here is that HNSW doesn't really fit well as a Lucene data structure. The
> way it behaves it would be better supported as a live, in-memory data
> structure instead of segmented and written to disk for tiny graphs that
> then need to be merged. I wonder if it may be a better approach to explore
> other possible algorithms that are designed to be on-disk instead of
> in-memory even if they require k-means clustering as a trade-off. Maybe
> with an on-disk algorithm we could have good enough performance for a
> higher-dimensional limit.
>
> On Wed, Apr 5, 2023 at 10:54 AM Robert Muir  wrote:
>
>> I'd ask anyone voting +1 to raise this limit to at least try to index
>> a few million vectors with 756 or 1024, which is allowed today.
>>
>> IMO based on how painful it is, it seems the limit is already too
>> high, I realize that will sound controversial but please at least try
>> it out!
>>
>> voting +1 without at least doing this is really the
>> "weak/unscientifically minded" approach.
>>
>> On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner
>>  wrote:
>> >
>> > Thanks for your feedback!
>> >
>> > I agree, that it should not crash.
>> >
>> > So far we did not experience crashes ourselves, but we did not index
>> > millions of vectors.
>> >
>> > I will try to reproduce the crash, maybe this will help us to move
>> forward.
>> >
>> > Thanks
>> >
>> > Michael
>> >
>> > On 05.04.23 at 18:30, Dawid Weiss wrote:
>> > >> Can you describe your crash in more detail?
>> > > I can't. That experiment was a while ago and a quick test to see if I
>> > > could index rather large-ish USPTO (patent office) data as vectors.
>> > > Couldn't do it then.
>> > >
>> > >> How much RAM?
>> > > My indexing jobs run with rather smallish heaps to give space for I/O
>> > > buffers. Think 4-8GB at most. So yes, it could have been the problem.
>> > > I recall segment merging grew slower and slower and then simply
>> > > crashed. Lucene should work with low heap requirements, even if it
>> > > slows down. Throwing ram at the indexing/ segment merging problem
>> > > is... I don't know - not elegant?
>> > >
>> > > Anyway. My main point was to remind folks about how Apache works -
>> > > code is merged in when there are no vetoes. If Rob (or anybody else)
>> > > remains unconvinced, he or she can block the change. (I didn't invent
>> > > those rules).
>> > >
>> > > D.
>> > >
>> > > -
>> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > > For additional commands, e-mail: dev-h...@lucene.apache.org
>> > >
>> >
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Lucene PMC Chair Greg Miller

2023-03-07 Thread Gus Heck
Congratulations Greg and thanks Bruno!

On Tue, Mar 7, 2023 at 3:13 PM Tomás Fernández Löbbe 
wrote:

> Thanks Bruno! and Congratulations Greg!
>
> On Tue, Mar 7, 2023 at 10:49 AM Patrick Zhai  wrote:
>
>> Thank you Bruno and Greg!
>>
>> On Tue, Mar 7, 2023, 10:40 Mikhail Khludnev  wrote:
>>
>>> Thank you, Bruno. Congratulations, Greg.
>>>
>>> On Mon, Mar 6, 2023 at 8:16 PM Bruno Roustant 
>>> wrote:
>>>
>>>> Hello Lucene developers,
>>>>
>>>> Lucene Program Management Committee has elected a new chair, Greg
>>>> Miller, and the Board has approved.
>>>>
>>>> Greg, thank you for stepping up, and congratulations!
>>>>
>>>>
>>>> - Bruno

>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> https://t.me/MUST_SEARCH
>>> A caveat: Cyrillic!
>>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [DISCUSS:] Reproducible Builds

2023-01-22 Thread Gus Heck
Maybe there is another place to make that information available? The
release page?

On Sat, Jan 21, 2023 at 6:34 PM David Smiley  wrote:

> The goals / purpose of "Reproducible Builds" makes sense to me.
>
> However I wish the output that is the subject of reproducibility could be
> the JAR *exclusive* of its MANIFEST.MF.  There is some interesting metadata
> in there -- not essential but a shame to throw away in the name of
> reproducibility.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sat, Jan 21, 2023 at 4:27 PM Gus Heck  wrote:
>
>> Some discussion on https://github.com/apache/lucene/pull/12096 led to
>> the question of whether or not reproducible builds (
>> https://reproducible-builds.org/) are something we would like to work
>> towards. I'm a fan, though unlikely to have time to work on it soon.
>>
>> What I can do is monitor this thread and if the consensus seems to be
>> there, make a ticket that a volunteer can work on in the future (or maybe
>> me in the far future, likely after I have some more direct experience from
>> implementing it for Uno-Jar and JesterJ).
>>
>> -Gus
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


[DISCUSS:] Reproducible Builds

2023-01-21 Thread Gus Heck
Some discussion on https://github.com/apache/lucene/pull/12096 led to the
question of whether or not reproducible builds (
https://reproducible-builds.org/) are something we would like to work
towards. I'm a fan, though unlikely to have time to work on it soon.

What I can do is monitor this thread and if the consensus seems to be
there, make a ticket that a volunteer can work on in the future (or maybe
me in the far future, likely after I have some more direct experience from
implementing it for Uno-Jar and JesterJ).

-Gus

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Request for naming help

2022-12-12 Thread Gus Heck
In that case, maybe "Range Logic Faceting" ?

Relation seems too broad and too overloaded elsewhere, makes me think of
RDBMS, related-ness, joins and such via word associations.
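
For readers weighing the names, here is a rough sketch of the three relations
mentioned in the quoted thread below (intersects, within, contains) applied as
facet counts. It is plain Java with no Lucene dependency; the
`RangeRelationCounts` class, the `Relation` enum and the `{min, max}` array
representation are all made up for this illustration and are not the API in
the PR. I am assuming "within" means the document range lies inside the query
range and "contains" means the document range encloses it.

```java
/** Illustrative only: counts documents per query range under a chosen range relation. */
final class RangeRelationCounts {
  /** Hypothetical stand-in for the relation a user would pick. */
  enum Relation { INTERSECTS, WITHIN, CONTAINS }

  /** Both ranges are closed intervals given as {min, max}. */
  static boolean matches(Relation relation, double[] doc, double[] query) {
    switch (relation) {
      case INTERSECTS:
        // The two ranges share at least one point.
        return doc[0] <= query[1] && query[0] <= doc[1];
      case WITHIN:
        // The document range lies entirely inside the query range.
        return query[0] <= doc[0] && doc[1] <= query[1];
      case CONTAINS:
        // The document range entirely encloses the query range.
        return doc[0] <= query[0] && query[1] <= doc[1];
      default:
        throw new AssertionError(relation);
    }
  }

  /** Returns one count per query range: how many document ranges match it. */
  static int[] count(Relation relation, double[][] docRanges, double[][] queryRanges) {
    int[] counts = new int[queryRanges.length];
    for (double[] doc : docRanges) {
      for (int q = 0; q < queryRanges.length; q++) {
        if (matches(relation, doc, queryRanges[q])) {
          counts[q]++;
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    double[][] productPrices = {{10, 20}, {15, 40}, {50, 60}}; // per-document price ranges
    double[][] facets = {{0, 25}, {18, 55}};                   // query ranges to facet on
    for (Relation r : Relation.values()) {
      System.out.println(r + " " + java.util.Arrays.toString(count(r, productPrices, facets)));
    }
  }
}
```

The same query ranges give different counts depending on the relation, which
is why a name tied only to "intersect" undersells the feature.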

On Mon, Dec 12, 2022 at 3:27 PM Greg Miller  wrote:

> Thanks for the suggestion! I like the descriptiveness of it. My only
> hesitation is that it supports more than range intersection based on the
> provided QueryType instance (e.g., within, contains). I _imagine_ that
> intersection will be most common, but I don’t really know of course. I
> thought about generalizing your suggestion to something like “Range
> Relation Faceting,” but fear that would be confusing.
>
> Thanks again!
>
> Cheers,
> -Greg
>
> On Mon, Dec 12, 2022 at 10:19 Gus Heck  wrote:
>
>> Maybe "Range Intersect Faceting"?
>>
>> On Mon, Dec 12, 2022 at 1:11 PM Greg Miller  wrote:
>>
>>> Folks-
>>>
>>> Naming is hard! (But you all know that already).
>>>
>>> Marc D'Mello and I have been working on a new faceting implementation
>>> that's meant to complement Lucene's existing range-relation queries (e.g.,
>>> LongRange#newIntersectsQuery, DoubleRange#newContainsQuery,
>>> LongRangeDocValuesField#newSlowIntersectsQuery, etc.). Well, I should say
>>> Marc is working on the change and I'm just providing nit-picky feedback on
>>> his PR, which is here: https://github.com/apache/lucene/pull/11901. The
>>> general idea of this feature is to allow users to get facet counts for
>>> these sorts of range-relation filters before they're applied. For example,
>>> if a user is indexing ranges with their documents, they may have a set of
>>> query-ranges they want to facet on, based on some range relationship (e.g.,
>>> intersection, contains, etc.).
>>>
>>> As a concrete example, imagine that documents contain a price range
>>> (maybe a document represents some e-commerce product but the price varies
>>> based on some configuration options), and a user wants to build a price
>>> range filter that applies filtering based on whether-or-not the two ranges
>>> intersect (i.e., DoubleRange#newIntersectsQuery to apply a price range
>>> filter). This user wants faceting capabilities over the different
>>> price ranges they want to make available, so they need a way to facet over
>>> a list of provided query-ranges, based on the "intersect" relationship with
>>> the doc-encoded ranges. That's what Marc's "RangeOnRange" faceting is
>>> trying to accomplish.
>>>
>>> In my opinion, the PR is really close to being ready (thanks again
>>> Marc!), but I'm wondering if we can come up with a more descriptive name.
>>> As it currently stands, the feature is termed "RangeOnRange Faceting,"
>>> which feels just a bit wonky to me. That said, I can't really come up with
>>> anything better.
>>>
>>> ** Does anyone have suggestions on a better name? **
>>>
>>> Any / all suggestions appreciated! (And of course, any other input on
>>> the PR is welcome if anyone is interested).
>>>
>>> Cheers,
>>> -Greg
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Request for naming help

2022-12-12 Thread Gus Heck
Maybe "Range Intersect Faceting"?

On Mon, Dec 12, 2022 at 1:11 PM Greg Miller  wrote:

> Folks-
>
> Naming is hard! (But you all know that already).
>
> Marc D'Mello and I have been working on a new faceting implementation
> that's meant to complement Lucene's existing range-relation queries (e.g.,
> LongRange#newIntersectsQuery, DoubleRange#newContainsQuery,
> LongRangeDocValuesField#newSlowIntersectsQuery, etc.). Well, I should say
> Marc is working on the change and I'm just providing nit-picky feedback on
> his PR, which is here: https://github.com/apache/lucene/pull/11901. The
> general idea of this feature is to allow users to get facet counts for
> these sorts of range-relation filters before they're applied. For example,
> if a user is indexing ranges with their documents, they may have a set of
> query-ranges they want to facet on, based on some range relationship (e.g.,
> intersection, contains, etc.).
>
> As a concrete example, imagine that documents contain a price range (maybe
> a document represents some e-commerce product but the price varies based on
> some configuration options), and a user wants to build a price range filter
> that applies filtering based on whether-or-not the two ranges intersect
> (i.e., DoubleRange#newIntersectsQuery to apply a price range filter). This
> user wants faceting capabilities over the different price ranges they want
> to make available, so they need a way to facet over a list of provided
> query-ranges, based on the "intersect" relationship with the doc-encoded
> ranges. That's what Marc's "RangeOnRange" faceting is trying to accomplish.
>
> In my opinion, the PR is really close to being ready (thanks again Marc!),
> but I'm wondering if we can come up with a more descriptive name. As it
> currently stands, the feature is termed "RangeOnRange Faceting," which
> feels just a bit wonky to me. That said, I can't really come up with
> anything better.
>
> ** Does anyone have suggestions on a better name? **
>
> Any / all suggestions appreciated! (And of course, any other input on the
> PR is welcome if anyone is interested).
>
> Cheers,
> -Greg
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Luca Cavanna as Lucene committer

2022-10-05 Thread Gus Heck
Welcome :)

On Wed, Oct 5, 2022 at 5:38 PM Michael McCandless 
wrote:

> Welcome Luca!
>
> Mike
>
> On Wed, Oct 5, 2022 at 4:37 PM Tomás Fernández Löbbe <
> tomasflo...@gmail.com> wrote:
>
>> Congratulations Luca!!
>>
>> On Wed, Oct 5, 2022 at 2:19 PM Vigya Sharma  wrote:
>>
>>> Congratulations Luca! And welcome...
>>>
>>> Vigya
>>>
>>> On Wed, Oct 5, 2022 at 3:36 PM Uwe Schindler  wrote:
>>>
>>>> Welcome Luca. This was long overdue.
>>>>
>>>> On October 5, 2022 at 19:03:43 CEST, Adrien Grand <
>>>> jpou...@gmail.com> wrote:
>>>>>
>>>>> I'm pleased to announce that Luca Cavanna has accepted the PMC's
>>>>> invitation to become a committer.
>>>>>
>>>>> Luca, the tradition is that new committers introduce themselves with a
>>>>> brief bio.
>>>>>
>>>>> Congratulations and welcome!
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>> --
>>>> Uwe Schindler
>>>> Achterdiek 19, 28357 Bremen
>>>>
>>>> https://www.thetaphi.de
>>>>
>>>
>>>
>>> --
>>> - Vigya
>>>
>> --
> Mike McCandless
>
> http://blog.mikemccandless.com
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Code coverage check for PRs

2022-10-05 Thread Gus Heck
One thing codecov gives is a sense of what the coverage was previously
without having to go hunt down past builds in jenkins. It is a coverage
focused view essentially. It just uses the coverage data the build already
calculates. It used to have some nifty graphical visualizations but those
seem to have disappeared or become hard to find.

On Wed, Oct 5, 2022 at 9:24 AM Robert Muir  wrote:

> I would recommend a search for "github actions jacoco" to review
> what's common out there.
>
> If we change 'gradle test' to 'gradle coverage' in our existing
> PR-test action, the next step is to just not throw away the reports,
> but make them available. See
>
> https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts
> for some documentation on this.
>
> Seems common for PR workflows to have the action "comment on the PR"
> with coverage information. Not sure if we want that as it could result
> in a ton of comments.
>
> Finally, the current "gradle coverage" builds a separate coverage
> report for each lucene module, I think. So we may want to think about
> adding support to "merge" the jacoco data across all the modules and
> build one monster report for all of lucene, too. This would just be
> some work with the gradle build: but I think it would make the
> information a lot easier to digest. This is already happening with the
> "jenkins coverage build" which presents one monster report, but I
> think it may be something on the jenkins side doing it?
>
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/lastBuild/jacoco/
>
>
> On Wed, Oct 5, 2022 at 8:58 AM Patrick Zhai  wrote:
> >
> > Make sense to me, I'll try to look into it!
> >
> > On Tue, Oct 4, 2022, 16:50 Robert Muir  wrote:
> >>
> >> We already have code coverage integrated into the build. See the
> >> documentation on how to generate the reports:
> >> https://github.com/apache/lucene/blob/main/help/tests.txt
> >>
> >> I think we should stick with jacoco and not some commercial stuff for
> >> measuring coverage. Jacoco works great. We just have to put the
> >> reports or stats somewhere useful.
> >>
> >> On Tue, Oct 4, 2022 at 5:45 PM Patrick Zhai  wrote:
> >> >
> >> > Hi Robert, thank you for commenting, yeah the functionality I want to
> add is actually the line by line code coverage stats for the new/changed
> line that are in the patch so that we don't need to wonder about "whether
> that line is covered by the test?". But I'm against using the code coverage
> as any kind of hard criteria, like coverage must be kept at a certain % or
> all the new lines must be covered, that will drive people crazy. I think
> that should be just treated as a helpful thing to check when
> reviewing/creating the PR.
> >> >
> >> > I searched a little on google and found this:
> https://about.codecov.io/, it's free for open source and seems to have
> the functionality we need. Let me know if anyone has ideas about this, or
> otherwise I can try it a little bit with my own repo first and then try to
> add it to lucene.
> >> >
> >> > Best
> >> > Patrick
> >> >
> >> >
> >> >
> >> > On Tue, Oct 4, 2022, 06:36 Robert Muir  wrote:
> >> >>
> >> >> btw, you can look at the current reports created by jenkins here:
> >> >>
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Coverage-main/lastBuild/jacoco/
> >> >>
> >> >> On Tue, Oct 4, 2022 at 6:51 AM Robert Muir  wrote:
> >> >> >
> >> >> > we can run the tests with coverage option and produce coverage
> graph
> >> >> > from the github actions, but need to look at the docs to see where
> to
> >> >> > put it so it will be available.
> >> >> >
> >> >> > I want us to be careful about the word "check" as I'm adamantly
> >> >> > against any such automated check (e.g. coverage > N%) in the logic.
> >> >> > Coverage report is just a tool to help us and the moment we do
> stupid
> >> >> > shit like that, is the moment people start gaming it just to make
> the
> >> >> > build pass.
> >> >> >
> >> >> > On Mon, Oct 3, 2022 at 10:57 PM Patrick Zhai 
> wrote:
> >> >> > >
> >> >> > > Hi folks,
> >> >> > > I'm not sure whether people have already discussed this but I'm
> wondering whether we want to add a workflow that pulls out the code
> coverage whenever a PR was created? It should be easier for both the
> reviewers and the contributors to figure out what can be improved, or at
> least figure out a part that is probably not covered by the tests?
> >> >> > >
> >> >> > > Best
> >> >> > > Patrick
> >> >>
> >> >> -
> >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >> >>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
>
> 

Re: Welcome Vigya Sharma as Lucene committer

2022-07-28 Thread Gus Heck
Welcome!

On Thu, Jul 28, 2022 at 11:24 AM Julie Tibshirani 
wrote:

> Congratulations Vigya!
>
> On Thu, Jul 28, 2022 at 6:34 AM Mayya Sharipova
>  wrote:
>
>> Congratulations and welcome Vigya!
>>
>>
>> On Thu, Jul 28, 2022 at 9:31 AM Nhat Nguyen
>>  wrote:
>>
>>> Welcome, Vigya!
>>>
>>> On Thu, Jul 28, 2022 at 9:09 AM Dawid Weiss 
>>> wrote:
>>>

>>>> Congratulations and welcome, Vigya!
>>>> Dawid
>>>>
>>>> On Thu, Jul 28, 2022 at 9:34 AM Adrien Grand  wrote:
>>>>
>>>>> I'm pleased to announce that Vigya Sharma has accepted the PMC's
>>>>> invitation to become a committer.
>>>>>
>>>>> Vigya, the tradition is that new committers introduce themselves with a
>>>>> brief bio.
>>>>>
>>>>> Congratulations and welcome!
>>>>>
>>>>> --
>>>>> Adrien
>>>>>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [DISCUSS] Read-only Jira after the GitHub issues migration?

2022-07-18 Thread Gus Heck
I am 100% for preventing creation of new issues in Jira, new issues should
only be created in one system at any one time. I feel that existing issues
should be completed in their original system for continuity, and
anticipate that in any case Jira will remain readable in perpetuity. The
copying of old issues to github as a convenience for users so they aren't
forced to look at 2 places also sounds good. Raising the standard for what
we consider a stale issue and closing out things in Jira faster to get to a
one system situation sooner also seems good.

Things I think we should strive to avoid:
1) An issue in Jira that is unresolved and duplicated (possibly resolved)
in github... possibly leading to someone wasting time repeating a solution
or giving up thinking there isn't a solution etc.
2) Any issues for which the discussion is split across systems and thus it
would be easy to miss part of the discussion and/or not have the issue come
up in searches that are relevant to that issue.

Also, a common pattern for me is to throw an issue ticket number that I
have noted somewhere (e.g. LUCENE-12345) into google and browse to the
ticket if it comes up directly or to a mail archive result which has a link
to the Jira. This is faster than searching in jira itself because I can
always get to google in a single keystroke (new tab).  Sadly this is
unlikely to work with github which does not put a project moniker on the
issue id. Not sure how many others do this but if it's common I wonder if
we can auto-insert something of the sort into github tickets so that mail
archives from the tickets are similarly searchable? Like LUCENE-G12345 for
github ticket #12345? The two key things that make this useful are the
searchability of the ID in google and the fact that ticket mails often have
a link to the ticket which the archive sites will render as a hyperlink.
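
To make the idea concrete, a rough sketch of how such keys could be turned
into links by a mail or archive post-processor. The `IssueLinks` class is
hypothetical, and the `LUCENE-G12345` form is only the naming floated in this
message, not an existing convention; the two URL prefixes are the public Jira
and GitHub issue locations for the project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: appends a tracker URL after each issue key found in a text. */
final class IssueLinks {
  // LUCENE-12345 is a Jira key; LUCENE-G12345 is the (proposed, not existing) GitHub form.
  private static final Pattern KEY = Pattern.compile("\\bLUCENE-(G?)(\\d+)\\b");

  private static final String JIRA = "https://issues.apache.org/jira/browse/LUCENE-";
  private static final String GITHUB = "https://github.com/apache/lucene/issues/";

  static String linkify(String text) {
    Matcher m = KEY.matcher(text);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      boolean isGithub = !m.group(1).isEmpty();
      String url = (isGithub ? GITHUB : JIRA) + m.group(2);
      // Keep the searchable key in the text and add the link right after it.
      m.appendReplacement(out, Matcher.quoteReplacement(m.group() + " <" + url + ">"));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(linkify("See LUCENE-12345 and its follow-up LUCENE-G678."));
  }
}
```

The key stays in the text as a plain token, so it remains searchable, and the
archive sites can still render the URL as a hyperlink.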

-Gus

On Mon, Jul 18, 2022 at 11:12 AM David Smiley  wrote:

> I suppose someone bent on not using GitHub could also email the patch to
> the dev list, starting a thread around it.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Jul 17, 2022 at 9:14 AM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Hi Team,
>>
>> Thanks to Tomoko's amazing hard work (
>> https://github.com/apache/lucene-jira-archive), we are getting close to
>> having strong tooling and a solid plan to migrate all past Jira issues to
>> GitHub issues!
>>
>> But one contentious point is whether to leave Jira read-only or
>> read-write after the migration.  So let's DISCUSS and maybe VOTE to reach
>> consensus?
>>
>> My opinion: I think it'd be crazy to leave Jira read/write.  We would
>> effectively have two issue trackers.  New users who find Jira through
>> Google, or through links we have in old blog posts, etc., might
>> accidentally open new Jira issues or comment on old ones and we may not
>> even notice.  I think that would harm our community.
>>
>> I would prefer that we make a nearly atomic switch -- up until time X we
>> use Jira, then it goes read-only and at time X + t (t being how long the
>> migration takes, likely a day or two?), GitHub issues opens for business.
>> This way we clearly have only one issue tracker at (nearly) all times.  This
>> would make a clean migration, and reduce risk of trapping users.
>>
>> Other opinions?
>>
>> Thanks,
>>
>> Mike
>> --
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-17 Thread Gus Heck
I hope you count me as someone who sees history as important. It's
important in more ways than one however. You gave the example of trying to
understand something, and looking at the issue history directly. I also
give weight to the scenario where someone has written a blog post about the
topic and linked the issue "For the latest see LUCENE-" for example...
Or someone planning upgrades has a spreadsheet of things to track down...
The existing links should point to a *complete* history of the issue.

I don't see the migration of everything to github as being as critical as
you do but I'm not at all against migrating things that are closed if
someone wants to do that work, and perhaps even copying over existing open
issues periodically as they become closed (and accelerating the close rate
by aggressive closing of silent issues). No new issues in Jira sounds
fine, even better if enforced by Jira. Proceed from here in Github since
that's where the community wants to go. Links to the migrated version
automatically added to Jira and/or backlinks to Jira would be just fine too.
Since readers might (hopefully needlessly) worry that something didn't get
migrated, we should make it easy to check.

What I don't want is for someone to land on an issue via link or via google
search (or via search in jira because they are using Jira already for some
other apache project), read through it and think A) it never got resolved
when it did or B) miss the fact that it got reopened and further changes
were made and only have half the story... or any other scenario where they
are looking at an incomplete record of the issue. (thus
obfuscating/splitting the very important rich history across systems).

So that's why I feel issues should be completely tracked in the system
where they were created. Syncing old closed stuff into a new system
probably is fine so long as there are periodic sweeps to pull in reopens or
newly completed issues. We could even sync open things so long as they are
clearly marked in the title as having their primary record in Jira and
"last synced from JIRA on YYYY-MM-DD" or something in a final comment each
time new content is brought over.

For simplicity and workload however maybe just sync things when they close.
Depends on how much effort the person writing code for syncing things wants
to put into it I guess.

Although I agree with Dawid on the "What if Elon buys it?" issue, that ship
has sailed, the community accepts that risk and we probably should not
rehash it.

WRT Robert's comments on PRs being issues... this has already worried me
because I've already seen a lot of discussion on PR's and I've worried that
this stuff has the potential to get lost or be hard to find. If there is
one key positive of this move, it is that they will become easier to find since
the search in github can find it. I would say that a PR is not a substitute
for a well described issue report but that's probably a separate discussion
(which I would hope mirrors the policy on small edits like typos or adding
comments/javadoc not needing an issue). I've also seen folks who like to
clean up and remove old branches and PR's, which is problematic if that's
where the important discussion is (possibly a 3rd can of worms there).

-Gus

On Fri, Jun 17, 2022 at 4:34 PM Robert Muir  wrote:

> On Fri, Jun 17, 2022 at 3:27 PM Dawid Weiss  wrote:
> >
> > I'd be more afraid of what happens to github issues in two years (or
> longer). Will it look the same? Will it be different? Will it be gone (and
> how do we get a backup of the issue history then)? Contrary to the
> apache-hosted Jira, github is very much an independent entity. If Elon Musk
> decides to buy and close it tomorrow... then what? :)
> >
>
> We already have a ton of github "issues" (pull requests, since PRs are
> issues).
> If you want to "back them up", it's easy, you can paginate thru them
> 100 at a time, e.g. run this command, incrementing 'page' until it
> returns empty list:
>
>   curl -H "Accept: application/vnd.github.v3+json" \
>     "https://api.github.com/repos/apache/lucene/issues?per_page=100&page=1&direction=asc&state=all" \
>     > file1.json
>
> Yeah of course if you want to backup the comments and stuff, you'll
> need to do more.
> But it is already the case today, that a ton of this "history" is
> already in github issues, as PRs. Most recent JIRAs are just useless
> placeholders.
> Also the same risks apply to JIRA, except they are not theoretical but real
> concerns, no? I thought Atlassian had deprecated "onsite" JIRA to try
> to sucker you into their "Atlassian Cloud":
> https://www.theregister.com/2020/10/19/atlassian_server_licenses/
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
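
The pagination loop Robert describes above can be sketched in Python. This is only a minimal sketch under stated assumptions: the query parameters (`page`, `direction`, `state`) follow my reading of the URL in his curl command, the function names are made up for illustration, and, as he notes, comments and other attached data would need separate requests. Unauthenticated requests are also rate limited by GitHub, so a real backup would likely need a token.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Endpoint taken from the curl command quoted above.
API_URL = "https://api.github.com/repos/apache/lucene/issues"


def fetch_page(page: int, per_page: int = 100) -> list:
    """Fetch one page of issues (pull requests included) from the GitHub REST API."""
    # Parameter names are an assumption reconstructed from the quoted URL.
    query = f"?per_page={per_page}&page={page}&direction=asc&state=all"
    request = urllib.request.Request(
        API_URL + query,
        headers={"Accept": "application/vnd.github.v3+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_issues(get_page: Callable[[int], list] = fetch_page) -> Iterator[dict]:
    """Yield every issue, incrementing the page number until a page comes back empty."""
    page = 1
    while True:
        batch = get_page(page)
        if not batch:
            return
        yield from batch
        page += 1


def backup(path: str, get_page: Callable[[int], list] = fetch_page) -> int:
    """Write all issues to one JSON file and return how many were saved."""
    issues = list(iter_issues(get_page))
    with open(path, "w", encoding="utf-8") as out:
        json.dump(issues, out)
    return len(issues)
```

Passing the page fetcher in as an argument keeps the loop itself testable without touching the network; calling `backup("issues.json")` would use the real endpoint.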

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-15 Thread Gus Heck
+1 to your suggestion.

On Thu, Jun 16, 2022 at 12:34 AM Tomoko Uchida 
wrote:

> We have two conflicting requests:
> 1. We don't want to duplicate/diverge issues; an issue's identity is
> what matters the most.
> 2. We don't want to keep holding multiple issue systems; having only
> one system is what matters the most.
>
> They are inevitably in conflict with each other - it looks like many
> folks put more weight on 1 than 2, so I would go with it.
> I'd like to set a principle (not a very strict rule) to avoid
> unnecessary confusion during the migration period.
>
> * All new issues should be opened on GitHub. Opening new Jira issues
> is discouraged unless there is a good reason.
> * All existing issues should be resolved in Jira. Copying or moving
> Jira issues to GitHub is discouraged unless there is a good reason.
>
> Is there anyone who strongly opposes this?
>
> Tomoko
>
> On Thu, Jun 16, 2022 at 5:44 AM David Smiley wrote:
> >
> > I'm not a fan of the automated copying of any issues into GitHub, which
> will create a divergence / duplication of an issue's identity.  It will only
> be a relatively temporary annoyance to have two systems to "work" on an
> issue.  Eventually, JIRA will only be historical; let's say Lucene 11.  At
> that point if there's an older issue of resumed interest, which would be
> getting increasingly rare, someone could manually copy the original
> description and title into GitHub plus a historical reference back.  Again
> -- rare by then.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Wed, Jun 15, 2022 at 4:18 PM Tomoko Uchida <
> tomoko.uchida.1...@gmail.com> wrote:
> >>
> >> It looks like we talked about two or three things at the same time -
> >> and I'm afraid the discussion will quickly turn into a disordered
> >> state and I won't be able to track it.
> >>
> >> Let me decide one thing: Let's NOT try to move histories to GitHub.
> >> Closed issues will remain in Jira forever and we can refer to them
> >> anytime from anywhere. I think I said that before several times.
> >>
> >> I would like to focus on the future here - can we make a decision on
> >> how to handle active (unresolved) issues and issues that will be
> >> opened in the future.
> >>
> >> Thank you,
> >> Tomoko
> >>
> >> On Thu, Jun 16, 2022 at 4:18 AM Dawid Weiss wrote:
> >>
> >> >
> >> >
> >> >> Totally agree. The history of closed issues answers “when did this
> change and why?”. Migrate them all. Computers can do that. It avoids asking
> humans to think about where stuff is.
> >> >
> >> >
> >> > We do have different views of that. To me, the history is preserved
> perfectly well in Jira, it's not being phased out. Moving to github as the
> issue tracking system is fine but different to me than code transitions
> (cvs->svn->git). With code, you do have an existing state and history you
> build from. With issue tickets - not so much. And even if you want to
> create a ticket in the new system, you can easily link to the previous one.
> It's the "web" of hyperlinks, right?
> >> >
> >> > I'm a bit afraid that moving hundreds of jira issues to github will
> have the reverse effect - duplicate the same information but with quality
> degraded, for example automatic links that work in Jira will no longer work
> or point at the ported github issues ("this is related to LUCENE-xyz or
> SOLR-abc, blah, blah blah.")?
> >> >
> >> > I don't want to stand in the way of progress but we've gone through a
> similar transition at our company and I never had a problem using both
> systems at the same time; jira just gradually atrophied into a read-only
> state once issues in there got stale or resolved.
> >> >
> >> > Dawid
> >>
> >>
>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [RESULT] [VOTE] Migration to GitHub issue from Jira

2022-06-15 Thread Gus Heck
I agree with the idea that we shouldn't have 2 active trackers, but I think
that the apache jira must remain readable forever. People will have
bookmarked or linked issues in documents, blog posts or web pages and
breaking those links would be a HUGE disservice to the community. Ideally
we would set Jira to not accept new issues and lock out comments on closed
issues. New issues would then appear in github. If we were able to migrate
a locked, read only copy of closed jira's into github (and include a link
back to the original) that might be of some help to users so they can work
in github and ignore Jira, but we should not allow further discussion in
github of something discussed in Jira. Really bad to have someone look at
an issue, think they have the full picture and be missing 1/3 of the
discussion.

So principles I'd like to advocate are:
1) Don't break links to Jiras
2) Single source of truth for any individual issue.
3) Optionally for user convenience reflect the source of truth for old
issues in github as read only, with a back reference.

On Wed, Jun 15, 2022 at 11:56 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Wed, Jun 15, 2022 at 10:46 AM Tomoko Uchida <
> tomoko.uchida.1...@gmail.com> wrote:
>
>> Thank you everyone for your suggestions.
>> I don't have a strong opinion on how to handle existing issues, I just
>> want to proceed with the migration smoothly. I'd keep this discussion open
>> until we find a better (not perfect) option or reach some level of
>> agreement.
>>
>
> I see you already have a start at the migration plan, yay!  (The comment
> on LUCENE-10557)
>
> Could we maybe pull that out into a wiki page so we can more easily
> collaborate on the steps?
>
>
>> > make the Jira project read only.
>>
>> I'm sorry but I don't think we can make Jira read only... I think we
>> should support the backup contribution paths outside GitHub, and
>> personally, I don't want to go back to a mail-based way.
>> We've seen there are people who don't use GitHub for whatever reason
>> and I think we can't ignore the risk of GitHub account banning - it
>> can happen accidentally to anyone (I don't know the surveillance
>> system in GitHub at all but it might be automated? Systems can make
>> mistakes and recovering an account may take some time).
>>
>
> Hmm, I think it's quite risky/dangerous to leave both writable?  It'd be
> forking our issue tracker.  We'll have situations where some of us update
> the Jira issue, others update the GitHub issue, we lose context/comments,
> we duplicate work (thinking nobody is working on the GitHub issue yet
> someone was actually working on the Jira one).  It would add
> risk/friction/taxation to development going forward ... people would need
> to know to check two places (GitHub and Jira) for updates, new issues,
> patches, linked PRs, etc.
>
> To me the migration would ideally be an atomic switch -- only Jira is
> writeable up until some point, then it goes read only, we kick off the
> (hopefully already well tested/debugged migration tool, probably just
> forking this nice tool that the Lucene.net devs created
> ),
> then GitHub issues is writable.
>
> This nicely matches how SVN -> Git migration went.
>
> Yes, some people are not fully comfortable with GitHub, yet, but we expect
> that to be the minority, we expect account blocking to be rare and easy to
> resolve, etc. (since our VOTE to migrate has passed).
>
> I really feel we should make a hard switch for the best long-term health
> of the dev community.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Greg Miller to the Lucene PMC

2022-06-07 Thread Gus Heck
Welcome Greg :)


On Tue, Jun 7, 2022 at 5:43 PM David Smiley  wrote:

> Welcome Greg!
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Jun 7, 2022 at 2:44 AM Adrien Grand  wrote:
>
>> I'm pleased to announce that Greg Miller has accepted an invitation to
>> join the Lucene PMC!
>>
>> Congratulations Greg, and welcome aboard!
>>
>> --
>> Adrien
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Chris Hegarty as Lucene committer

2022-06-01 Thread Gus Heck
Welcome and congratulations :)

On Wed, Jun 1, 2022 at 4:50 PM Martin Gainty  wrote:

> Welcome Chris!
> martin
> --
> *From:* Tomoko Uchida 
> *Sent:* Wednesday, June 1, 2022 11:05 AM
> *To:* Lucene Dev 
> *Subject:* Re: Welcome Chris Hegarty as Lucene committer
>
> Congratulations and welcome, Chris!
>
> Tomoko
>
>
> On Wed, Jun 1, 2022 at 11:17 PM Nhat Nguyen wrote:
>
> Welcome, Chris!
>
> On Wed, Jun 1, 2022 at 8:49 AM Greg Miller  wrote:
>
> Welcome Chris!
>
> On Wed, Jun 1, 2022 at 2:04 PM Mayya Sharipova
>  wrote:
> >
> > Welcome and congratulations, Chris!
> >
> > On Wed, Jun 1, 2022 at 7:53 AM Jan Høydahl 
> wrote:
> >>
> >> Welcome Chris!
> >>
> >> Jan
> >>
> >> > On Wed, Jun 1, 2022 at 9:04 AM Adrien Grand wrote:
> >> >
> >> > I'm pleased to announce that Chris Hegarty has accepted the PMC's
> >> > invitation to become a committer.
> >> >
> >> > Chris, the tradition is that new committers introduce themselves with
> a
> >> > brief bio.
> >> >
> >> > Congratulations and welcome!
> >> >
> >> > --
> >> > Adrien
> >>
> >>
> >>
>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Lu Xugang as Lucene committer

2022-06-01 Thread Gus Heck
Welcome and congratulations :)

On Wed, Jun 1, 2022 at 3:32 PM Alessandro Benedetti 
wrote:

> Welcome on board Xugang!
> --
> *Alessandro Benedetti*
> CEO @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io
> LinkedIn | Twitter | Youtube | Github
>
>
> On Wed, 1 Jun 2022 at 19:10, Julie Tibshirani  wrote:
>
>> Welcome Xugang!!
>>
>> On Wed, Jun 1, 2022 at 10:04 AM David Smiley  wrote:
>>
>>> Welcome Lu!
>>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [VOTE] Migration to GitHub issue from Jira (LUCENE-10557)

2022-05-31 Thread Gus Heck
-1 I think the disruption and bifurcation of where to find history is not
worth it. I also noticed a comment in the lucene issue for migration with
summaries by date range, status, affects version, sub-area, etc., exactly
the sort of thing I expect to be much more difficult to obtain from github.
What I would find interesting is a deep integration of the two systems so
that initiation and basic commenting could be handled on github, but
transmitted to Jira where full metadata and reporting/tracking could be
maintained.

On Tue, May 31, 2022 at 12:17 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> -1
>
> On Tue, 31 May, 2022, 4:06 am Xi Chen, 
> wrote:
>
>> +1 from me (committer, non-PMC)
>>
>> Thanks Tomoko for starting the discussion and organizing / leading this
>> effort!
>>
>> Best,
>> Zach
>>
>> On May 30, 2022, at 2:56 PM, Houston Putman  wrote:
>>
>> 
>> +1 Approve (PMC)
>>
>> Thanks so much for doing all of the work for this Tomoko!
>>
>> - Houston
>>
>> On Mon, May 30, 2022 at 5:38 PM David Smiley  wrote:
>>
>>> +1 Approve (PMC)
>>>
>>> ~ David Smiley
>>> Apache Lucene/Solr Search Developer
>>> http://www.linkedin.com/in/davidwsmiley
>>>
>>>
>>> On Mon, May 30, 2022 at 11:40 AM Tomoko Uchida <
>>> tomoko.uchida.1...@gmail.com> wrote:
>>>
>>>> Hi everyone!
>>>>
>>>> As we had previous discussion thread [1], I propose migration to GitHub
>>>> issue from Jira.
>>>> It'd be technically possible (see [2] for details) and I think it'd be
>>>> good for the project - not only for welcoming new developers who are not
>>>> familiar with Jira, but also for improving the experiences of long-term
>>>> committers/contributors by consolidating the conversation platform.
>>>>
>>>> You can see a short summary of the discussion, some stats on current
>>>> Jira issues, and a draft migration plan in [2].
>>>> Please review [2] if you haven't seen it and vote for this proposal.
>>>>
>>>> The vote will be open until 2022-06-06 16:00 UTC.
>>>>
>>>> [ ] +1  approve
>>>> [ ] +0  no opinion
>>>> [ ] -1  disapprove (and reason why)
>>>>
>>>> Here is my +1
>>>>
>>>> *IMPORTANT NOTE*
>>>> I set a local protocol for this vote.
>>>> There are 95 committers on this project [3] - the vote will be
>>>> effective if it successfully gains more than 15% of voters (>= 15) from
>>>> committers (including PMC members). This means, that although only PMC
>>>> member votes are counted for the final result, the votes from all
>>>> committers are important to make the vote result effective.
>>>>
>>>> If there are less than 15 votes at 2022-06-06 16:00 UTC, I will expand
>>>> the term to 2022-06-13 16:00 UTC. If this fails to get sufficient voters
>>>> after the expanded time limit, I'll cancel this vote regardless of the
>>>> result.
>>>> But why do I set such an extra bar? My fear is that if such things are
>>>> decided by the opinions of a few members, the result shouldn't yield a good
>>>> outcome for the future. It isn't my goal to just pass the vote [4].
>>>>
>>>> [1] https://lists.apache.org/thread/78wj0vll73sct065m5jjm4z8gqb5yffk
>>>> [2] https://issues.apache.org/jira/browse/LUCENE-10557
>>>> [3] https://projects.apache.org/committee.html?lucene
>>>> [4] I'm sorry for being overly cautious, but I have never met in person
>>>> or virtually any of the committers (with a very few exceptions), therefore
>>>> cannot assess if the vote result is reliable or not unless there is certain
>>>> explicit feedback.
>>>>
>>>> Tomoko

>>>
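
For anyone checking the arithmetic of the local protocol in Tomoko's quoted mail, here is a minimal sketch. It assumes the 15% threshold is rounded up to a whole number of voters (which gives the 15 she states for 95 committers); the function names and status strings are made up for illustration and are not part of any Apache voting tooling.

```python
import math


def quorum(committers: int, percent: int = 15) -> int:
    """Smallest whole number of voters that reaches `percent` of all committers."""
    return math.ceil(committers * percent / 100)


def vote_status(votes_cast: int, committers: int, already_extended: bool) -> str:
    """Apply the protocol: effective at quorum, else extend once, else cancel."""
    if votes_cast >= quorum(committers):
        return "effective"
    # The first deadline (2022-06-06) may be extended once (to 2022-06-13).
    return "cancelled" if already_extended else "extend"
```

With 95 committers, `quorum(95)` gives 15, so 14 votes at the first deadline would trigger the one-week extension rather than a result.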

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Bugfix release Lucene/Solr 8.11.2

2022-05-18 Thread Gus Heck
SOLR-16194 is in and ported to 8.11.2

On Wed, May 18, 2022 at 7:12 AM Jan Høydahl  wrote:

> I was pinged on https://issues.apache.org/jira/browse/SOLR-16019 because
> I have an in-flight PR with a backport. I'll complete and merge that PR.
>
> Jan
>
>
> On Fri, May 13, 2022 at 1:03 AM Mike Drob wrote:
>
> To: dev@lucene, dev@solr
>
> NOTICE:
>
> I am planning on preparing a bugfix release from branch branch_8_11
> (likely mid next week)
>
> Please observe the normal rules for committing to this branch:
>
> * Before committing to the branch, reply to this thread and argue
>   why the fix needs backporting and how long it will take.
> ** If you're backporting stuff this week still or over the weekend, then
> skip
> the bit about how long it will take.
> * All issues accepted for backporting should be marked with 8.11.2
>   in JIRA, and issues that should delay the release must be marked as
> Blocker
> * All patches that are intended for the branch should first be committed
>   to the unstable branch, merged into the stable branch, and then into
>   the current release branch.
> * Only Jira issues with Fix version 8.11.2 and priority "Blocker" will
> delay
>   a release candidate build.
>
> Also, please observe that since 9.0 already exists, there cannot be any
> index format breaking changes. It really should only be bug fixes that have
> already been verified on the 9x branch.
>
> Thanks,
> Mike
>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Bugfix release Lucene/Solr 8.11.2

2022-05-15 Thread Gus Heck
commit and backport of course :)

On Sun, May 15, 2022 at 7:47 PM Gus Heck  wrote:

>  https://issues.apache.org/jira/browse/SOLR-16194 now has a PR (
> https://github.com/apache/solr/pull/864) will commit after review or 3
> days without objections
>
> On Fri, May 13, 2022 at 12:19 PM Gus Heck  wrote:
>
>> I think it would be good if we can get
>> https://issues.apache.org/jira/browse/SOLR-16194 into 8.11.2 I plan to
>> work on it this weekend. I'm hoping it will be a straightforward matter of
>> adding a check for existing collections.
>>
>> On Fri, May 13, 2022 at 4:21 AM Anshum Gupta 
>> wrote:
>>
>>> Yes please! I assumed that was already the case as both lists are copied
>>> :)
>>>
>>> On Fri, May 13, 2022 at 12:47 AM Uwe Schindler  wrote:
>>>
>>>> Should we maybe also ask on the Lucene side if any backports to 8.11
>>>> would be good?
>>>>
>>>>
>>>>
>>>> Uwe
>>>>
>>>>
>>>>
>>>> -
>>>>
>>>> Uwe Schindler
>>>>
>>>> Achterdiek 19, D-28357 Bremen
>>>>
>>>> https://www.thetaphi.de
>>>>
>>>> eMail: u...@thetaphi.de
>>>>
>>>>
>>>>
>>>> *From:* Anshum Gupta 
>>>> *Sent:* Friday, May 13, 2022 1:23 AM
>>>> *To:* d...@solr.apache.org
>>>> *Cc:* Solr/Lucene Dev 
>>>> *Subject:* Re: Bugfix release Lucene/Solr 8.11.2
>>>>
>>>>
>>>>
>>>> Thanks for volunteering, Mike!
>>>>
>>>>
>>>>
>>>> I think the commits I was tracking to be in 8x are already there, but
>>>> I'll confirm this over the weekend and let you know in case I intend to
>>>> backport anything more.
>>>>
>>>>
>>>>
>>>> On Thu, May 12, 2022 at 4:03 PM Mike Drob  wrote:
>>>>
>>>> To: dev@lucene, dev@solr
>>>>
>>>>
>>>>
>>>> NOTICE:
>>>>
>>>>
>>>> I am planning on preparing a bugfix release from branch branch_8_11
>>>> (likely mid next week)
>>>>
>>>> Please observe the normal rules for committing to this branch:
>>>>
>>>> * Before committing to the branch, reply to this thread and argue
>>>>   why the fix needs backporting and how long it will take.
>>>>
>>>> ** If you're backporting stuff this week still or over the weekend,
>>>> then skip
>>>>
>>>> the bit about how long it will take.
>>>> * All issues accepted for backporting should be marked with 8.11.2
>>>>   in JIRA, and issues that should delay the release must be marked as
>>>> Blocker
>>>> * All patches that are intended for the branch should first be committed
>>>>   to the unstable branch, merged into the stable branch, and then into
>>>>   the current release branch.
>>>> * Only Jira issues with Fix version 8.11.2 and priority "Blocker" will
>>>> delay
>>>>   a release candidate build.
>>>>
>>>>
>>>>
>>>> Also, please observe that since 9.0 already exists, there cannot be any
>>>> index format breaking changes. It really should only be bug fixes that have
>>>> already been verified on the 9x branch.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Anshum Gupta
>>>>
>>>
>>>
>>> --
>>> Anshum Gupta
>>>
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Bugfix release Lucene/Solr 8.11.2

2022-05-15 Thread Gus Heck
 https://issues.apache.org/jira/browse/SOLR-16194 now has a PR (
https://github.com/apache/solr/pull/864) will commit after review or 3 days
without objections

On Fri, May 13, 2022 at 12:19 PM Gus Heck  wrote:

> I think it would be good if we can get
> https://issues.apache.org/jira/browse/SOLR-16194 into 8.11.2 I plan to
> work on it this weekend. I'm hoping it will be a straightforward matter of
> adding a check for existing collections.
>
> On Fri, May 13, 2022 at 4:21 AM Anshum Gupta 
> wrote:
>
>> Yes please! I assumed that was already the case as both lists are copied
>> :)
>>
>> On Fri, May 13, 2022 at 12:47 AM Uwe Schindler  wrote:
>>
>>> Should we maybe also ask on the Lucene side if any backports to 8.11
>>> would be good?
>>>
>>>
>>>
>>> Uwe
>>>
>>>
>>>
>>> -
>>>
>>> Uwe Schindler
>>>
>>> Achterdiek 19, D-28357 Bremen
>>>
>>> https://www.thetaphi.de
>>>
>>> eMail: u...@thetaphi.de
>>>
>>>
>>>
>>> *From:* Anshum Gupta 
>>> *Sent:* Friday, May 13, 2022 1:23 AM
>>> *To:* d...@solr.apache.org
>>> *Cc:* Solr/Lucene Dev 
>>> *Subject:* Re: Bugfix release Lucene/Solr 8.11.2
>>>
>>>
>>>
>>> Thanks for volunteering, Mike!
>>>
>>>
>>>
>>> I think the commits I was tracking to be in 8x are already there, but
>>> I'll confirm this over the weekend and let you know in case I intend to
>>> backport anything more.
>>>
>>>
>>>
>>> On Thu, May 12, 2022 at 4:03 PM Mike Drob  wrote:
>>>
>>> To: dev@lucene, dev@solr
>>>
>>>
>>>
>>> NOTICE:
>>>
>>>
>>> I am planning on preparing a bugfix release from branch branch_8_11
>>> (likely mid next week)
>>>
>>> Please observe the normal rules for committing to this branch:
>>>
>>> * Before committing to the branch, reply to this thread and argue
>>>   why the fix needs backporting and how long it will take.
>>>
>>> ** If you're backporting stuff this week still or over the weekend, then
>>> skip
>>>
>>> the bit about how long it will take.
>>> * All issues accepted for backporting should be marked with 8.11.2
>>>   in JIRA, and issues that should delay the release must be marked as
>>> Blocker
>>> * All patches that are intended for the branch should first be committed
>>>   to the unstable branch, merged into the stable branch, and then into
>>>   the current release branch.
>>> * Only Jira issues with Fix version 8.11.2 and priority "Blocker" will
>>> delay
>>>   a release candidate build.
>>>
>>>
>>>
>>> Also, please observe that since 9.0 already exists, there cannot be any
>>> index format breaking changes. It really should only be bug fixes that have
>>> already been verified on the 9x branch.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Anshum Gupta
>>>
>>
>>
>> --
>> Anshum Gupta
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Bugfix release Lucene/Solr 8.11.2

2022-05-13 Thread Gus Heck
I think it would be good if we can get
https://issues.apache.org/jira/browse/SOLR-16194 into 8.11.2 I plan to work
on it this weekend. I'm hoping it will be a straightforward matter of
adding a check for existing collections.

On Fri, May 13, 2022 at 4:21 AM Anshum Gupta  wrote:

> Yes please! I assumed that was already the case as both lists are copied
> :)
>
> On Fri, May 13, 2022 at 12:47 AM Uwe Schindler  wrote:
>
>> Should we maybe also ask on the Lucene side if any backports to 8.11
>> would be good?
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> Achterdiek 19, D-28357 Bremen
>>
>> https://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Anshum Gupta 
>> *Sent:* Friday, May 13, 2022 1:23 AM
>> *To:* d...@solr.apache.org
>> *Cc:* Solr/Lucene Dev 
>> *Subject:* Re: Bugfix release Lucene/Solr 8.11.2
>>
>>
>>
>> Thanks for volunteering, Mike!
>>
>>
>>
>> I think the commits I was tracking to be in 8x are already there, but
>> I'll confirm this over the weekend and let you know in case I intend to
>> backport anything more.
>>
>>
>>
>> On Thu, May 12, 2022 at 4:03 PM Mike Drob  wrote:
>>
>> To: dev@lucene, dev@solr
>>
>>
>>
>> NOTICE:
>>
>>
>> I am planning on preparing a bugfix release from branch branch_8_11
>> (likely mid next week)
>>
>> Please observe the normal rules for committing to this branch:
>>
>> * Before committing to the branch, reply to this thread and argue
>>   why the fix needs backporting and how long it will take.
>>
>> ** If you're backporting stuff this week still or over the weekend, then
>> skip
>>
>> the bit about how long it will take.
>> * All issues accepted for backporting should be marked with 8.11.2
>>   in JIRA, and issues that should delay the release must be marked as
>> Blocker
>> * All patches that are intended for the branch should first be committed
>>   to the unstable branch, merged into the stable branch, and then into
>>   the current release branch.
>> * Only Jira issues with Fix version 8.11.2 and priority "Blocker" will
>> delay
>>   a release candidate build.
>>
>>
>>
>> Also, please observe that since 9.0 already exists, there cannot be any
>> index format breaking changes. It really should only be bug fixes that have
>> already been verified on the 9x branch.
>>
>>
>>
>> Thanks,
>>
>> Mike
>>
>>
>>
>>
>> --
>>
>> Anshum Gupta
>>
>
>
> --
> Anshum Gupta
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-11 Thread Gus Heck
jects show that it's
>>>> a valid option for our projects. I think the ultimate questions are:
>>>>
>>>>- Which will be easier for users to find relevant information?
>>>>- Which reduces the amount of bureaucracy needed to contribute to
>>>>the project?
>>>>- Which fits into the workflows of existing committers the best?
>>>>
>>>> To me Github comes up on top, even though there are things that JIRA
>>>> does better.
>>>>
>>>> P.S. I think you mean https://github.com/helm/charts, marcus. I don't
>>>> think helm is deprecated
>>>>
>>>> On Tue, May 10, 2022 at 1:41 PM Marcus Eagan 
>>>> wrote:
>>>>
>>>>> I recommend people take a look at the now deprecated helm project. It
>>>>> was very difficult to land PRs because they had so much governance and
>>>>> automation. For a data store as mature as SOLR, I would suggest it is
>>>>> needed.
>>>>>
>>>>> Many issues are worth a read: https://github.com/helm/helm
>>>>>
>>>>> On Tue, May 10, 2022 at 10:16 AM Gus Heck  wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, May 10, 2022 at 10:40 AM Houston Putman 
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>> Most modern open source projects use Github Issues for their issue
>>>>>>> tracking, so it's definitely doable, and really what new
>>>>>>> users/contributors will be expecting. Also I see that much discussion is
>>>>>>> already done on PRs, and JIRAs are mainly there just for
>>>>>>> bureaucratic purposes. So I think it would be a wonderful direction to
>>>>>>> go in.
>>>>>>>
>>>>>>>
>>>>>> On that note, with many such projects I find it more difficult to get
>>>>>> clarity on whether or not I'm affected by the issue, or in what version
>>>>>> it was resolved. Usually it can be achieved by clicking on the referenced
>>>>>> commit, and then inspecting what tags are on that commit, but it's
>>>>>> several clicks and a minute or two vs just looking at the field in Jira...
>>>>>>
>>>>>> This can be made easier by using milestones as seen here (random
>>>>>> example, used gradle because it's a very large, healthy project):
>>>>>> https://github.com/gradle/gradle/issues/20182
>>>>>>
>>>>>> But I've seen a lot of projects that don't do that... which probably
>>>>>> colors my view a bit.
>>>>>>
>>>>>> -Gus
>>>>>>
>>>>>> --
>>>>>> http://www.needhamsoftware.com (work)
>>>>>> http://www.the111shift.com (play)
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marcus Eagan
>>>>>
>>>>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-10 Thread Gus Heck
On Tue, May 10, 2022 at 10:40 AM Houston Putman  wrote:

>
>>
> Most modern open source projects use Github Issues for their issue
> tracking, so it's definitely doable, and really what new
> users/contributors will be expecting. Also I see that much discussion is
> already done on PRs, and JIRAs are mainly there just for
> bureaucratic purposes. So I think it would be a wonderful direction to go
> in.
>
>
On that note, with many such projects I find it more difficult to get clarity
on whether or not I'm affected by the issue, or in what version it was
resolved. Usually it can be achieved by clicking on the referenced commit,
and then inspecting what tags are on that commit, but it's several clicks
and a minute or two vs just looking at the field in Jira...

This can be made easier by using milestones as seen here (random example,
used gradle because it's a very large, healthy project):
https://github.com/gradle/gradle/issues/20182

But I've seen a lot of projects that don't do that... which probably colors
my view a bit.
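
To make the milestone idea concrete, here is a minimal Python sketch. It
assumes the issue data has the shape GitHub's REST API returns for a single
issue (a "state" plus a "milestone" object with a "title", or null when no
milestone is set). The helper name fixed_in and the sample values are made up
for illustration and are not taken from any project.

```python
from typing import Optional


def fixed_in(issue: dict) -> Optional[str]:
    """Return the milestone title of a closed issue, or None if unknown."""
    if issue.get("state") != "closed":
        return None  # still open, so there is no fix version yet
    milestone = issue.get("milestone")  # None when the project skips milestones
    return milestone["title"] if milestone else None


# Hypothetical sample data, not a real API response.
with_milestone = {"state": "closed", "milestone": {"title": "9.2.0"}}
without_milestone = {"state": "closed", "milestone": None}

print(fixed_in(with_milestone))
print(fixed_in(without_milestone))
```

When the milestone is missing, one is back to clicking through commits and
tags, which is the case I keep running into.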

-Gus

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-10 Thread Gus Heck
Yes, each listed difference (among the features we rely on) of course has two
resolution paths to facilitate such a move: A) find a way to fill the gap, or
B) decide we don't care about the gap. Either is fine so long as it's an
intentional decision, not an oops we discover and regret later.

On Tue, May 10, 2022 at 10:40 AM Houston Putman  wrote:

> It's not about features, but about accepting a new framework to me. GitHub
>> issue would not be a replacement Jira, and we cannot operate this project
>> on GitHub issue in the same way on Jira. We'd need to build our new
>> convention and operations on the new toolkit.
>>
>
> I think this is a very important point. We have done a good job of using
> Github Issues 100% for the Solr Operator, but that is definitely smaller
> than Lucene and Solr. I think it's doable but we will definitely have to
> build in processes (like other large projects have done). Github Issues is
> not as powerful as JIRA, but maybe in the long-term that will be a good
> thing? Also Github has been improving over the last few years, and I have
> only seen JIRA either stay the same or get worse in the ~decade I've been
> working on the project.
>
> Most modern open source projects use Github Issues for their issue
> tracking, so it's definitely doable, and really what new
> users/contributors will be expecting. Also I see that much discussion is
> already done on PRs, and JIRAs are mainly there just for
> bureaucratic purposes. So I think it would be a wonderful direction to go
> in.
>
> - Houston
>
> On Tue, May 10, 2022 at 8:06 AM Tomoko Uchida <
> tomoko.uchida.1...@gmail.com> wrote:
>
>> Thanks Alessandro for openly sharing your perspective!
>>
>> > I have limited experience with the Github issue system, it looks
>> definitely "simpler" than Jira, not sure it covers all our requirements.
>>
>> I feel I'd need to explain my thoughts on this point. Yes, I think I know
>> very well about such kind of discussion - "We're using XXX for YYY, does
>> the new shiny ZZZ tool work as its replacement? Does ZZZ satisfy our
>> use-cases so far?"
>> But - I don't want to make this discuss thread into a feature comparison
>> of Jira vs GitHub.
>> It's not about features, but about accepting a new framework to me.
>> GitHub issue would not be a replacement Jira, and we cannot operate this
>> project on GitHub issue in the same way on Jira. We'd need to build our new
>> convention and operations on the new toolkit. I myself am optimistic about
>> we can do it well if we fully decide to accept the worldview the tool
>> provides for us.
>>
>> If many of you (here I mean, committers) feel "It's okay if our current
>> operation will be kept on GitHub.", I won't be able to fulfill your
>> expectations.
>> It's my position - and if it's not acceptable, I'd be happy to fail this
>> proposal.
>>
>> Thanks,
>> Tomoko
>>
>>
>> On Tue, May 10, 2022 at 18:41 Alessandro Benedetti  wrote:
>>
>>> Hi Tomoko,
>>> thanks for raising this!
>>>
>>> I am always in favor of simplicity and with the idea that code should
>>> speak for itself(readable code and meaningful commit messages over dirty
>>> code covered by a detailed Jira issue).
>>>
>>> Now, given that, I have been using Jira for many years, I agree with all
>>> the limitations mentioned so far but I am generally happy about using it.
>>> I have limited experience with the Github issue system, it looks
>>> definitely "simpler" than Jira, not sure it covers all our requirements.
>>>
>>> Being a bit provocative and thinking out loud, I see a true necessity of
>>> raising issues(Jira or Github) in these instances:
>>> 1) proposal that needs discussion and doesn't have a clear solution
>>> 2) raise a bug/task/story we are not planning to do ourselves
>>> immediately (so we want to give the community the chance of doing it while
>>> we are busy)
>>> 3) planning, using sprints etc (we don't do)
>>>
>>> Whenever we have a contribution or bugfix ready (as an output of our
>>> daily working activity), it feels to me that it's unnecessary to create an
>>> issue at all, modern pull requests are perfectly fine for adding all the
>>> necessary details, tag people for review or discuss the contribution: to me
>>> having to open any kind of issue is just an unnecessary boilerplate
>>> activity (and duplication of description, comments, etc).
>>> Pretty sure I am missing something, but I just wanted to give a quick
>>> glance of a recent feeling of mine.
>>>
>>> Long story short, if the Github issue system covers all our
>>> requirements, I think it's going to be beneficial to keep all in the "same"
>>> place and would ease contributions.
>>> But I am arguing we should not open an issue for each contribution, most
>>> of the time the Pull Request should be enough.
>>> Of course, we should estimate the effort, identify people that
>>> realistically want to work for that and then vote if the amount of
>>> dedication is worth.
>>>
>>> In regards to nationality bans, and sanctions etc, I am personally not

Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-09 Thread Gus Heck
I knew I had seen an apache issue tracker project...
https://bloodhound.apache.org/  which evidently descends from Trac, but it
appears to be more or less dead with no activity easily seen since 2014 :(

On Mon, May 9, 2022 at 10:27 AM Gus Heck  wrote:

> Ok my quick search led me astray I somehow thought Jackrabbit was an
> issue tracker because I landed on that page first.. disregard that.
>
> On Mon, May 9, 2022 at 10:19 AM Gus Heck  wrote:
>
>> On the suggestion of private security only repo in another mail... that
>> seems to mean security issues can never be made public? Presently we have a
>> culture of openness where once the issue is resolved and a fix released, we
>> share the discussion. I think that's good since it can then lead security
>> researchers or others to test our fix better and users can better
>> understand why we had to remove something or whatever.
>>
>> responses inline
>>
>> On Sun, May 8, 2022 at 11:51 PM Marcus Eagan 
>> wrote:
>>
>>> Many of my opinions have been expressed, and of course my (non-binding)
>>> vote for switching to GitHub issues is of little to no consequence.
>>>
>>>
>> Binding votes are not the only important votes, as Tomoko pointed out.
>>
>>
>>> I feel it would be wholly damaging to the Microsoft brand to pull the
>>> rug under the many open source projects owned by non-profits and hosted
>>> entirely on GitHub. Their leadership is trending toward the good and any
>>> absurd actions like that would have very serious ramifications for their
>>> business. I think it's a non-issue for the foreseeable future that is
>>> outweighed by the benefits of shedding Jira. Furthermore, here's a short
>>> list of tutorials
>>> <https://gist.github.com/MarcusSorealheis/c3e5055442b89fdf0d32c392e95ea314> 
>>> for
>>> migrating back to Jira in a doomsday scenario.
>>>
>>
>> I don't disagree, and I acknowledge that the recent trend is much
>> improved, but it's a lever by which an external company motivated by profit
>> can disrupt us if it happens to be in their interest. (besides profit,
>> there could be political motives etc, Imagine prominent pmc members expose
>> a flaw that really hurts them or sign some sort of open letter in favor of
>> a political candidate that explicitly wants to target them with antitrust
>> laws... not that that happens anymore in the US but never mind... ).
>>
>> I have a bias for the ASF and its projects to be self-sufficient
>> where feasible, and while loss of donations would be an issue
>> regardless, that would have to be at the ASF level and couldn't target
>> specific projects or individuals making it far less attractive. One can
>> argue that the irritations in Jira are making it infeasible, but that's my
>> bias.
>>
>>
>>>
>>>>- No way to enforce that a resolution label is applied to the issue.
>>>>
>>>> We can enforce labels. It will require some customization to some of
>>> the existing options. Here is a popular one
>>> <https://github.com/marketplace/actions/require-labels>.
>>>
>>
>> Hmm, those are labels on PRs, not issues. GitHub does not have direct
>> issue/PR linking.
>>
>> Which reminds me I don't think there's a way to link issues such as "this
>> one blocks that one" or "This one is related to that one", etc.
>>
>>
>>>
>>>
>>>>- Document with each issue the Affected version and the fixed
>>>>version.
>>>>
>>>> There are many ways to do this one. The simplest is the issue template
>>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/configuring-issue-templates-for-your-repository>.
>>> There are many others, though.
>>>
>>
>> It seems that an issue template would put this information in a comment
>> where it's not filterable, and would need to be maintained.
>>
>> There does seem to be an edit history but is there a label history? In
>> Jira basically any action on the issue is auditable. Imagine someone
>> registers an account and does something malicious (say someone who didn't
>> like us went and removed labels from a ton of issues? how would we know
>> who, and what labels to put back?). Hard to imagine perhaps, but the
>> internet is large and contains a large number of weirdos...
>>
>>
>>>
>>> Jira is very robust, but it is daunting. It seems that to make this
>>> proposal

Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-09 Thread Gus Heck
OK, my quick search led me astray: I somehow thought Jackrabbit was an
issue tracker because I landed on that page first. Disregard that.

On Mon, May 9, 2022 at 10:19 AM Gus Heck  wrote:

> On the suggestion of private security only repo in another mail... that
> seems to mean security issues can never be made public? Presently we have a
> culture of openness where once the issue is resolved and a fix released, we
> share the discussion. I think that's good since it can then lead security
> researchers or others to test our fix better and users can better
> understand why we had to remove something or whatever.
>
> responses inline
>
> On Sun, May 8, 2022 at 11:51 PM Marcus Eagan 
> wrote:
>
>> Many of my opinions have been expressed, and of course my (non-binding)
>> vote for switching to GitHub issues is of little to no consequence.
>>
>>
> Binding votes are not the only important votes, as Tomoko pointed out.
>
>
>> I feel it would be wholly damaging to the Microsoft brand to pull the rug
>> under the many open source projects owned by non-profits and hosted
>> entirely on GitHub. Their leadership is trending toward the good and any
>> absurd actions like that would have very serious ramifications for their
>> business. I think it's a non-issue for the foreseeable future that is
>> outweighed by the benefits of shedding Jira. Furthermore, here's a short
>> list of tutorials
>> <https://gist.github.com/MarcusSorealheis/c3e5055442b89fdf0d32c392e95ea314> 
>> for
>> migrating back to Jira in a doomsday scenario.
>>
>
> I don't disagree, and I acknowledge that the recent trend is much
> improved, but it's a lever by which an external company motivated by profit
> can disrupt us if it happens to be in their interest. (besides profit,
> there could be political motives etc, Imagine prominent pmc members expose
> a flaw that really hurts them or sign some sort of open letter in favor of
> a political candidate that explicitly wants to target them with antitrust
> laws... not that that happens anymore in the US but never mind... ).
>
> I have a bias for the ASF and its projects to be self-sufficient
> where feasible, and while loss of donations would be an issue
> regardless, that would have to be at the ASF level and couldn't target
> specific projects or individuals making it far less attractive. One can
> argue that the irritations in Jira are making it infeasible, but that's my
> bias.
>
>
>>
>>>- No way to enforce that a resolution label is applied to the issue.
>>>
>>> We can enforce labels. It will require some customization to some of the
>> existing options. Here is a popular one
>> <https://github.com/marketplace/actions/require-labels>.
>>
>
> Hmm, those are labels on PRs, not issues. GitHub does not have direct
> issue/PR linking.
>
> Which reminds me I don't think there's a way to link issues such as "this
> one blocks that one" or "This one is related to that one", etc.
>
>
>>
>>
>>>- Document with each issue the Affected version and the fixed
>>>version.
>>>
>>> There are many ways to do this one. The simplest is the issue template
>> <https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/configuring-issue-templates-for-your-repository>.
>> There are many others, though.
>>
>
> It seems that an issue template would put this information in a comment
> where it's not filterable, and would need to be maintained.
>
> There does seem to be an edit history but is there a label history? In
> Jira basically any action on the issue is auditable. Imagine someone
> registers an account and does something malicious (say someone who didn't
> like us went and removed labels from a ton of issues? how would we know
> who, and what labels to put back?). Hard to imagine perhaps, but the
> internet is large and contains a large number of weirdos...
>
>
>>
>> Jira is very robust, but it is daunting. It seems that to make this
>> proposal viable, a few members of the community need to commit to setting
>> up and facilitating the transition. To me, it feels like a two month
>> effort.
>>
>> Regarding .patch files, I think there are very few systems that still
>> rely on them.
>>
>
> If we as a group decide to drop support for them, that's a possible
> decision. It might need to precede the move to GitHub.
>
>
>> , I despise how annoying Jira gets and think that more developers could
>> get involved if we removed that dependency. GitHub actions give us lots of
>> customizability.
>

Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-09 Thread Gus Heck
On the suggestion of a private, security-only repo in another mail... that
seems to mean security issues can never be made public? Presently we have a
culture of openness where, once the issue is resolved and a fix released, we
share the discussion. I think that's good since it can then lead security
researchers or others to test our fix better, and users can better
understand why we had to remove something or whatever.

responses inline

On Sun, May 8, 2022 at 11:51 PM Marcus Eagan  wrote:

> Many of my opinions have been expressed, and of course my (non-binding)
> vote for switching to GitHub issues is of little to no consequence.
>
>
Binding votes are not the only important votes, as Tomoko pointed out.


> I feel it would be wholly damaging to the Microsoft brand to pull the rug
> under the many open source projects owned by non-profits and hosted
> entirely on GitHub. Their leadership is trending toward the good and any
> absurd actions like that would have very serious ramifications for their
> business. I think it's a non-issue for the foreseeable future that is
> outweighed by the benefits of shedding Jira. Furthermore, here's a short
> list of tutorials
>  
> for
> migrating back to Jira in a doomsday scenario.
>

I don't disagree, and I acknowledge that the recent trend is much improved,
but it's a lever by which an external company motivated by profit can
disrupt us if it happens to be in their interest. (Besides profit, there
could be political motives etc. Imagine prominent PMC members expose a flaw
that really hurts them, or sign some sort of open letter in favor of a
political candidate that explicitly wants to target them with antitrust
laws... not that that happens anymore in the US, but never mind...)

I have a bias for the ASF and its projects to be self-sufficient
where feasible, and while loss of donations would be an issue
regardless, that would have to be at the ASF level and couldn't target
specific projects or individuals making it far less attractive. One can
argue that the irritations in Jira are making it infeasible, but that's my
bias.


>
>>- No way to enforce that a resolution label is applied to the issue.
>>
>> We can enforce labels. It will require some customization to some of the
> existing options. Here is a popular one
> .
>

Hmm, those are labels on PRs, not issues. GitHub does not have direct
issue/PR linking.

Which reminds me I don't think there's a way to link issues such as "this
one blocks that one" or "This one is related to that one", etc.


>
>
>>- Document with each issue the Affected version and the fixed
>>version.
>>
>> There are many ways to do this one. The simplest is the issue template
> .
> There are many others, though.
>

It seems that an issue template would put this information in a comment
where it's not filterable, and would need to be maintained.

There does seem to be an edit history but is there a label history? In Jira
basically any action on the issue is auditable. Imagine someone registers
an account and does something malicious (say someone who didn't like us
went and removed labels from a ton of issues? how would we know who, and
what labels to put back?). Hard to imagine perhaps, but the internet is
large and contains a large number of weirdos...
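
For what it's worth, my understanding is that GitHub's issue events API
records "labeled" and "unlabeled" events together with the acting user, which
would be the raw material for such an audit. Below is a rough Python sketch of
working out which labels to put back after a mass removal. The event records
are a simplified, flattened shape I made up for illustration (the real API
nests the actor and label), and the function name and sample data are
hypothetical.

```python
from collections import defaultdict


def labels_to_restore(events, login):
    """Return {issue number: [label names]} removed by `login` and not yet re-added."""
    missing = defaultdict(list)
    for ev in events:  # assumed to be in chronological order
        labels = missing[ev["issue"]]
        if ev["event"] == "unlabeled" and ev["actor"] == login:
            if ev["label"] not in labels:
                labels.append(ev["label"])
        elif ev["event"] == "labeled" and ev["label"] in labels:
            labels.remove(ev["label"])  # somebody already put it back
    # Drop issues where nothing is left to restore.
    return {issue: names for issue, names in missing.items() if names}


# Hypothetical, simplified event records.
events = [
    {"issue": 101, "event": "unlabeled", "actor": "mallory", "label": "bug"},
    {"issue": 101, "event": "unlabeled", "actor": "mallory", "label": "core"},
    {"issue": 101, "event": "labeled", "actor": "gus", "label": "core"},
    {"issue": 102, "event": "unlabeled", "actor": "gus", "label": "wontfix"},
]
print(labels_to_restore(events, "mallory"))
```

Even if that works, it is something we would have to script ourselves rather
than a history view we get for free.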


>
> Jira is very robust, but it is daunting. It seems that to make this
> proposal viable, a few members of the community need to commit to setting
> up and facilitating the transition. To me, it feels like a two month
> effort.
>
> Regarding .patch files, I think there are very few systems that still rely
> on them.
>

If we as a group decide to drop support for them, that's a possible
decision. It might need to precede the move to GitHub.


> , I despise how annoying Jira gets and think that more developers could
> get involved if we removed that dependency. GitHub actions give us lots of
> customizability.
>

Oh yes. Did you note my all caps words ;) I'm in no way suggesting that
Jira is particularly friendly to use. It's particularly frustrating that
half the things I listed look like they should be relatively easy to fix
and in one case they did it to themselves for no reason I can fathom.
Context and performance really being harder (context would take some
careful design so that the users who want to work across projects still
can, and really users like me would want a 2 project context...). I suspect
however that actions can't overcome the fact that Github doesn't store
distinct fields so unless we have some way of pulling data out of issue
comments and making it searchable, under separate fields, there will be
gaps.

It's been a long time since I've tried to look around in the issue tracker
space. Are there 

Re: [DISCUSS] A proposal for migration to GitHub issue (LUCENE-10557)

2022-05-06 Thread Gus Heck
I think both tools have their merits and drawbacks

What I like about Jira:

   - It has ample room and configuration for issue metadata and
   customizable workflows and in general a deep feature set
   - It has user roles, PMC members can see security issues that are hidden
   from the world...
   - I've used it for almost 20 years so It's familiar to me.
   - It's hosted at the ASF by the ASF so nobody but the ASF can determine
   access or hold it hostage (I think, correct me if I'm wrong and we're now
   using atlassian cloud versions).

What I do not like about Jira

   - They have had LONG standing issues with text and visual mode not
   round-tripping (switching between them alters the text and often destroys
   formatting), which is something even cheap blogging software usually gets
   right. This is EXTREMELY FRUSTRATING at times when the proper name of
   something in code includes an underscore. Especially bad is the fact that
   they do provide a way to escape things like underscores, but the transition
   between visual and text destroys that escaping, making it useless, and if
   you carefully set up the escapes in text mode, one small edit by someone
   else (perhaps fixing a typo) in visual mode destroys all your hard work!
   G... And of course text mode is sometimes hard to predict, so when working
   in text mode with any non-trivial formatting you may wind up re-editing
   several times, which at Apache sends multiple emails to the list...
   usability nightmare.
   - They switched to a default search result layout that wasn't a sortable
   table/list. This irritates me because I never want to randomly fill 70% of
   the screen with the top hit on a search and have almost no info about all
   the other results. Typically I want to immediately sort by issue number to
   find recent issues (or older issues depending). Even if text relevancy is
   the important thing in my search, assuming the top hit is what I want is
   poor.
   - By default searches typed in the easily accessed search box run
   against every project so then the very first thing I have to do is re-run
   with a project filter. Maintaining a project context would be very helpful.
   - UI can become cumbersome for filtering on issue fields with many
   values (typeahead search presumes you know the name to start with).
   - Sometimes slow.

What I like about Github

   - It fixes the first and third issue I don't like about jira (edit round
   trip and typically has a project context).
   - UI updates without explicit refresh
   - Generally nice look and feel
   - Integration of github actions, pull requests, review, and code
   repository is excellent.
   - Closing issues via commit message is nice for small projects...

What I don't like about github.

   - Very limited and no custom issue metadata
   - arbitrary file attachments are not supported. Notably .patch is not
   included in their list (GIF, JPEG, JPG, MOV, MP4, PNG, SVG, CSV, DOCX,
   FODG, FODP, FODS, FODT, GZ, LOG, MD, ODF, ODG, ODP, ODS, ODT, PDF, PPTX,
   TXT, XLS, XLSX or ZIP).
   - Search interface for issues is the equivalent of the Jira JQL search
   line, and requires learning their syntax for anything but the most basic,
   and is:issue must be retained (and wastes space in the small text field) or
   it suddenly starts finding non-issues items.
   - No concept of workflow (without add/on or plugins).
   - Closing issues via commit message is not good where you would want to
   ensure review or have any sort of workflow
   - It's owned by Microsoft, which while MUCH improved in recent years has
   a horrible dark, evil past WRT open source and standards. The above
   allegations regarding political banning of individuals are also very
   troubling and, unfortunately, increasingly relevant in the current global
   political landscape.

From the Lucene perspective I see some things we have now in Jira that I
don't see a way to maintain in GitHub

   - Security issues that are visible to PMC only (more common for solr,
   but perhaps needed for lucene sometimes as well)
   - Patch review based on an attached patch.
   - Accepting patch files instead of pull requests...
   - Contribution by folks who for some reason cannot use GitHub (either
   blocked by work or GitHub politics, or unwilling to accept GitHub's
   terms/privacy etc.)
   - Document with each issue the Affected version and the fixed version.
   - While one can create arbitrary labels, they are not segregated into
   fields so we would have to put up with what is effectively a single field
   for priority, component, and resolution
   - No way to enforce that a resolution label is applied to the issue.
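
To illustrate the single-field problem in the last two items, here is a small
Python sketch of one conceivable workaround: a "field:value" naming convention
for labels, plus a check that a closed issue carries a resolution label. The
convention, the field names, and both function names are assumptions invented
for this example. Nothing in GitHub itself enforces them, so the check would
have to run in some automation we maintain.

```python
FIELDS = ("priority", "component", "resolution")


def split_labels(labels):
    """Split 'field:value' labels into per-field lists; the rest go under 'other'."""
    fields = {name: [] for name in FIELDS}
    fields["other"] = []
    for label in labels:
        prefix, sep, value = label.partition(":")
        if sep and prefix in FIELDS:
            fields[prefix].append(value)
        else:
            fields["other"].append(label)  # free-form label without a known prefix
    return fields


def missing_resolution(labels):
    """True when the label set contains no 'resolution:...' label."""
    return not split_labels(labels)["resolution"]


# Hypothetical labels on a closed issue.
labels = ["priority:major", "component:core/index", "good first issue"]
print(split_labels(labels))
print(missing_resolution(labels))
```

This only recovers structure from names; searching and sorting would still
treat everything as one flat list of labels.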

I feel that Github issues are simply lacking in depth and riding along on
the virtue of their integrations. I feel like their issue tracking
implementation is a lower priority sideline to their code repository (so
they can say they have it).  On the flip side Jira has become hugely

Re: Welcome Guo Feng as Lucene committer

2022-01-25 Thread Gus Heck
Welcome!

On Tue, Jan 25, 2022 at 9:57 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

> Welcome Feng!
>
> Mike
>
> On Tue, Jan 25, 2022 at 4:09 AM Adrien Grand  wrote:
>
>> I'm pleased to announce that Guo Feng has accepted the PMC's
>> invitation to become a committer.
>>
>> Feng, the tradition is that new committers introduce themselves with a
>> brief bio.
>>
>> Congratulations and welcome!
>>
>> --
>> Adrien
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>> --
> Mike McCandless
>
> http://blog.mikemccandless.com
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Haoyu (Patrick) Zhai as Lucene Committer

2021-12-19 Thread Gus Heck
Welcome :)

On Sun, Dec 19, 2021 at 9:48 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Welcome Haoyu!
>
> On Sun, 19 Dec, 2021, 7:16 pm Michael McCandless, <
> luc...@mikemccandless.com> wrote:
>
>> Welcome Patrick!
>>
>> Mike
>>
>> On Sun, Dec 19, 2021 at 8:44 AM Robert Muir  wrote:
>>
>>> Congratulations!
>>>
>>> On Sun, Dec 19, 2021 at 4:12 AM Dawid Weiss 
>>> wrote:
>>> >
>>> > Hello everyone!
>>> >
>>> > Please welcome Haoyu Zhai as the latest Lucene committer. You may also
>>> > know Haoyu as Patrick - this is perhaps his kind gesture to those of
>>> > us whose tongues are less flexible in pronouncing difficult first
>>> > names. :)
>>> >
>>> > It's a tradition to briefly introduce yourself to the group, Patrick.
>>> > Welcome and thank you!
>>> >
>>> > Dawid
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>>> >
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>> --
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Log4j < 2.15.0 may still be vulnerable even if -Dlog4j2.formatMsgNoLookups=true is set

2021-12-18 Thread Gus Heck
Thinking about it some more, maybe the problem with my suggestion is that
the table on that page is organized by library version and, if
unmitigated, that version of the library is still a problem. Maybe another
way to be clearer about it, and avoid rewriting things that people have
already read, would be to add independent entries to the security news page
for the newer CVEs.

On Sat, Dec 18, 2021 at 12:20 PM Gus Heck  wrote:

> I think perhaps in the shock of such a deep and surprising vulnerability
> with such high visibility, we've begun to break with how we normally handle
> CVE's that don't apply to our usage of the library. Previously, they just
> got added to the list of known false positives
> <https://cwiki.apache.org/confluence/display/SOLR/SolrSecurity#SolrSecurity-SolrandVulnerabilityScanningTools>.
> Normally we wouldn't even mention them on the security news page, but
> because of the high visibility we should simply have a line mentioning that
> these two CVE's are on our false positives page and explain details there.
> The wiki would provide revision history automatically.
>
> On Sat, Dec 18, 2021 at 11:25 AM Jan Høydahl 
> wrote:
>
>> We make edits to the log4j advisory almost daily, see
>> https://github.com/apache/solr-site/commits/e10a6a9fe0eed8dcba3ad1a076c8208e014e76ff/content/solr/security/2021-12-10-cve-2021-44228.md
>> I wonder if we should include a "Revision history" paragraph in the
>> advisory for transparency?
>>
>> Jan
>>
>> On 15 Dec 2021, at 19:09, Uwe Schindler  wrote:
>>
>> Hi all, I prepared a PR about the followup CVE-2021-45046:
>> https://github.com/apache/solr-site/pull/59
>>
>> Please verify and make suggestion. I will merge this into main/production
>> later.
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> *From:* Uwe Schindler 
>> *Sent:* Wednesday, December 15, 2021 3:31 PM
>> *To:* 'dev@lucene.apache.org' 
>> *Subject:* RE: Log4j < 2.15.0 may still be vulnerable even if
>> -Dlog4j2.formatMsgNoLookups=true is set
>>
>> We should add this to the webpage. Another one asked on the security
>> mailing list.
>>
>> Uwe
>>
>> -
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> *From:* Gus Heck 
>> *Sent:* Wednesday, December 15, 2021 12:39 AM
>> *To:* dev 
>> *Subject:* Re: Log4j < 2.15.0 may still be vulnerable even if
>> -Dlog4j2.formatMsgNoLookups=true is set
>>
>> Perhaps we could tweak it to say that the system property fix is
>> sufficient *for Solr* (i.e. not imply that it is a valid work around for
>> all cases)
>>
>> On Tue, Dec 14, 2021 at 6:20 PM Uwe Schindler  wrote:
>>
>> The other attack vectors are also not possible with Solr:
>>
>> - Logger.printf("%s", userInput) is not used
>> - custom message factory is not used
>>
>> Uwe
>> On 14 December 2021 22:59:26 UTC, Uwe Schindler  wrote:
>>
>> It is still a valid mitigation.
>>
>> Mike Drob and I explained it. MDC is the other attack vector and that's
>> not an issue with Solr.
>>
>> Please accept this, just because the documentation of log4j changes,
>> there's no additional risk. We may update the mitigation to mention that in
>> Solr's case the system property is fine.
>>
>> Uwe
>> On 14 December 2021 22:52:29 UTC, solr  wrote:
>>
>> Ok.
>>
>> But FTR - apache/log4j has discredited just setting the system property as a 
>> mitigation measure, so I still think the SOLR security-page should be 
>> changed to not list this as a valid mitigation:
>>
>> https://logging.apache.org/log4j/2.x/security.html
>> "Older (discredited) mitigation measures
>>
>> This page previously mentioned other mitigation measures, but we discovered 
>> that these measures only limit exposure while leaving some attack vectors 
>> open.
>>
>> Other insufficient mitigation measures are: setting system property 
>> log4j2.formatMsgNoLookups or environment variable 
>> LOG4J_FORMAT_MSG_NO_LOOKUPS to true for releases >= 2.10, or modifying the 
>> logging configuration to disable message lookups with %m{nolookups}, 
>> %msg{nolookups} or %message{nolookups} for releases >= 2.7 and <= 2.14.1.
>> "
>>
>> Regards,
>>
>>
>> Fredrik
>>
>>
>> --
>> Fredrik Rødland   Cell:+47 99 21 98 17
>> 

Re: Log4j < 2.15.0 may still be vulnerable even if -Dlog4j2.formatMsgNoLookups=true is set

2021-12-18 Thread Gus Heck
I think perhaps in the shock of such a deep and surprising vulnerability
with such high visibility, we've begun to break with how we normally handle
CVE's that don't apply to our usage of the library. Previously, they just
got added to the list of known false positives
<https://cwiki.apache.org/confluence/display/SOLR/SolrSecurity#SolrSecurity-SolrandVulnerabilityScanningTools>.
Normally we wouldn't even mention them on the security news page, but
because of the high visibility we should simply have a line mentioning that
these two CVE's are on our false positives page and explain details there.
The wiki would provide revision history automatically.

On Sat, Dec 18, 2021 at 11:25 AM Jan Høydahl  wrote:

> We make edits to the log4j advisory almost daily, see
> https://github.com/apache/solr-site/commits/e10a6a9fe0eed8dcba3ad1a076c8208e014e76ff/content/solr/security/2021-12-10-cve-2021-44228.md
> I wonder if we should include a "Revision history" paragraph in the
> advisory for transparency?
>
> Jan
>
> On 15 Dec 2021, at 19:09, Uwe Schindler wrote:
>
> Hi all, I prepared a PR about the followup CVE-2021-45046:
> https://github.com/apache/solr-site/pull/59
>
> Please verify and make suggestions. I will merge this into main/production
> later.
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> *From:* Uwe Schindler 
> *Sent:* Wednesday, December 15, 2021 3:31 PM
> *To:* 'dev@lucene.apache.org' 
> *Subject:* RE: Log4j < 2.15.0 may still be vulnerable even if
> -Dlog4j2.formatMsgNoLookups=true is set
>
> We should add this to the webpage. Another one asked on the security
> mailing list.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> *From:* Gus Heck 
> *Sent:* Wednesday, December 15, 2021 12:39 AM
> *To:* dev 
> *Subject:* Re: Log4j < 2.15.0 may still be vulnerable even if
> -Dlog4j2.formatMsgNoLookups=true is set
>
> Perhaps we could tweak it to say that the system property fix is
> sufficient *for Solr* (i.e. not imply that it is a valid work around for
> all cases)
>
> On Tue, Dec 14, 2021 at 6:20 PM Uwe Schindler  wrote:
>
> The other attack vectors are also not possible with Solr:
>
> - Logger.printf("%s", userInput) is not used
> - custom message factory is not used
>
> Uwe
> On 14 December 2021 22:59:26 UTC, Uwe Schindler wrote:
>
> It is still a valid mitigation.
>
> Mike Drob and I explained it. MDC is the other attack vector and that's not
> an issue with Solr.
>
> Please accept this, just because the documentation of log4j changes,
> there's no additional risk. We may update the mitigation to mention that in
> Solr's case the system property is fine.
>
> Uwe
> On 14 December 2021 22:52:29 UTC, solr wrote:
>
> Ok.
>
> But FTR - apache/log4j has discredited just setting the system property as a 
> mitigation measure, so I still think the SOLR security-page should be changed 
> to not list this as a valid mitigation:
>
> https://logging.apache.org/log4j/2.x/security.html
> "Older (discredited) mitigation measures
>
> This page previously mentioned other mitigation measures, but we discovered 
> that these measures only limit exposure while leaving some attack vectors 
> open.
>
> Other insufficient mitigation measures are: setting system property 
> log4j2.formatMsgNoLookups or environment variable LOG4J_FORMAT_MSG_NO_LOOKUPS 
> to true for releases >= 2.10, or modifying the logging configuration to 
> disable message lookups with %m{nolookups}, %msg{nolookups} or 
> %message{nolookups} for releases >= 2.7 and <= 2.14.1.
> "
>
> Regards,
>
>
> Fredrik
>
>
> --
> Fredrik Rødland   Cell:+47 99 21 98 17
> Maisen Pedersens vei 1Twitter: @fredrikr
> NO-1363 Høvik, NORWAY flickr:  http://www.flickr.com/fmmr/
> http://rodland.no about.me http://about.me/fmr
>
> On 14 Dec 2021, at 23:44, Mike Drob  wrote:
>
> The MDC Patterns used by solr are for the collection, shard, replica, core 
> and node names, and a potential trace id. All of those are restricted to 
> alphanumeric, no special characters like $ or { needed for the injection. And 
> trying to access a collection that didn’t exist Returns 404 without logging.
>
> Upgrading is always going to be more complete, but I think we’re still ok for 
> now, at least until the next iteration of this attack surfaces.
>
>
>
> On Tue, Dec 14, 2021 at 3:37 PM solr  wrote:
> Only setting -Dlog4j2.formatMsgNoLookups=true might not be enough to mitigate 
> the log4j vulnerabilit

Re: [VOTE] Release Lucene/Solr 8.11.1 RC1

2021-12-15 Thread Gus Heck
fast track please :)
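
For reference, the vote arithmetic quoted below (three or more +1 votes, and more +1 votes than -1 votes) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the caller passes in binding votes only.

```python
def release_vote_passes(plus_ones: int, minus_ones: int) -> bool:
    """Apply the rule quoted in this thread: three or more +1 votes
    and more +1 votes than -1 votes."""
    return plus_ones >= 3 and plus_ones > minus_ones


# Tally mentioned in the thread: 11 binding +1 votes and no -1 votes so far.
print(release_vote_passes(11, 0))
```

The 72-hour window is a separate question from the tally, which is why fast-tracking is being asked about explicitly.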

On Wed, Dec 15, 2021 at 7:23 PM Anshum Gupta  wrote:

> Fast-track please :)
>
> On Wed, Dec 15, 2021 at 4:19 PM Jan Høydahl  wrote:
>
>> Given the votes so far (11 binding +1) I'm also positive to publish
>> tomorrow, and not wait for Friday.
>> The release voting rules are three or more +1 votes and more +1 votes
>> than -1 votes, so for the vote to fail we'd need more than 11 -1's from now
>> :)
>>
>> If I see at least 3 more of you in favor (reply with "FAST-TRACK PLEASE")
>> and no justified vetoes, then I can make it happen on Thursday afternoon
>> UTC!
>>
>> Jan
>>
>> On 15 Dec 2021, at 22:57, Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>> I think we should publish, release and announce asap, not waiting for 72h
>> or the MVN propagation.
>>
>> On Thu, 16 Dec, 2021, 2:40 am Anshum Gupta, 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Smoke tester is happy.
>>>
>>> SUCCESS! [1:03:13.162577]
>>>
>>> Also tested out a sample search/indexing app.
>>>
>>> On Tue, Dec 14, 2021 at 6:36 AM Jan Høydahl 
>>> wrote:
>>>
>>>> Please vote for release candidate 1 for Lucene/Solr 8.11.1
>>>>
>>>> The artifacts can be downloaded from:
>>>>
>>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.1-RC1-rev0b002b11819df70783e83ef36b42ed1223c14b50
>>>>
>>>> You can run the smoke tester directly (from a fresh branch_8_11
>>>> checkout), with this command:
>>>>
>>>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.1-RC1-rev0b002b11819df70783e83ef36b42ed1223c14b50
>>>>
>>>> The vote will be open for at least 72 hours i.e. until 2021-12-17 15:00
>>>> UTC.
>>>>
>>>> [ ] +1  approve
>>>> [ ] +0  no opinion
>>>> [ ] -1  disapprove (and reason why)
>>>>
>>>> Here is my +1
>>>>
>>>> SUCCESS! [0:54:56.979538]
>>>>
>>>> NOTE: You must run the smoke tester from latest commit on branch_8_11,
>>>> since my surname contains a unicode-character, needing a fix in the gpg
>>>> command ran by the smoketester.
>>>> -
>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: dev-h...@lucene.apache.org


>>>
>>> --
>>> Anshum Gupta
>>>
>>
>>
>
> --
> Anshum Gupta
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [VOTE] Release Lucene/Solr 8.11.1 RC1

2021-12-15 Thread Gus Heck
+1 (binding)

Smoke tester passes. Started a local 4-node cluster via cloud.sh (-r to build
from a local checkout of 0b002b11819df70783e83ef36b42ed1223c14b50), created 2
collections, added one doc to each, and queried each, and both via an alias.

On Wed, Dec 15, 2021 at 11:17 AM Jan Høydahl  wrote:

> I think ASF allows exception to the 72h voting rule for urgent fixes. The
> current vote result is 7 "+1" and no "-1". So if we figure out how to
> trigger that exception we could push it e.g. tomorrow instead of Friday?
>
> Jan
>
> > On 15 Dec 2021, at 15:29, Uwe Schindler wrote:
> >
> > Hi,
> >
> > Policeman Jenkins tested the release with Smoketester:
> >
> > SUCCESS! [1:28:23.237262]
> > Finished: SUCCESS
> >
> >
> https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Release-Tester/38/console
> >
> > I did not do further checks, I just want to get the release out soon!
> Thanks
> > to Jan to do the release so fast.
> >
> > In the release notes of Lucene we should just mention that log4j was
> updated
> > (Luke and possibly Replicator). A changes entry was forgotten, but that's
> > not urgent.
> >
> > So here's my +1
> > Uwe
> >
> > -
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >> -Original Message-
> >> From: Jan Høydahl 
> >> Sent: Tuesday, December 14, 2021 3:36 PM
> >> To: Lucene Dev 
> >> Subject: [VOTE] Release Lucene/Solr 8.11.1 RC1
> >>
> >> Please vote for release candidate 1 for Lucene/Solr 8.11.1
> >>
> >> The artifacts can be downloaded from:
> >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.1-RC1-
> >> rev0b002b11819df70783e83ef36b42ed1223c14b50
> >>
> >> You can run the smoke tester directly (from a fresh branch_8_11
> checkout),
> >> with this command:
> >>
> >> python3 -u dev-tools/scripts/smokeTestRelease.py \
> >> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.1-RC1-
> >> rev0b002b11819df70783e83ef36b42ed1223c14b50
> >>
> >> The vote will be open for at least 72 hours i.e. until 2021-12-17 15:00
> > UTC.
> >>
> >> [ ] +1  approve
> >> [ ] +0  no opinion
> >> [ ] -1  disapprove (and reason why)
> >>
> >> Here is my +1
> >>
> >> SUCCESS! [0:54:56.979538]
> >>
> >> NOTE: You must run the smoke tester from latest commit on branch_8_11,
> >> since my surname contains a unicode-character, needing a fix in the gpg
> >> command ran by the smoketester.
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Log4j < 2.15.0 may still be vulnerable even if -Dlog4j2.formatMsgNoLookups=true is set

2021-12-14 Thread Gus Heck
Perhaps we could tweak it to say that the system property fix is sufficient
*for Solr* (i.e. not imply that it is a valid work around for all cases)
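
The reasoning quoted further down in this message (Mike Drob's note that the MDC values Solr logs are restricted names that cannot contain `$` or `{`) can be sketched as a simple check. This is a minimal Python illustration, not Solr's actual validation code: the function names and the exact allow-list are assumptions made for the example.

```python
import re

# Hypothetical allow-list for names that end up in MDC (collection, shard,
# replica, core, node). Assumption for illustration only; Solr's real naming
# rules live in its own code base.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.\-]+")


def can_carry_lookup(value: str) -> bool:
    """True if the value contains the '${' prefix that a lookup needs."""
    return "${" in value


def is_safe_mdc_value(value: str) -> bool:
    """True if the whole value matches the allow-list, so it cannot
    contain '$' or '{'."""
    return SAFE_NAME.fullmatch(value) is not None


samples = ["collection1", "shard1_replica_n2", "${jndi:ldap://example.invalid/a}"]
for s in samples:
    print(s, is_safe_mdc_value(s), can_carry_lookup(s))
```

The point of the sketch is only that any value accepted by such an allow-list cannot carry a lookup expression; whether every MDC value in a given deployment goes through such a restriction is exactly what the thread is debating.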

On Tue, Dec 14, 2021 at 6:20 PM Uwe Schindler  wrote:

> The other attack vectors are also not possible with Solr:
>
> - Logger.printf("%s", userInput) is not used
> - custom message factory is not used
>
> Uwe
>
> On 14 December 2021 22:59:26 UTC, Uwe Schindler wrote:
>>
>> It is still a valid mitigation.
>>
>> Mike Drob and I explained it. MDC is the other attack vector and that's
>> not an issue with Solr.
>>
>> Please accept this, just because the documentation of log4j changes,
>> there's no additional risk. We may update the mitigation to mention that in
>> Solr's case the system property is fine.
>>
>> Uwe
>>
>> On 14 December 2021 22:52:29 UTC, solr wrote:
>>>
>>> Ok.
>>>
>>> But FTR - apache/log4j has discredited just setting the system property as 
>>> a mitigation measure, so I still think the SOLR security-page should be 
>>> changed to not list this as a valid mitigation:
>>>
>>> https://logging.apache.org/log4j/2.x/security.html
>>> "Older (discredited) mitigation measures
>>>
>>> This page previously mentioned other mitigation measures, but we discovered 
>>> that these measures only limit exposure while leaving some attack vectors 
>>> open.
>>>
>>> Other insufficient mitigation measures are: setting system property 
>>> log4j2.formatMsgNoLookups or environment variable 
>>> LOG4J_FORMAT_MSG_NO_LOOKUPS to true for releases >= 2.10, or modifying the 
>>> logging configuration to disable message lookups with %m{nolookups}, 
>>> %msg{nolookups} or %message{nolookups} for releases >= 2.7 and <= 2.14.1.
>>> "
>>>
>>> Regards,
>>>
>>>
>>> Fredrik
>>>
>>>
>>> --
>>> Fredrik Rødland   Cell:+47 99 21 98 17
>>> Maisen Pedersens vei 1Twitter: @fredrikr
>>> NO-1363 Høvik, NORWAY flickr:  http://www.flickr.com/fmmr/
>>> http://rodland.no about.me http://about.me/fmr
>>>
>>>
>>>
>>> On 14 Dec 2021, at 23:44, Mike Drob  wrote:

 The MDC Patterns used by solr are for the collection, shard, replica, core 
 and node names, and a potential trace id. All of those are restricted to 
 alphanumeric, no special characters like $ or { needed for the injection. 
 And trying to access a collection that didn’t exist Returns 404 without 
 logging.

 Upgrading is always going to be more complete, but I think we’re still ok 
 for now, at least until the next iteration of this attack surfaces.



 On Tue, Dec 14, 2021 at 3:37 PM solr  wrote:
 Only setting -Dlog4j2.formatMsgNoLookups=true might not be enough to 
 mitigate the log4j vulnerability.

 See https://github.com/kmindi/log4shell-vulnerable-app
 “So even with LOG4J_FORMAT_MSG_NO_LOOKUPS true version 2.14.1 of log4j is 
 vulnerable when using ThreadContextMap in PatternLayout.”

 ThreadContext.put(key, value) is used under the hood by MDC.  I’m not sure 
 whether any user input is actually stored in MDC in SOLR.


 Probably this should be updated: 
 https://solr.apache.org/security.html#apache-solr-affected-by-apache-log4j-cve-2021-44228

 And maybe consider releasing patch releases for other versions than 8.11 
 as well which includes log4j 2.16.0?



 Regards,


 Fredrik


 --
 Fredrik Rødland   Cell:+47 99 21 98 17
 Maisen Pedersens vei 1Twitter: @fredrikr
 NO-1363 Høvik, NORWAY flickr:  http://www.flickr.com/fmmr/
 http://rodland.no about.me http://about.me/fmr
 --
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

 --
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>> --
>> Uwe Schindler
>> Achterdiek 19, 28357 Bremen
>> https://www.thetaphi.de
>>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Maven publication with the Gradle build

2021-12-06 Thread Gus Heck
Hmm, do the RC builds have an RC1, RC2, etc. notation in the version, so one
can tell which one is running (and whether you've really restarted on the
intended version when testing or comparing them)? If so, we'd be rebuilding
anyway to get rid of it?
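
As a side note on telling release candidates apart: the RC directory names quoted in these threads encode the version, the RC number, and the git revision. Below is a minimal Python sketch that pulls those apart, assuming only the two name shapes seen on this list (`rev` followed directly by the hash, or `rev-` and then the hash). The helper and type names are hypothetical, and this says nothing about what version string is embedded in the built JARs.

```python
import re
from typing import NamedTuple


class ReleaseCandidate(NamedTuple):
    project: str
    version: str
    rc: int
    revision: str


# Matches e.g. lucene-9.0.0-RC3-rev-<40 hex chars> and
# lucene-solr-8.11.1-RC1-rev<40 hex chars>.
_RC_NAME = re.compile(
    r"(?P<project>lucene(?:-solr)?)-(?P<version>\d+\.\d+\.\d+)"
    r"-RC(?P<rc>\d+)-rev-?(?P<revision>[0-9a-f]{40})"
)


def parse_rc_name(name_or_url: str) -> ReleaseCandidate:
    """Extract project, version, RC number and git revision from an RC name or URL."""
    m = _RC_NAME.search(name_or_url)
    if m is None:
        raise ValueError(f"not an RC name: {name_or_url!r}")
    return ReleaseCandidate(m["project"], m["version"], int(m["rc"]), m["revision"])


print(parse_rc_name(
    "https://dist.apache.org/repos/dist/dev/lucene/"
    "lucene-9.0.0-RC3-rev-1ddce848cf3d5067efcafc6569d5f8203e56af0b"
))
```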

On Mon, Dec 6, 2021 at 10:41 AM Dawid Weiss  wrote:

> > I believe maven/nexus natively allows publishing an RC to the staging
> repo and then manually promoting to a release. Apache infra has this dual
> set up for us already.
> > Would gradle be able to hook into that process?
>
> If it's the same as other open source Nexus installations then the
> "staging repository" is an intermediate step that's always there -
> there is no way to publish directly to releases. The question is how
> do you upload to the staging repository - it's one of the methods I
> mentioned in my previous e-mail.
>
> D.
>
> >
> > On Mon, Dec 6, 2021 at 9:15 AM Dawid Weiss 
> wrote:
> >>
> >> Hi Adrien,
> >>
> >> So. From gradle's point of view I don't think it's possible to reuse
> >> exactly the same files as were present in the RC candidate.
> >> Technically artifacts are a result of the build (tasks) - they can't
> >> be taken arbitrarily from disk. Or rather: it may be possible but will
> >> require terrible hacks.
> >>
> >> I suggested to Jan that we could instead package the artifacts as
> >> Sonatype Nexus's "distribution bundle" - a ZIP file with all the
> >> information (poms, checksums, jars) that should be staged. This
> >> requires a manual upload of this ZIP file but otherwise allows
> >> publishing exact identical files that were part of the RC.
> >>
> >> If we only care about releasing from the same git hash (but with
> >> rebuilt artifacts - so some things in manifests may change!) then
> >> indeed it's doable via the command Jan mentioned.
> >>
> >> I don't think there exists a third way of doing this (?).
> >>
> >> I can help add a task that will assemble a Sonatype Nexus bundle which
> >> you'll be able to upload to Nexus. It should ultimately be part of
> >> the release candidate artifacts - this would make it explicit what
> >> will get uploaded.
> >>
> >> Dawid
> >>
> >> On Mon, Dec 6, 2021 at 2:19 PM Adrien Grand  wrote:
> >> >
> >> > Hello,
> >> >
> >> > The release wizard still suggests using Ant for Maven publication:
> >> >
> >> >   cd ~/.lucene-releases/9.0.0/lucene
> >> >   ant clean stage-maven-artifacts \
> >> >
>  
> -Dmaven.dist.dir=~/.lucene-releases/9.0.0/RC4/dist/lucene-9.0.0-RC4-rev-0b18b3b965cedaf5eb129aa41243a44c83ca826d/lucene/maven
> >> > \
> >> >   -Dm2.repository.id=apache.releases.https \
> >> >   -Dm2.repository.url=
> https://repository.apache.org/service/local/staging/deploy/maven2
> >> >
> >> > The Gradle build has a `mavenToApacheReleases` task that seems to do
> >> > what I want, but I can't find how to tell it to use the JARs of RC4
> >> > rather than those produced by `gradlew assembleRelease`. Can someone
> >> > help me with this?
> >> >
> >> > --
> >> > Adrien
> >> >
> >> > -
> >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >> >
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Julie Tibshirani to the Lucene PMC

2021-11-30 Thread Gus Heck
Welcome :)

On Tue, Nov 30, 2021 at 5:45 PM Michael Sokolov  wrote:

> yup I checked and you are there:
> https://whimsy.apache.org/roster/committee/lucene -- just curious,
> does anyone know why some of our names are **bold** on that list?
>
> On Tue, Nov 30, 2021 at 5:19 PM Michael Sokolov 
> wrote:
> >
> > Welcome, Julie!
> >
> >  I think Adrien already added you to the PMC LDAP group, but I'll
> double-check
> >
> > On Tue, Nov 30, 2021, 2:11 PM Anshum Gupta 
> wrote:
> >>
> >> Congratulations and welcome, Julie!
> >>
> >> On Tue, Nov 30, 2021 at 1:49 PM Adrien Grand  wrote:
> >>>
> >>> I'm pleased to announce that Julie Tibshirani has accepted an
> invitation to join the Lucene PMC!
> >>>
> >>> Congratulations Julie, and welcome aboard!
> >>>
> >>> --
> >>> Adrien
> >>
> >>
> >>
> >> --
> >> Anshum Gupta
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [VOTE] Release Lucene 9.0.0 RC3

2021-11-29 Thread Gus Heck
Re-reading I think I misunderstood. I thought somewhere it was said that
git repo was needed to build which I would consider important, but I guess
that was just re-building the src tarball itself that required it.

If the existing src tarball is sufficient to produce a working server based
solely on instructions in the README.md I have no further concerns.

On Mon, Nov 29, 2021 at 11:03 AM Dawid Weiss  wrote:

> The basic build steps are included in the readme, Gus -
> https://github.com/apache/lucene/#building-with-gradle
>
> Is your comment about moving it to a separate file or about the
> instructions to build the package in general? If it's the latter then
> I think it's fine?
>
> Dawid
>
> On Mon, Nov 29, 2021 at 2:16 PM Gus Heck  wrote:
> >
> > Not suggesting it's a show stopper, just that this would be an easily
> found and consumed way to document the information revealed in this
> discussion.
> >
> > On Mon, Nov 29, 2021 at 8:13 AM Uwe Schindler  wrote:
> >>
> >> Hi,
> >>
> >> The same applies for the historical time. It was never possible to do a
> full release with only the source tarball.
> >>
> >> The only thing that does not work is: assembleSourceRelease, because it
> requires "git archive" to build the src.tgz. But why should anybody do
> this? You have a src.tgz already why create another one from itself?
> >>
> >> You can create a binary release without problems (at least that worked
> yesterday), it will just not have a git hash in the metadata of JAR files
> (and so on). But the version number is always "SNAPSHOT" unless you define
> your own (we do this to prevent "unauthorized artifacts created
> accidentally"). If somebody wants to release a custom Lucene with patches,
> one can pass -Dversion.suffix="Ubuntu20.04-foobar" to make a customized
> release.
> >>
> >> Uwe
> >>
> >> -
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> https://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >> > -Original Message-
> >> > From: Dawid Weiss 
> >> > Sent: Monday, November 29, 2021 1:33 PM
> >> > To: Lucene Dev 
> >> > Subject: Re: [VOTE] Release Lucene 9.0.0 RC3
> >> >
> >> > I don't think it's a showstopper. This applies to any 9x branch -
> >> > perhaps starting from main. We can extract these instructions into a
> >> > separate document. On the other hand, it wouldn't be shown up there on
> >> > github front-page then... and times have changed - this is where most
> >> > folks would probably end up reading the instructions, not the source
> >> > bundle?
> >> >
> >> > D.
> >> >
> >> > On Mon, Nov 29, 2021 at 1:01 PM Gus Heck  wrote:
> >> > >
> >> > > Seems to me the details for how to turn a src tarball into
> something that can
> >> > be compiled should go in a BUILDING.txt file?
> >> > >
> >> > > On Mon, Nov 29, 2021 at 3:00 AM Dawid Weiss 
> >> > wrote:
> >> > >>
> >> > >> SUCCESS! [0:17:23.949074]
> >> > >>
> >> > >> +1.
> >> > >>
> >> > >> D.
> >> > >>
> >> > >> On Fri, Nov 26, 2021 at 3:31 PM Adrien Grand 
> wrote:
> >> > >> >
> >> > >> > Please vote for release candidate 3 for Lucene 9.0.0.
> >> > >> >
> >> > >> > The artifacts can be downloaded from:
> >> > >> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-
> >> > 1ddce848cf3d5067efcafc6569d5f8203e56af0b
> >> > >> >
> >> > >> > You can run the smoke tester directly with this command:
> >> > >> >
> >> > >> > python3 -u dev-tools/scripts/smokeTestRelease.py \
> >> > >> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-
> >> > 1ddce848cf3d5067efcafc6569d5f8203e56af0b
> >> > >> >
> >> > >> > The vote will be open until 2021-11-30 9:00 UTC.
> >> > >> >
> >> > >> > [ ] +1  approve
> >> > >> > [ ] +0  no opinion
> >> > >> > [ ] -1  disapprove (and reason why)
> >> > >> >
> >> > >> > Here is my +1.
> >> > >> >
> >> > >> > --
> 

Re: [VOTE] Release Lucene 9.0.0 RC3

2021-11-29 Thread Gus Heck
Not suggesting it's a show stopper, just that this would be an easily found
and consumed way to document the information revealed in this discussion.

On Mon, Nov 29, 2021 at 8:13 AM Uwe Schindler  wrote:

> Hi,
>
> The same applies for the historical time. It was never possible to do a
> full release with only the source tarball.
>
> The only thing that does not work is: assembleSourceRelease, because it
> requires "git archive" to build the src.tgz. But why should anybody do
> this? You have a src.tgz already why create another one from itself?
>
> You can create a binary release without problems (at least that worked
> yesterday), it will just not have a git hash in the metadata of JAR files
> (and so on). But the version number is always "SNAPSHOT" unless you define
> your own (we do this to prevent "unauthorized artifacts created
> accidentally"). If somebody wants to release a custom Lucene with patches,
> one can pass -Dversion.suffix="Ubuntu20.04-foobar" to make a customized
> release.
>
> Uwe
>
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Dawid Weiss 
> > Sent: Monday, November 29, 2021 1:33 PM
> > To: Lucene Dev 
> > Subject: Re: [VOTE] Release Lucene 9.0.0 RC3
> >
> > I don't think it's a showstopper. This applies to any 9x branch -
> > perhaps starting from main. We can extract these instructions into a
> > separate document. On the other hand, it wouldn't be shown up there on
> > github front-page then... and times have changed - this is where most
> > folks would probably end up reading the instructions, not the source
> > bundle?
> >
> > D.
> >
> > On Mon, Nov 29, 2021 at 1:01 PM Gus Heck  wrote:
> > >
> > > Seems to me the details for how to turn a src tarball into something
> that can
> > be compiled should go in a BUILDING.txt file?
> > >
> > > On Mon, Nov 29, 2021 at 3:00 AM Dawid Weiss 
> > wrote:
> > >>
> > >> SUCCESS! [0:17:23.949074]
> > >>
> > >> +1.
> > >>
> > >> D.
> > >>
> > >> On Fri, Nov 26, 2021 at 3:31 PM Adrien Grand 
> wrote:
> > >> >
> > >> > Please vote for release candidate 3 for Lucene 9.0.0.
> > >> >
> > >> > The artifacts can be downloaded from:
> > >> > https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-
> > 1ddce848cf3d5067efcafc6569d5f8203e56af0b
> > >> >
> > >> > You can run the smoke tester directly with this command:
> > >> >
> > >> > python3 -u dev-tools/scripts/smokeTestRelease.py \
> > >> > https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-
> > 1ddce848cf3d5067efcafc6569d5f8203e56af0b
> > >> >
> > >> > The vote will be open until 2021-11-30 9:00 UTC.
> > >> >
> > >> > [ ] +1  approve
> > >> > [ ] +0  no opinion
> > >> > [ ] -1  disapprove (and reason why)
> > >> >
> > >> > Here is my +1.
> > >> >
> > >> > --
> > >> > Adrien
> > >> >
> > >> >
> -
> > >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >> > For additional commands, e-mail: dev-h...@lucene.apache.org
> > >> >
> > >>
> > >> -
> > >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >> For additional commands, e-mail: dev-h...@lucene.apache.org
> > >>
> > >
> > >
> > > --
> > > http://www.needhamsoftware.com (work)
> > > http://www.the111shift.com (play)
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [VOTE] Release Lucene 9.0.0 RC3

2021-11-29 Thread Gus Heck
Seems to me the details for how to turn a src tarball into something that
can be compiled should go in a BUILDING.txt file?

On Mon, Nov 29, 2021 at 3:00 AM Dawid Weiss  wrote:

> SUCCESS! [0:17:23.949074]
>
> +1.
>
> D.
>
> On Fri, Nov 26, 2021 at 3:31 PM Adrien Grand  wrote:
> >
> > Please vote for release candidate 3 for Lucene 9.0.0.
> >
> > The artifacts can be downloaded from:
> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-1ddce848cf3d5067efcafc6569d5f8203e56af0b
> >
> > You can run the smoke tester directly with this command:
> >
> > python3 -u dev-tools/scripts/smokeTestRelease.py \
> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-1ddce848cf3d5067efcafc6569d5f8203e56af0b
> >
> > The vote will be open until 2021-11-30 9:00 UTC.
> >
> > [ ] +1  approve
> > [ ] +0  no opinion
> > [ ] -1  disapprove (and reason why)
> >
> > Here is my +1.
> >
> > --
> > Adrien
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [VOTE] Release Lucene 9.0.0 RC3

2021-11-27 Thread Gus Heck
+1
SUCCESS! [0:08:46.711289]
(Java 11 JAVA_HOME=/home/gus/../zulu11.48.21-ca-jdk11.0.11-linux_x64/)
Smoketester only

On Sat, Nov 27, 2021 at 10:57 AM Tomoko Uchida 
wrote:

> Luke app starts on both Linux and Windows (with or without spaces
> in the path) and works well for me.
> Thanks everyone who took the time for it.
>
> Tomoko
>
> On Sat, Nov 27, 2021 at 23:57, Robert Muir wrote:
> >
> > +1
> >
> > (with --test-java17)
> > SUCCESS! [0:26:01.193203]
> >
> > On Sat, Nov 27, 2021 at 8:40 AM Michael McCandless
> >  wrote:
> > >
> > > +1
> > >
> > > SUCCESS! [0:06:46.020662]
> > >
> > > What a crazy speedup to smoke tester!!
> > >
> > >
> > >
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > >
> > >
> > > On Sat, Nov 27, 2021 at 3:42 AM Ignacio Vera 
> wrote:
> > >>
> > >> +1
> > >>
> > >> SUCCESS! [0:17:28.435474]
> > >>
> > >>
> > >> On Sat, Nov 27, 2021 at 1:27 AM Jan Høydahl 
> wrote:
> > >>>
> > >>> +1 SUCCESS! [0:23:41.775448]
> > >>>
> > >>> Only ran smoketester this time.
> > >>>
> > >>> Jan
> > >>>
> > >>> > On 26 Nov 2021, at 15:31, Adrien Grand wrote:
> > >>> >
> > >>> > Please vote for release candidate 3 for Lucene 9.0.0.
> > >>> >
> > >>> > The artifacts can be downloaded from:
> > >>> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-1ddce848cf3d5067efcafc6569d5f8203e56af0b
> > >>> >
> > >>> > You can run the smoke tester directly with this command:
> > >>> >
> > >>> > python3 -u dev-tools/scripts/smokeTestRelease.py \
> > >>> >
> https://dist.apache.org/repos/dist/dev/lucene/lucene-9.0.0-RC3-rev-1ddce848cf3d5067efcafc6569d5f8203e56af0b
> > >>> >
> > >>> > The vote will be open until 2021-11-30 9:00 UTC.
> > >>> >
> > >>> > [ ] +1  approve
> > >>> > [ ] +0  no opinion
> > >>> > [ ] -1  disapprove (and reason why)
> > >>> >
> > >>> > Here is my +1.
> > >>> >
> > >>> > --
> > >>> > Adrien
> > >>> >
> > >>> >
> -
> > >>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >>> > For additional commands, e-mail: dev-h...@lucene.apache.org
> > >>> >
> > >>>
> > >>>
> > >>> -
> > >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >>> For additional commands, e-mail: dev-h...@lucene.apache.org
> > >>>
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: What should we do of branch_8x?

2021-11-21 Thread Gus Heck
+1 to Uwe's suggestion

On Sun, Nov 21, 2021 at 10:42 PM Noble Paul  wrote:

> I think this is a reasonable suggestion Uwe.
>
> - We don't need to bring Gradle to 8.x
> - We can release 8.12 from a fork of 8.11.
> - we don't need to keep the Lucene source files in that branch. We can
> nuke it and just keep the Lucene binaries
>
> On Mon, Nov 22, 2021, 8:49 AM Uwe Schindler  wrote:
>
>> Hi,
>>
>> If this is really needed, I'd propose the following:
>>
>> - fork the branch_8_11 to solr's repo
>> - delete all subdirectories below lucene, keep common-build and other
>> stuff.
>> - add a single ivy.xml there that refers to all lucene jars of 8.11.x
>> (latest)
>> - adapt solr's "copy-lucene-jars" ant task to copy the ivy output dir
>> - delete the lucene stuff from release wizard.
>>
>> This is quick and easy. Adapting Gradle for a minor release is too hard.
>>
>> On 21 November 2021 21:34:40 UTC, Noble Paul <
>> noble.p...@gmail.com> wrote:
>>>
>>> All Solr users are using 8x and they will need some time to get comfortable
>>> with 9x . So, there is a good chance we may need to release an 8.12 based
>>> on Lucene 8.11
>>>
>>> On Mon, Nov 22, 2021, 8:22 AM Adrien Grand  wrote:
>>>
>>>> +1 to making branch_8x read-only as Uwe suggested
>>>>
>>>> I think Uwe's other point is also important: if we ever wanted to do a
>>>> Solr 8.12, it'd probably be a better option to fork the 8.11 branch than to
>>>> try to reuse branch_8x. So we don't need to tie the decision about what we
>>>> want to do with branch_8x with future plans around an 8.12 release?
>>>>
>>>> On Sun, Nov 21, 2021 at 7:48 PM Uwe Schindler  wrote:
>>>>
>>>>> This is of course all possible, but: WHY the heck do this?
>>>>>
>>>>>
>>>>>
>>>>> Lucene 9.0 will come out likely very soon. After that just update the
>>>>> gradle file of Solr main and remove the temporary repository (better
>>>>> comment it out). After that adapt some changes and release Solr 9.0.
>>>>>
>>>>>
>>>>>
>>>>> From that point on both projects have a clear split point and
>>>>> everybody can make sure that the backwards compatibility is handled
>>>>> according to project’s needs.
>>>>>
>>>>>
>>>>>
>>>>> If the Solr 9.0 release is an intermediary point (not all deprecations
>>>>> removed), release Solr 10.0 four months later, who cares? Solr 9.0 will be
>>>>> the release with many new features and Java 11 as minimum requirement.
>>>>>
>>>>>
>>>>>
>>>>> I would really, really not start and fuck up the release process for
>>>>> 8.x! Why not release 8.11.1 soon, if you have any changes in Solr to do?
>>>>> Why does this release need to be called 8.12? It is just a version number,
>>>>> so why the heck these big issues? I don't think that Solr will add any major
>>>>> features before Solr 9. So what is your exact problem?
>>>>>
>>>>>
>>>>>
>>>>> Sorry, but this discussion is complete nonsense. It's just version
>>>>> numbers and some bickering between two parties that disagree. Keep calm and
>>>>> don’t try to make it overcomplicated!
>>>>>
>>>>>
>>>>>
>>>>> I never said that we should kill or delete branch_8x. It can stay
>>>>> there forever. I just suggested to make it read-only and add a note.
>>>>> Unless there’s really a need to do some 8.12 release (in which case, I’d
>>>>> fork the 8.11 branch and move Lucene) I see no reason to act and fuck up
>>>>> the repositories of both projects, which now have a very clear state.
>>>>>
>>>>>
>>>>>
>>>>> Uwe
>>>>>
>>>>>
>>>>>
>>>>> -
>>>>>
>>>>> Uwe Schindler
>>>>>
>>>>> Achterdiek 19, D-28357 Bremen
>>>>>
>>>>> https://www.thetaphi.de
>>>>>
>>>>> eMail: u...@thetaphi.de
>>>>>
>>>>>
>>>>>
>>>>> *From:* Gus Heck 
>>>>> *Sent

Re: What should we do of branch_8x?

2021-11-21 Thread Gus Heck
Releasing Solr 8.12 should require removing the lucene bits from the current
lucene-solr 8.x branch and declaring a dependency on lucene 8.11; that bit
shouldn't be too hard if done soon... and the release process for 8.x would
need to not publish a lucene artifact, which is likely the harder bit. I
think the option should be open assuming someone is willing to do that
work. What should not be an option is any further lucene releases on 8.x,
and I'd be very leery of any attempt to consume lucene 9.0 on Solr 8.x

The Lucene guarantees are irrelevant unless someone contemplates releasing
an 8.12 lucene, and I really think that would require a positive vote from
the Lucene PMC (which sounds very unlikely since I see fingers twitching
over the -1 holsters there :) )

So while I don't favor deleting the entire solr 8.x branch I think it's now
fine to remove lucene from it.

To make things pretty, one could push the 8.x branch to the solr repo AFTER
lucene is removed, but that sounds like busy work unless there is some
formal or financial need to close the old repo. They are now fully separate
projects and what solr does with the non-lucene bits is not a concern to
lucene pmc (though almost all of us are on both committees of course, but
hat wearing etc..)

On Sun, Nov 21, 2021 at 8:43 AM Robert Muir  wrote:

> I dunno, this seems really crazy to me. Splitting out solr into its
> own repository and allowing it to be released independently from
> lucene has already been done, lots of work :) Why not just move
> forwards?
>
> On Sun, Nov 21, 2021 at 8:16 AM Ishan Chattopadhyaya
>  wrote:
> >
> >
> >
> > On Sun, 21 Nov, 2021, 6:31 pm Robert Muir,  wrote:
> >>
> >> Sorry, I just don't understand the implications of what you are
> suggesting.
> >>
> >> The code in question is lucene+solr combined, and the build system and
> >> packaging and everything only knows how to do that. So are you forking
> >> all the lucene code into the solr repo too?
> >
> >
> > Need to split it up and remove the Lucene code from there in order to be
> able to release Solr independently. We can do so later (I'm currently on
> travel), if/when needed.
> >>
> >>
> >> I don't really understand your need to have a branch_8x. we can nuke
> >> it, and you can do any of this from a branch_8_11 some other day, no?
> >
> >
> > I guess we can, just don't know the divergence. Just to be on the safer
> side, don't want to lose access to the branch_8x over a weekend before I or
> persons more knowledgeable (on the differences between the branches) than I
> get a chance to review the situation. Hence, I just copied the branch there
> for the moment.
> >>
> >>
> >> On Sun, Nov 21, 2021 at 7:57 AM Ishan Chattopadhyaya
> >>  wrote:
> >> >
> >> > > I don't think the solr PMC should issue Lucene 8.12 either.
> >> > I never expressed any intention of doing so. Besides, is it even
> possible (ASF policies wise)?
> >> >
> >> > This is a weekend, and I feel bad holding up the 9.0 release (since
> this is a blocker). Solr PMC can decide later on Solr's releases, and hence
> I'm going to copy this branch_8x over to Solr repo's
> "lucene-solr/branch_8x" branch.
> >> >
> >> >
> >> > On Sun, Nov 21, 2021 at 6:14 PM Robert Muir  wrote:
> >> >>
> >> >> I don't think the solr PMC should issue Lucene 8.12 either.
> >> >>
> >> >> On Sun, Nov 21, 2021 at 7:42 AM Ishan Chattopadhyaya
> >> >>  wrote:
> >> >> >
> >> >> > Sounds good, Rob. Should I copy over the branch_8x to the solr
> repo until we have further clarity on the course of action to be taken with
> Solr releases?
> >> >> >
> >> >> > On Sun, 21 Nov, 2021, 6:10 pm Robert Muir, 
> wrote:
> >> >> >>
> >> >> >> Nope, it isn't crazy. I am trying to ensure the backwards
> >> >> >> compatibility that we have is on solid, sustainable footing
> before we
> >> >> >> release a new version promising double the back compat.
> >> >> >>
> >> >> >> On Sun, Nov 21, 2021 at 7:37 AM Ishan Chattopadhyaya
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> > Solr doesn't have backward compatibility tests, only Lucene has.
> >> >> >> >
> >> >> >> > That's why I proposed leaving the door open for a Solr 8.12
> release based on already released 8.11 Lucene and not releasing any further
> 8.x minor version release of Lucene.
> >> >> >> >
> >> >> >> > As I said, if that's problematic to do on branch_8x of
> lucene-solr, then we can do so in the solr repo. If some urgent action to
> nuke the branch is to be taken, please give some time to explore
alternatives that affect Solr's development.
> >> >> >> >
> >> >> >> > Holding up Lucene 9.0 release for removal of branch_8x is
> lunacy, not the continued existence of this branch in the shared repo,
> since a future course of action should be deliberated upon before nuking
> the branch.
> >> >> >> >
> >> >> >> > On Sun, 21 Nov, 2021, 5:34 pm Uwe Schindler, 
> wrote:
> >> >> >> >>
> >> >> >> >> Hi,
> >> >> >> >>
> >> >> >> >> I fully agree with Robert here.
> >> >> >> >>
> >> >> >> >> I originally sent the 

Re: [VOTE] Release Lucene/Solr 8.11.0 RC1

2021-11-09 Thread Gus Heck
+1 SUCCESS! [0:54:16.982080]
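
As a rough sketch of the "at least 72 hours" arithmetic in the vote call quoted below: the start time used here (2021-11-09 21:00 UTC) is an assumption picked for illustration, not something stated in the thread, and `vote_deadline` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

MIN_VOTE_HOURS = 72  # minimum voting window mentioned in the vote call


def vote_deadline(start: datetime, hours: int = MIN_VOTE_HOURS) -> datetime:
    """Return the earliest UTC time at which a vote opened at `start` may close."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    return start.astimezone(timezone.utc) + timedelta(hours=hours)


# Assumed start time, for illustration only.
start = datetime(2021, 11, 9, 21, 0, tzinfo=timezone.utc)
print(vote_deadline(start).strftime("%Y-%m-%d %H:%M UTC"))
```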

On Tue, Nov 9, 2021 at 9:06 PM Mayya Sharipova
 wrote:

> +1 SUCCESS! [1:09:47.023515]
>
> On Tue, Nov 9, 2021 at 6:42 PM Timothy Potter 
> wrote:
>
>> totally unofficial, but I posted a Docker image for testing 8.11.0 RC1
>> on K8s here: thelabdude/apache-solr-dev:8.11.0-rc1
>>
>> On Tue, Nov 9, 2021 at 1:50 PM Adrien Grand  wrote:
>> >
>> > Please vote for release candidate 1 for Lucene/Solr 8.11.0
>> >
>> > The artifacts can be downloaded from:
>> >
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.0-RC1-reve912fdd5b632267a9088507a2a6bcbc75108f381
>> >
>> > You can run the smoke tester directly with this command:
>> >
>> > python3 -u dev-tools/scripts/smokeTestRelease.py \
>> >
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.0-RC1-reve912fdd5b632267a9088507a2a6bcbc75108f381
>> >
>> > The vote will be open for at least 72 hours i.e. until 2021-11-12 21:00
>> UTC.
>> >
>> > [ ] +1  approve
>> > [ ] +0  no opinion
>> > [ ] -1  disapprove (and reason why)
>> >
>> > Here is my +1
>> >
>> > --
>> > Adrien
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Mayya Sharipova to the Lucene PMC

2021-06-28 Thread Gus Heck
Welcome :)

On Mon, Jun 28, 2021, 6:30 PM Anshum Gupta  wrote:

> Congratulations and welcome, Mayya!
>
> On Mon, Jun 28, 2021 at 6:16 AM Robert Muir  wrote:
>
>> I am pleased to announce that Mayya has accepted an invitation to join
>> the Lucene PMC!
>>
>> Congratulations, and welcome aboard!
>>
>
> --
> Anshum Gupta
>


Re: Welcome Greg Miller as Lucene committer

2021-06-01 Thread Gus Heck
Welcome Greg :)

On Tue, Jun 1, 2021 at 12:03 AM Tomás Fernández Löbbe 
wrote:

> Congrats Greg!!
>
> On Mon, May 31, 2021 at 9:37 AM Gautam Worah 
> wrote:
>
>> Congratulations Greg :)
>>
>> On Mon, May 31, 2021, 8:02 AM Ilan Ginzburg  wrote:
>>
>>> Congrats Greg!
>>>
>>> On Sun, May 30, 2021 at 4:35 PM Greg Miller  wrote:
>>>
>>>> Thanks everyone! I'm honored to have been nominated and look forward
>>>> to continuing to work with all of you on Lucene! I'm incredibly
>>>> grateful for everyone that has helped me so far. There's a lot to
>>>> learn in Lucene and this community has been a fantastic help ramping
>>>> up, providing thorough PR feedback/ideas/etc. and simply been a great
>>>> group of people to collaborate with.
>>>>
>>>> As far as a brief bio goes, I live in the Seattle area and work for
>>>> Amazon's "Product Search" team, which I joined in January of this
>>>> year. I'm a naturally curious person and find myself fascinated by
>>>> data structure / algorithm problems, so diving into Lucene has been
>>>> really fun! I'm also an avid runner (mostly marathons but right now
>>>> I'm training for my first one-mile race on a track), and love to
>>>> travel with my wife and daughter (although that's been on "pause" for
>>>> obvious reasons for the past year+). My biggest accomplishment of 2021
>>>> so far has been teaching my daughter to ride a bike, but being
>>>> nominated as a Lucene committer is a close second :)
>>>>
>>>> Thanks again everyone and looking forward to continuing to work with
>>>> all of you!
>>>>
>>>> Cheers,
>>>> -Greg
>>>>
>>>> On Sat, May 29, 2021 at 7:59 PM Michael McCandless
>>>> wrote:
>>>> >
>>>> > Welcome Greg!
>>>> >
>>>> > Mike
>>>> >
>>>> > On Sat, May 29, 2021 at 3:47 PM Adrien Grand
>>>> wrote:
>>>> >>
>>>> >> I'm pleased to announce that Greg Miller has accepted the PMC's
>>>> invitation to become a committer.
>>>> >>
>>>> >> Greg, the tradition is that new committers introduce themselves with
>>>> a brief bio.
>>>> >>
>>>> >> Congratulations and welcome!
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Adrien
>>>> >
>>>> > --
>>>> > Mike McCandless
>>>> >
>>>> > http://blog.mikemccandless.com




-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Release Lucene/Solr 8.9.0 should we have it soon

2021-05-13 Thread Gus Heck
Perhaps https://issues.apache.org/jira/browse/SOLR-15378 should be
investigated before 8.9, maybe make it a blocker?

On Thu, May 13, 2021 at 1:35 AM Robert Muir  wrote:

> Mayya, I created backport for Adrien's issue here, to try to help out:
> https://github.com/apache/lucene-solr/pull/2495
>
> Personally, I felt that merging non-trivial changes from main branch
> to 8.x has some additional risks when cherry-picking:
> * structural changes in main branch making merging more difficult
> (e.g. LUCENE-9705 reorganization of codec versioning, great change
> moving forwards though)
> * there are many style changes due to spotless in main branch which
> add noise to merging against old code.
> * In the specific case of LUCENE-9827, the usual additional tricky
> backwards compatibility for 8.x must be added in the backport (due to
> minor version bumps there) which can go wrong.
>
> I still think that particular change is worth considering for 8.9, it
> isn't just a performance bug but also a huge improvement to test
> coverage that helps combat risks.
>
> But we should still take some precautions when releasing an 8.x IMO:
> * be mindful of what we are backporting and the risks involved: it is
> harder.
> * try to let jenkins bake changes in 8.x branches for longer than
> usual? even a few days really helps.
>
> On Tue, May 11, 2021 at 1:29 PM Mayya Sharipova
>  wrote:
> >
> > Thanks everyone,
> >
> > Adrien, I  am happy to try to be a release manager for this release.
> >
> > Adrien, and Gus, please let me know when your changes are merged to 8.x
> >
> >
> >
> > On Tue, May 11, 2021 at 10:38 AM Gus Heck  wrote:
> >>
> >> I'm also looking to find time to get
> https://issues.apache.org/jira/browse/SOLR-14597 into some sort of 8x.
> I've recently completed the back port of 2/3 of the lucene tickets that are
> related, and hope to work on the third tomorrow
> >>
> >> I had some feedback there, but I think folks were waiting for the
> version integrated with the final form of the Lucene tickets before delving
> further. Hopefully this week I can start on a patch that does that.
> >>
> >> On Tue, May 11, 2021 at 10:25 AM Adrien Grand 
> wrote:
> >>>
> >>> I would like to backport LUCENE-9827 before we release 8.9, a
> performance regression to stored fields merges. I'll work on this as soon
> as possible.
> >>>
> >>> On Thu, May 6, 2021 at 10:28 PM Adrien Grand 
> wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> Mayya, are you volunteering to be the release manager?
> >>>>
> >>>> Le jeu. 6 mai 2021 à 18:06, Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> a écrit :
> >>>>>
> >>>>> +1
> >>>>>
> >>>>> On Thu, May 6, 2021 at 7:50 PM Mayya Sharipova <
> mayya.sharip...@elastic.co.invalid> wrote:
> >>>>>>
> >>>>>> Hello everyone,
> >>>>>> I was wondering if we can have an 8.9.0 release. It has been more
> than 3 months since 8.8.0 was released.
> >>>>>> 8.9.0 doesn't need to be the last release in the 8.x series.
> >>>>>>
> >>>>>> Thanks.
> >>>
> >>>
> >>>
> >>> --
> >>> Adrien
> >>
> >>
> >>
> >> --
> >> http://www.needhamsoftware.com (work)
> >> http://www.the111shift.com (play)
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Release Lucene/Solr 8.9.0 should we have it soon

2021-05-11 Thread Gus Heck
I'm also looking to find time to get
https://issues.apache.org/jira/browse/SOLR-14597 into some sort of 8x
release. I've recently completed the backport of 2 of the 3 related lucene
tickets, and hope to work on the third tomorrow.

I had some feedback there, but I think folks were waiting for the version
integrated with the final form of the Lucene tickets before delving
further. Hopefully this week I can start on a patch that does that.

On Tue, May 11, 2021 at 10:25 AM Adrien Grand  wrote:

> I would like to backport LUCENE-9827 before we release 8.9, a performance
> regression to stored fields merges. I'll work on this as soon as possible.
>
> On Thu, May 6, 2021 at 10:28 PM Adrien Grand  wrote:
>
>> +1
>>
>> Mayya, are you volunteering to be the release manager?
>>
>> Le jeu. 6 mai 2021 à 18:06, Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> a écrit :
>>
>>> +1
>>>
>>> On Thu, May 6, 2021 at 7:50 PM Mayya Sharipova
>>>  wrote:
>>>
>>>> Hello everyone,
>>>> I was wondering if we can have an 8.9.0 release. It has been more than 3
>>>> months since 8.8.0 was released.
>>>> 8.9.0 doesn't need to be the last release in the 8.x series.
>>>>
>>>> Thanks.
>>>>
>>>
>
> --
> Adrien
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Zach Chen as Lucene committer

2021-04-19 Thread Gus Heck
Welcome Zach :)

On Mon, Apr 19, 2021 at 1:09 PM Xi Chen 
wrote:

> Thanks Adrien for the announcement and everyone for the warm welcome! I’m
> deeply honored to be able to join this great community!
>
>
> I work at Amazon Lab126 and have been involved in voice bot / chat bot
> development for the last 5 years. I first used search technology in a
> production setting in a project 2 years ago, but my curiosity and interest
> in Lucene started all the way back in 2013 when my mentor back then told me
> about this wonderful search library. The name stuck in my head ever since,
> and now I’m involved in a way that I never anticipated possible! Thanks
> again to the entire community for being so supportive, welcoming and
> patient with me. I simply wouldn’t be able to get this far without a
> strong community, and now I’m ready to contribute back. I look forward to
> working with you more closely on Lucene going forward!
>
>
> Best,
>
> Zach
>
> On Apr 19, 2021, at 10:00 AM, Martin Gainty  wrote:
>
> 
> welcome Zach!
>
> martin​
>
> --
> *From:* Dawid Weiss 
> *Sent:* Monday, April 19, 2021 12:22 PM
> *To:* Lucene Dev 
> *Cc:* zacharym...@yahoo.com 
> *Subject:* Re: Welcome Zach Chen as Lucene committer
>
> Congratulations and welcome, Zach. Well deserved!
>
> Dawid
>
> On Mon, Apr 19, 2021 at 4:14 PM Adrien Grand  wrote:
> >
> > I'm pleased to announce that Zach Chen has accepted the PMC's invitation
> to become a committer.
> >
> > Zach, the tradition is that new committers introduce themselves with a
> brief bio.
> >
> > Congratulations and welcome!
> >
> > --
> > Adrien
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Peter Gromov as Lucene committer

2021-04-06 Thread Gus Heck
Welcome!

On Tue, Apr 6, 2021 at 2:16 PM Mike Drob  wrote:

> Welcome!
>
> On Tue, Apr 6, 2021 at 1:06 PM Dawid Weiss  wrote:
>
>> Congratulations and welcome, Peter!
>>
>> Dawid
>>
>> On Tue, Apr 6, 2021 at 7:48 PM Robert Muir  wrote:
>> >
>> > I'm pleased to announce that Peter Gromov has accepted the PMC's
>> invitation to become a committer.
>> >
>> > Peter, the tradition is that new committers introduce themselves with a
>> brief bio.
>> >
>> > Congratulations and welcome!
>> >
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Bruno to the Apache Lucene PMC

2021-03-11 Thread Gus Heck
Welcome :)

On Thu, Mar 11, 2021 at 9:58 AM Houston Putman 
wrote:

> Congrats and welcome Bruno!
>
> On Thu, Mar 11, 2021 at 8:32 AM David Smiley  wrote:
>
>> Welcome Bruno!
>>
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Wed, Mar 10, 2021 at 7:56 PM Mike Drob  wrote:
>>
>>> I am pleased to announce that Bruno has accepted an invitation to join
>>> the Lucene PMC!
>>>
>>> Congratulations, and welcome aboard!
>>>
>>> Mike
>>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Lucene/Solr Code of Conduct and etiquette

2021-03-03 Thread Gus Heck
Another thought looking at the top of that page (and somewhat off topic):
though obviously mentioned elsewhere, the first line of support should
probably be the Ref Guide. Not that it can't be found elsewhere, but it
should probably be the first place folks look.

On Wed, Mar 3, 2021 at 2:14 AM Anshum Gupta  wrote:

> Thanks for adding this, Jan.
>
> I'll take a look at the pages tomorrow.
>
> On Tue, Mar 2, 2021 at 1:30 AM Jan Høydahl  wrote:
>
>> Hi community!
>>
>> The Apache Software Foundation has a foundation-wide Code of Conduct
>> written up (https://www.apache.org/foundation/policies/conduct), and I
>> believe the Lucene and Solr communities would benefit from explicitly
>> adopting it for our projects. This is an example of how HBase has done so:
>> https://hbase.apache.org/coc.html
>>
>> When it comes to email communication in particular, I find this etiquette
>> from ComDev helpful: https://community.apache.org/contributors/etiquette.
>> Let's link to it as well.
>>
>> Finally, it could be helpful to also call out http://theapacheway.com/
>> or a similar resource about The Apache Way.
>>
>> Please see my early attempt at including CoC in the new Solr webpage
>> draft:
>> https://lucene-solrtlp.staged.apache.org/community.html#code-of-conduct
>> I plan to add a similar section or page to the Lucene site.
>>
>> Appreciate your feedback on this. Our discussions sometimes heat up,
>> which is not uncommon from time to time. But perhaps reminding ourselves
>> about The Apache Way, CoC and Etiquette will help each of us adopt
>> healthier practices in writing emails, choosing our words wisely etc. I
>> know it helps for me.
>>
>> Jan
>
> --
> Anshum Gupta
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Lucene/Solr Code of Conduct and etiquette

2021-03-02 Thread Gus Heck
I like the sentiments in all of those links. I'd be +1 on including all 3
links: conduct, etiquette and apacheway.

On Tue, Mar 2, 2021 at 4:46 AM Atri Sharma  wrote:

> +1
>
> On Tue, 2 Mar 2021, 15:01 Jan Høydahl,  wrote:
>
>> Hi community!
>>
>> The Apache Software Foundation has a foundation-wide Code of Conduct
>> written up (https://www.apache.org/foundation/policies/conduct), and I
>> believe the Lucene and Solr communities would benefit from explicitly
>> adopting it for our projects. This is an example of how HBase has done so:
>> https://hbase.apache.org/coc.html
>>
>> When it comes to email communication in particular, I find this etiquette
>> from ComDev helpful: https://community.apache.org/contributors/etiquette.
>> Let's link to it as well.
>>
>> Finally, it could be helpful to also call out http://theapacheway.com/
>> or a similar resource about The Apache Way.
>>
>> Please see my early attempt at including CoC in the new Solr webpage
>> draft:
>> https://lucene-solrtlp.staged.apache.org/community.html#code-of-conduct
>> I plan to add a similar section or page to the Lucene site.
>>
>> Appreciate your feedback on this. Our discussions sometimes heat up,
>> which is not uncommon from time to time. But perhaps reminding ourselves
>> about The Apache Way, CoC and Etiquette will help each of us adopt
>> healthier practices in writing emails, choosing our words wisely etc. I
>> know it helps for me.
>>
>> Jan

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Revisiting Standardized Test Names in Solr

2021-02-26 Thread Gus Heck
>
>
> I look forward to a standardization on *something* but would prefer that
> we not make a sweeping change like this until after Mark's "ref branch" is
> reconciled.  I don't want that to hang over the project indefinitely, but
> we can wait; we've not had this standardization yet for many years, after
> all.
>
> That said, it would be good to choose the standard name now so that there
> is less to change later.  Can someone dig up the statistics on Solr's name
> choice to see if there is a clear winner (e.g. >60%)?  I don't have a
> strong opinion on whatever the standard should be so long as there is a
> standard :-)
>
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sun, Feb 21, 2021 at 12:18 PM Gus Heck  wrote:
>
>
> FWIW, I'm not really in favor of the convention Lucene adopted. I probably
> lost track of the debate and failed to object which is on me, but I guess
> it was because that was the lower number of changes there? It's certainly
> much less legible in the IDE to have a wall of classes all starting with T.
> Maybe given that the projects are splitting Solr can stick with FooTest not
> TestFoo? I think *Test suffix is more common in Solr... (though I haven't
> attempted to quantify it)
>
> On Sun, Feb 21, 2021 at 12:05 PM Eric Pugh <
> ep...@opensourceconnections.com> wrote:
>
>
> Makes sense to me.
>
>
> On Feb 20, 2021, at 2:42 PM, Marcus Eagan  wrote:
>
> Hi all,
>
> Now that Lucene’s standardization is complete and I believe enforced,
> should we discuss if we could bring the same consistency to Solr?
>
> Best,
>
> Marcus
> --
> Marcus Eagan
>
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com | My Free/Busy
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
>
>
> --
> - Mark
>
> http://about.me/markrmiller
>
>
> --
> - Mark
>
> http://about.me/markrmiller
>
>
>
>
> ___
> *Eric Pugh **| *Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> | http://www.opensourceconnections.com | My Free/Busy
> <http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Proposal for the Lucene Dependency after git repo split

2021-02-26 Thread Gus Heck
Except I just finished helping a contributor with a feature that touches
both, and I know for a fact that it was developed for his customer who was
using solr (payload inequalities)... and I have another in the works (the
AQP). Not being able to enhance lucene to support a feature in solr is
an issue IMHO.

On Thu, Feb 25, 2021 at 6:05 PM Mike Drob  wrote:

> It is possible to publish snapshots into the Apache Nexus repository. That
> said, I think it is a bad idea for Solr to depend on Lucene snapshots
> because that constrains the ability to do releases. Either you have to wait
> for a Lucene release and then you can cut over, or you have to figure out
> what changes you need to roll back.
>
> Features today rarely touch both fronts anyway, they usually land in
> Lucene first and then percolate into Solr. For an easy example, we can see
> how WAND was developed recently.
>
> On Thu, Feb 25, 2021 at 5:02 PM Houston Putman 
> wrote:
>
>> Once the projects are on separate release cadences there won't be an
>> ability to “add on both fronts” anymore. You will have to add to lucene,
>> wait for a release, then add to Solr once Solr upgrades its lucene
>> dependency to that new version. I don't imagine that we are going to keep
>> Solr master/main, or even 8x, 9x, etc, depending on Lucene snapshots in
>> perpetuity. After it becomes possible (when lucene 9.0 is released) we
>> should only be using released lucene versions as dependencies for every
>> version branch in Solr.
>>
>> On Thu, Feb 25, 2021 at 5:49 PM Gus Heck  wrote:
>>
>>> Until the first feature that wants to add something on both fronts... Is
>>> it possible for Lucene to publish nightly snapshots? I know there is some
>>> level of support for snapshots in central, though I don't know what
>>> their usage policies are. If that's too restricted is there an artifact
>>> repo controlled by the ASF that could be used? (An implementation of Apache
>>> Archiva?) This would have the added benefit of allowing solr to detect when
>>> Lucene breaks something before it's released.
>>>
>>> On Thu, Feb 25, 2021 at 4:50 PM Houston Putman 
>>> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> Currently there is discussion going on, in SOLR-14762
>>>> <https://issues.apache.org/jira/browse/SOLR-14762>, regarding the
>>>> split of the lucene-solr repo into individual repos for Solr and Lucene.
>>>> There seems to be agreement that we shouldn't wait for a Lucene release to
>>>> do the split, and instead split now and release whenever that happens.
>>>>
>>>> The biggest issue that arises there is that Solr's master branch is
>>>> obviously based on Lucene's master branch, since they are currently the
>>>> same. So when the split happens, Solr master will have to depend on Lucene
>>>> 9.0-SNAPSHOT. We can have solr merely depend on the lucene snapshot, but
>>>> that will result in inconsistent builds, depending on whatever cached
>>>> dependencies each dev has locally. Personally, I think that will cause a
>>>> bunch of build errors and headaches for everyone trying to maintain Solr.
>>>>
>>>> There is another option though. We could instead do an *alpha*
>>>> "release" of lucene-solr 9.0 right before the repo is split. Therefore Solr
>>>> can reliably depend on a stable version of lucene until 9.0 is truly
>>>> released. (And lucene can use a stable version of Solr, if it sees a need
>>>> for that). There would be no guarantees for using this alpha release, and
>>>> we don't have to advertise it at all.
>>>>
>>>> It's not perfect, but I think it would be preferable to depending on an
>>>> ever-changing SNAPSHOT lucene.
>>>>
>>>> - Houston
>>>>
>>>
>>>
>>> --
>>> http://www.needhamsoftware.com (work)
>>> http://www.the111shift.com (play)
>>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Proposal for the Lucene Dependency after git repo split

2021-02-25 Thread Gus Heck
Until the first feature that wants to add something on both fronts... Is it
possible for Lucene to publish nightly snapshots? I know there is some
level of support for snapshots in central, though I don't know what
their usage policies are. If that's too restricted is there an artifact
repo controlled by the ASF that could be used? (An implementation of Apache
Archiva?) This would have the added benefit of allowing solr to detect when
Lucene breaks something before it's released.
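
To make the trade-off concrete, a build could at least flag which dependencies still point at a moving snapshot. What follows is only a minimal sketch, assuming the Maven convention that snapshot versions end in -SNAPSHOT; the function names and the version numbers in the example map are made up for illustration.

```python
from typing import Dict, List

SNAPSHOT_SUFFIX = "-SNAPSHOT"  # Maven convention for non-release versions


def is_snapshot(version: str) -> bool:
    """True if the version string denotes a moving snapshot build."""
    return version.endswith(SNAPSHOT_SUFFIX)


def snapshot_dependencies(deps: Dict[str, str]) -> List[str]:
    """Return the sorted coordinates of all dependencies pinned to snapshots."""
    return sorted(name for name, version in deps.items() if is_snapshot(version))


# Hypothetical dependency map; the versions are illustrative only.
deps = {
    "org.apache.lucene:lucene-core": "9.0.0-SNAPSHOT",
    "org.apache.lucene:lucene-queries": "8.11.0",
}
print(snapshot_dependencies(deps))
```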

On Thu, Feb 25, 2021 at 4:50 PM Houston Putman 
wrote:

> Hey everyone,
>
> Currently there is discussion going on, in SOLR-14762
> <https://issues.apache.org/jira/browse/SOLR-14762>, regarding the split
> of the lucene-solr repo into individual repos for Solr and Lucene. There
> seems to be agreement that we shouldn't wait for a Lucene release to do the
> split, and instead split now and release whenever that happens.
>
> The biggest issue that arises there is that Solr's master branch is
> obviously based on Lucene's master branch, since they are currently the
> same. So when the split happens, Solr master will have to depend on Lucene
> 9.0-SNAPSHOT. We can have solr merely depend on the lucene snapshot, but
> that will result in inconsistent builds, depending on whatever cached
> dependencies each dev has locally. Personally, I think that will cause a
> bunch of build errors and headaches for everyone trying to maintain Solr.
>
> There is another option though. We could instead do an *alpha* "release"
> of lucene-solr 9.0 right before the repo is split. Therefore Solr can
> reliably depend on a stable version of lucene until 9.0 is truly released.
> (And lucene can use a stable version of Solr, if it sees a need for that).
> There would be no guarantees for using this alpha release, and we don't
> have to advertise it at all.
>
> It's not perfect, but I think it would be preferable to depending on an
> ever-changing SNAPSHOT lucene.
>
> - Houston
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Revisiting Standardized Test Names in Solr

2021-02-21 Thread Gus Heck
FWIW, I'm not really in favor of the convention Lucene adopted. I probably
lost track of the debate and failed to object which is on me, but I guess
it was because that was the lower number of changes there? It's
certainly much less legible in the IDE to have a wall of classes all
starting with T. Maybe given that the projects are splitting Solr can stick
with FooTest not TestFoo? I think *Test suffix is more common in Solr...
(though I haven't attempted to quantify it)
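
If someone does want to quantify it, something along these lines might do. It is only a sketch: the regexes are my guess at what counts as a TestFoo or FooTest name, the helper names are made up, and it looks at every .java file under the given directory rather than only real test classes.

```python
import re
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable

PREFIX = re.compile(r"^Test[A-Z0-9]\w*$")  # e.g. TestFoo
SUFFIX = re.compile(r"^\w+Test$")          # e.g. FooTest


def classify(class_name: str) -> str:
    """Label a class name as 'prefix' (TestFoo), 'suffix' (FooTest) or 'other'."""
    if PREFIX.match(class_name):
        return "prefix"
    if SUFFIX.match(class_name):
        return "suffix"
    return "other"


def count_conventions(class_names: Iterable[str]) -> Counter:
    """Tally how many class names follow each naming convention."""
    return Counter(classify(name) for name in class_names)


def java_class_names(root: str) -> Iterable[str]:
    """Yield the file stem of every .java file below root."""
    return (path.stem for path in Path(root).rglob("*.java"))


if __name__ == "__main__":
    # Pass the source root as the first argument; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(count_conventions(java_class_names(root)))
```

Pointing it at the test source directories rather than a whole checkout would keep the "other" bucket from being dominated by non-test classes.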

On Sun, Feb 21, 2021 at 12:05 PM Eric Pugh 
wrote:

> Makes sense to me.
>
>
> On Feb 20, 2021, at 2:42 PM, Marcus Eagan  wrote:
>
> Hi all,
>
> Now that Lucene’s standardization is complete and I believe enforced,
> should we discuss if we could bring the same consistency to Solr?
>
> Best,
>
> Marcus
> --
> Marcus Eagan
>
>
> ___
> *Eric Pugh **| *Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> | http://www.opensourceconnections.com | My Free/Busy
> 
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> 
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Simplifying source pattern checks

2021-02-21 Thread Gus Heck
Sure I can do that. Was going to file an issue and link. I think adding a
link sends a mail to the linked issue, but I could be wrong.

On Sun, Feb 21, 2021 at 11:57 AM David Smiley  wrote:

> Makes sense.  I see you haven't commented on the issue about this; I
> prefer that tactic as it gets noticed by everyone "Watching" the original
> issue, even if it's old.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Sat, Feb 20, 2021 at 5:14 PM Gus Heck  wrote:
>
>> I noticed today that SOLR-10883 added checks for patterns that didn't
>> play nice with PDF generation. Now that we don't generate the PDF anymore
>> perhaps we can do away with those checks? Anyone have thoughts to the
>> contrary?
>>
>> https://issues.apache.org/jira/browse/SOLR-10883
>>
>> -Gus
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Simplifying source pattern checks

2021-02-20 Thread Gus Heck
I noticed today that SOLR-10883 added checks for patterns that didn't play
nice with PDF generation. Now that we don't generate the PDF anymore
perhaps we can do away with those checks? Anyone have thoughts to the
contrary?

https://issues.apache.org/jira/browse/SOLR-10883

-Gus

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Congratulations to the new Lucene PMC Chair, Michael Sokolov!

2021-02-17 Thread Gus Heck
Congratulations :)

On Wed, Feb 17, 2021 at 5:42 PM Tomás Fernández Löbbe 
wrote:

> Congratulations Mike!
>
> On Wed, Feb 17, 2021 at 2:42 PM Steve Rowe  wrote:
>
>> Congrats Mike!
>>
>> --
>> Steve
>>
>> > On Feb 17, 2021, at 4:31 PM, Anshum Gupta 
>> wrote:
>> >
>> > Every year, the Lucene PMC rotates the Lucene PMC chair and Apache Vice
>> President position.
>> >
>> > This year we nominated and elected Michael Sokolov as the Chair, a
>> decision that the board approved in its February 2021 meeting.
>> >
>> > Congratulations, Mike!
>> >
>> > --
>> > Anshum Gupta
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Hunspell performance

2021-02-11 Thread Gus Heck
>
> I don't have any confidence that solr would default to the "smaller"
> option or fix how they manage different solr cores or thousands of
> threads or any of the analyzer issues.


Certainly there's work to be done there. Many things to improve.  Separate
issue however.

> And who would maintain this
> separate hunspell backend? I don't think it is fair to Peter to have
> to cope with 2 implementations of hunspell, 1 is certainly enough...
> :). It's all apache license, at the end of the day if someone wants to
> step up, let 'em. otherwise let's get out of their way.
>

Entirely valid point, but what I wouldn't want to see is a case where
someone using an existing install had to buy significantly more servers to
continue using it with the new version. I also think it's great to have
improved performance :) I've had several customers that have been
disappointed at the cost of servers necessary for the size of their data.
Usually this cost is due to memory requirements, not cpu needs. I often
have to explain that search is all about trading memory for speed, but have
found myself wishing that it were easier to vary the degree of that
trade-off. So that's the root of my comment...


Re: Hunspell performance

2021-02-10 Thread Gus Heck
+1 to configurability that is well documented, and reasonably actionable
downstream in Solr... Some folks struggle with the costs of buying machines
with lots of memory.
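
For what it's worth, here is a minimal sketch of the kind of facade Dawid
describes below, where the lookup strategy is picked by configuration. None of
these names exist in Lucene; WordLookup, CompactLookup, FastLookup and
WordLookups are hypothetical, and the sorted array and hash map are only
stand-ins for a compact FST and a larger, faster structure.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical facade: callers only see lookup(), not the storage strategy. */
interface WordLookup {
  /** Returns the flags stored for a word, or null if the word is absent. */
  String lookup(String word);
}

/** Favors memory: sorted arrays plus binary search (stand-in for a compact FST). */
final class CompactLookup implements WordLookup {
  private final String[] words;
  private final String[] flags;

  CompactLookup(Map<String, String> entries) {
    words = entries.keySet().toArray(new String[0]);
    Arrays.sort(words);
    flags = new String[words.length];
    for (int i = 0; i < words.length; i++) {
      flags[i] = entries.get(words[i]);
    }
  }

  @Override
  public String lookup(String word) {
    int idx = Arrays.binarySearch(words, word);
    return idx >= 0 ? flags[idx] : null;
  }
}

/** Favors speed: a hash map (stand-in for a bigger but faster structure). */
final class FastLookup implements WordLookup {
  private final Map<String, String> map;

  FastLookup(Map<String, String> entries) {
    map = new HashMap<>(entries);
  }

  @Override
  public String lookup(String word) {
    return map.get(word);
  }
}

/** Hypothetical factory: the one place where the trade-off is chosen. */
final class WordLookups {
  enum Strategy { COMPACT, FAST }

  static WordLookup create(Strategy strategy, Map<String, String> entries) {
    return strategy == Strategy.FAST ? new FastLookup(entries) : new CompactLookup(entries);
  }

  public static void main(String[] args) {
    Map<String, String> entries = new HashMap<>();
    entries.put("walk", "AB");
    WordLookup lookup = create(Strategy.COMPACT, entries);
    System.out.println(lookup.lookup("walk"));
  }
}
```

The point of the factory is that a downstream setting (in Solr, say) would only
need to map to a Strategy value; the stemmer code calling lookup() would not
change.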

On Wed, Feb 10, 2021 at 3:05 PM Dawid Weiss  wrote:

>
>
>> To me the challenge with such a change is just trying to prevent
>
> strange dictionaries from blowing up to 30x the space :)
>>
>
> Maybe the "backend" could be configurable somehow so that you could change
> the strategy depending on your needs?... I haven't looked at how FSTs are
> used but if can be hidden behind a facade then an alternative
> implementation could be provided depending on one's need?
>
> D.
>
>
>>
>> On Wed, Feb 10, 2021 at 12:53 PM Peter Gromov
>>  wrote:
>> >
>> > I was hoping for some numbers :) In the meantime, I've got some of my
>> own. I loaded 90 dictionaries from https://github.com/wooorm/dictionaries
>> (there's more, but I ignored dialects of the same base language). Together
>> they currently consume a humble 166MB. With one of my less memory-hungry
>> approaches, they'd take ~500MB (maybe less if I optimize, but probably not
>> significantly). Is this very bad or tolerable for, say, 50% speedup?
>> >
>> > I've seen huge *.aff files, and I'm planning to do something with affix
>> FSTs, too. They take some noticeable time, too, but much less than *.dic-s
>> one, so for now I concentrate on *.dic.
>> >
>> > > Sure, but 20% of those linear scans are maybe 7x slower
>> >
>> > Checked that. The distribution appears to be decreasing monotonically.
>> No linear scans are longer than 8, and ~85% of all linear scans end after
>> no more than 1 miss.
>> >
>> > I'll try BYTE1 if I manage to do it. It turned out to be surprisingly
>> complicated :(
>> >
>> > On Wed, Feb 10, 2021 at 5:04 PM Robert Muir  wrote:
>> >>
>> >> Peter, looks like you are way ahead of me :) Thanks for all the work
>> >> you have been doing here, and thanks to Dawid for helping!
>> >>
>> >> You probably know a lot of this code better than me at this point, but
>> >> I remember a couple of these pain points, inline below:
>> >>
>> >> On Wed, Feb 10, 2021 at 9:44 AM Peter Gromov
>> >>  wrote:
>> >> >
>> >> > Hi Robert,
>> >> >
>> >> > Yes, having multiple dictionaries in the same process would increase
>> the memory significantly. Do you have any idea about how many of them
>> people are loading, and how much memory they give to Lucene?
>> >>
>> >> Yeah in many cases, the user is using a server such as solr or
>> elasticsearch.
>> >> Let's use solr as an example, as others are here to correct it, if I
>> am wrong.
>> >>
>> >> Example to understand the challenges: user uses one of solr's 3
>> >> mechanisms to detect language and send to different pipeline:
>> >>
>> https://lucene.apache.org/solr/guide/8_8/detecting-languages-during-indexing.html
>> >> Now we know these language detectors are imperfect, if the user maps a
>> >> lot of languages to hunspell pipelines, they may load lots of
>> >> dictionaries, even by just one stray miscategorized document.
>> >> So it doesn't have to be some extreme "enterprise" use-case like
>> >> wikipedia.org, it can happen for a little guy faced with a
>> >> multilingual corpus.
>> >>
>> >> Imagine the user decides to go further, and host solr search in this
>> >> way for a couple local businesses or govt agencies.
>> >> They support many languages and possibly use this detection scheme
>> >> above to try to make language a "non-issue".
>> >> The user may assign each customer a solr "core" (separate index) with
>> >> this configuration.
>> >> Does each solr core load its own HunspellStemFactory? I think it might
>> >> (in isolated classloader), I could be wrong.
>> >>
>> >> For the elasticsearch case, maybe the resource usage in the same case
>> >> is lower, because they reuse dictionaries per-node?
>> >> I think this is how it works, but I honestly can't remember.
>> >> Still the problem remains, easy to end up with dozens of these things
>> in memory.
>> >>
>> >> Also we have the problem that memory usage for a specific can blow up
>> >> in several ways.
>> >> Some languages have bigger .aff file than .dic!
>> >>
>> >> > Thanks for the idea about root arcs. I've done some quick sampling
>> and tracing (for German). 80% of root arc processing time is spent in
>> direct addressing, and the remainder is linear scan (so root arcs don't
>> seem to present major issues). For non-root arcs, ~50% is directly
>> addressed, ~45% linearly-scanned, and the remainder binary-searched.
>> Overall there's about 60% of direct addressing, both in time and invocation
>> counts, which doesn't seem too bad (or am I mistaken?). Currently BYTE4
>> inputs are used. Reducing that might increase the number of directly
>> addressed arcs, but I'm not sure that'd speed up much given that time and
>> invocation counts seem to correlate.
>> >> >
>> >>
>> >> Sure, but 20% of those linear scans are maybe 7x slower, it's
>> >> O(log2(alphabet_size)) right (assuming alphabet size ~ 

Re: [DISCUSS] ConfigSet ZK to file system fallback

2021-02-05 Thread Gus Heck
>
> I'd prefer it being an explicit fallback or resolution order instead of
> hardcoded magick.
> I.e. able to configure a configset search path such as ["local", "zk",
>  "somethingelse"]. This would make resource loader prefer local files even
> if they exist in ZK.
>
>
This actually winds up being a convention vs configuration thing I think.
Complexity is situational sometimes. Having a fallback convention means
that when you are experienced, you can walk into any install and know
what's going on, but if you are new, there's a learning curve. On the other
hand if we allow resolution order to be configurable, then one never knows
what's going on until you've first got an answer to "how's it been
configured". This can sometimes be a little simpler for first timers,
except that certain configurations might be a bad idea, and then they won't
see the rope until they are all tangled up in it. So for a consultant, or
new hire with experience the convention path is simpler, because one always
looks at specific things in a specific order that is already known. The
configuration path is only sometimes simpler for new users.

However, what I'd propose is that we have a precedence order for the
"levels" of configuration, and a single "source" for configuration. If
we need to make that source configurable, so be it, but all "primary
configuration" should come from a single source for a given cluster.

To put it another way I'm not fond of any "fallback" in where config comes
from.

By "Primary configuration" I mean the solr specific xml/json/whatever ...
The "Primary Configuration" could of course point to resources required
elsewhere, but those should be things like jar files or SSO systems,
whereas the configuration artifacts that are solr specific should come from
one source.


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [DISCUSS] ConfigSet ZK to file system fallback

2021-02-04 Thread Gus Heck
It sounds like the issue is that we need both a "per node config" and a
"per collection" config. This could all be in zookeeper, and with a clear
well documented precedence order (node wins) for any attributes that
overlap... would even make sense to have names for nodes that were not
literal machine urls for this so that one could move a node to a different
machine... node goes down, (listed as down by zookeeper) node comes up
claiming name, if the name is a down node, bingo new node gets the same
config as the old node. New node coming up and finding the name taken by a
live node could wait for N ticks before giving up or could fail immediately.

Node names could be supplied at startup, or assigned automatically...

Probably want to have a default node config, and the ability to write
configs for node names that don't (yet) exist...

Just a thought... sounds good to me because a view of ZK still shows you
all the configurations, zk is still the one source of truth. What I don't
want is multiple sources of truthishness.
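
To make the idea a bit more concrete, here is a rough sketch under stated
assumptions: plain in-memory maps stand in for what would live in zookeeper,
and LayeredConfig and NodeNames are hypothetical names, not Solr classes. It
shows the "node wins" precedence and a node claiming the name of a down node.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: combines per-collection and per-node settings. */
final class LayeredConfig {
  /** Merges the two levels; on overlapping keys the node-level value wins. */
  static Map<String, String> effective(
      Map<String, String> collectionLevel, Map<String, String> nodeLevel) {
    Map<String, String> merged = new HashMap<>(collectionLevel);
    merged.putAll(nodeLevel); // later putAll overrides earlier entries
    return merged;
  }
}

/** Hypothetical: logical node names that are not tied to a machine URL. */
final class NodeNames {
  enum State { LIVE, DOWN }

  private final Map<String, State> states = new HashMap<>();

  /** A name can be claimed if it is unknown or its previous holder is down. */
  synchronized boolean claim(String name) {
    if (states.get(name) == State.LIVE) {
      return false; // taken by a live node; caller may retry or fail
    }
    states.put(name, State.LIVE);
    return true;
  }

  /** Records that the node currently holding this name went away. */
  synchronized void markDown(String name) {
    if (states.containsKey(name)) {
      states.put(name, State.DOWN);
    }
  }

  public static void main(String[] args) {
    Map<String, String> collection = new HashMap<>();
    collection.put("cacheSize", "512");
    Map<String, String> node = new HashMap<>();
    node.put("cacheSize", "128");
    System.out.println(LayeredConfig.effective(collection, node).get("cacheSize"));
  }
}
```

A replacement machine that claims the old name would then read the same
node-level map as its predecessor, so there is still only one place to look.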

On Thu, Feb 4, 2021 at 12:23 PM Tomás Fernández Löbbe 
wrote:

> > Ehh; I am not suggesting that configSets belong local, which would be a
> step backwards -- we put them in ZK for a reason right now :-)  I'm
> suggesting we have both for the same configSet, where the deployer can
> choose which element is node resident vs cluster/ZK resident.  Thanks to
> existing Solr features like configOverlay.json and/or XML xi:include plus
> one small addition of fallback resolution of configSet files from ZK to the
> local node, we'd get this ability.  (see my first email).
>
> To be clear, I didn't suggest we move all configsets to be local. I'm just
> saying that having a local configset has those issues I mentioned.
>
> The point I was trying to make is that, having a single configset loading
> from both, local and zk may be confusing for the user and cause issues that
> may be difficult to track: Which file is Solr really reading right now? is
> it the local one or the remote one? Is there a local one in a node or not?
> is it being correctly overridden? How do I ensure that I always have a
> local version of a file to override the remote?
>
> So, I'm thinking that if we want to support this feature, a cleaner
> approach could be to just have a type of configset that's defined as
> "local", and then it belongs to the local filesystem. We can just prevent a
> node from starting if it's supposed to have a configset that it doesn't have.
> It's 100% clear where a config file is being read from, etc. Maybe the
> "configOverlay.json" is an exception and should live in ZooKeeper (and
> never locally) for the config API to work, but having just "default to
> local when a file is not in ZooKeeper" just confuses things IMO.
>
> On Tue, Jan 26, 2021 at 8:38 PM David Smiley  wrote:
>
>> On Tue, Jan 26, 2021 at 1:27 PM Tomás Fernández Löbbe <
>> tomasflo...@gmail.com> wrote:
>>
>>> Thanks for bringing this up, David. I thought about this same situation
>>> before, but I think I never convinced myself in one way or another :p. As I
>>> mentioned in many other emails, I think the infrastructure and the node
>>> configuration (such as solr.xml) needs to be local (at least, needs to be
>>> able to be local and not forced on ZooKeeper) for various reasons.
>>>
>>
>> I agree 100%.  I think the key part there is having *choice* for each
>> configuration element, and not one dictated by Solr as to what belongs
>> where.  The implementation of it needn't be complicated; it's a
>> straight-forward idea to have the same format with conceptual layer /
>> aggregation of them.
>>
>>
>>> The same reasons exist for configsets: safe upgrades, or possible
>>> node-specific configuration, as you mentioned. But Configsets have another
>>> layer of complexity in my mind, which is, you don't know where you'll need
>>> them... because you don't (necessarily) know where replicas of a collection
>>> are going to be created. True that this is not a problem in the Docker
>>> image situation you are describing, or if handled with care, but how can
>>> Solr make sure of it?
>>>
>>
>> Ehh; I am not suggesting that configSets belong local, which would be a
>> step backwards -- we put them in ZK for a reason right now :-)  I'm
>> suggesting we have *both* for the same configSet, where the deployer can
>> choose which element is node resident vs cluster/ZK resident.  Thanks to
>> existing Solr features like configOverlay.json and/or XML xi:include plus
>> one small addition of fallback resolution of configSet files from ZK to the
>> local node, we'd get this ability.  (see my first email).
>>
>> We have a very limited ability to accomplish the broad idea today -- Java
>> system properties with variable substitution in our files.  But of course
>> it's very limited what you can do with that, and it feels abusive to push
>> it too far.  It's fine for individual tunables (e.g. an integer) but not
>> more aggregate things like a complete 

Re: [DISCUSS] ConfigSet ZK to file system fallback

2021-01-23 Thread Gus Heck
I'm in agreement with Eric here that fewer ways (or at least a clearer
default way) of supplying resources would be better. Additionally, it
should be easy to specify that this resource that I've shared should be
loaded on a per SolrCore or per node basis (or even better per collection
present on the node, accessible under a standard name to replicas belonging
to that collection?). There are not many cases, beyond the simplest
single-collection install with few shards, where you want a 1GB resource to be
duplicated in memory across N cores running on the same node, though obviously
there are ample cases where the 10k stop words file is meant to differ across
collections.

As it stands Eric's list seems like something that should be in the
documentation somewhere just so people can properly troubleshoot where
something they don't expect to be loaded is getting loaded from, or why
their attempts to load something new aren't working...  especially if it
were ordered to show the precedence of these options.

As for ease of editing configurations, I've long felt that this should be
possible via the admin UI though there's been much worry about security
implications there. Personally, I think that those concerns are resolvable,
but have not found time to make that case. Aside from that I think we need
to support tooling to enable easy management of config sets rather than
expanding the possible number of places the configurations might get loaded
from.

Several years ago I wrote a plugin for gradle that is very very basic, but
after some configuration so that it can see zookeeper, it will happily pull
configs down and push them up for you which is convenient for keeping
configs under version control during development. There's LOTS to improve
there, most especially adding support to manage multiple configs at a time,
and I had hoped that folks would use it and have suggestions,
contributions, but I've got no indication that anyone but me uses it. (
https://github.com/nsoft/solr-gradle)

-Gus

On Fri, Jan 22, 2021 at 8:19 AM Eric Pugh 
wrote:

> There is a lot in here ;-).
>
> With the caveat that I don’t have recent experience that many of you do
> with massive solr clusters, I think that we need to commit to fewer, not
> more, ways of maintaining the supporting resources that these clusters
> need..   I’d like to see ways of managing our Solr clusters that encourage
> easy change and experimentation, and encourage us to separate the physical
> layer (version of Solr, networking setup, packages used) from the logical
> layer (individual collections and their supporting code and resources).
>
> I think the configSet was a huge jump forward..   My workflow is to think
> 1) What’s unusual about this Solr setup?  What does the physical layer need
> to be?  Special package?  Special code?   Build a Docker image.
> 2) Fire up a three node Solr cluster, wait till it’s up and responsive via
> checking APIs.
> 3) Now think about my specific use case.   What collections do I need?  Is
> it just 1, or is it 5 or 10 collections.  Are they on the same configSet or
> different.   Great, zip up the configSet and pop it into Solr via APIs.
> 4) Create the collections in the shapes I need with the APIs, and now
> start iterating on what I need to do.  Use the APIs to create fields, or
> set up different ParamSets.
>
> However, with configSets we only did half the job, because we still don’t
> have a single well understood way of handling Jars and other resources.  We
> have many ways of doing it.   Which generates constant user confusion and
> contributes to the perspective that “Solr is hard to use”.
>
> Right now, across the Solr landscape I can think of many ways of adding
> “external” files to my Solr:
>
> 1) Classic ./lib as a place to put things.
> 2) The new to me solr.allow.unsafe.resourceloading=true approach
> 3) The userfiles directory in Solr accessed by streaming expressions load
> function.
> 4) The “package store” for packages located in file store
> 5) The blob store .system concept from before the package store
> 6) the LTR feature store (which I guess is backed by ZK but could be on
> the disk as well through more hoops...)
> 7) Layering stuff in directly via Docker build files
>
> These are each a little different, with varying levels of support.
>
> Let’s figure out how we can include a resource that is 10 KB, 1 MB or 1 GB
> and not have to think about ZooKeeper or any of the other implementation
> details of backing that.  Let’s figure out where the package manager is
> letting us down and keep working on it.
>
>
>
> On Jan 22, 2021, at 12:16 AM, David Smiley  wrote:
>
> Summary:  I've been contemplating a simple enhancement to how SolrCloud
> resolves files in a configSet:  when a file isn't in ZooKeeper, fallback
> resolution to the same-named configset on the file system (which normally
> is ignored in SolrCloud today).  A further fallback to _default on the
> filesystem could be useful as well. The mutable space is always 

Re: Consider Removing the `@` Special Character from RegExp

2021-01-22 Thread Gus Heck
I think it's already an optional feature; if you construct the regexp with
explicit syntax flags you can get an instance that won't consider '@'
special. Haven't actually had a need to do that so I'm assuming it works as
documented.

/** Syntax flag, enables anystring (@). */
public static final int ANYSTRING = 0x0008;
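
A minimal sketch of what I mean, assuming the flags are plain bit masks as the
constant above suggests. It only does the mask arithmetic so it runs without
Lucene; ALL_OPTIONAL is a placeholder value I made up for illustration, so
check the real RegExp.ALL in your version. RegExpFlags is a hypothetical
helper, not part of Lucene.

```java
final class RegExpFlags {
  // Copied from the RegExp snippet above.
  static final int ANYSTRING = 0x0008;

  // Placeholder for "all optional syntax enabled"; an assumption for this sketch.
  static final int ALL_OPTIONAL = 0xff;

  /** Clears the ANYSTRING bit and leaves every other flag untouched. */
  static int withoutAnyString(int flags) {
    return flags & ~ANYSTRING;
  }

  /** True if '@' would be treated as the "any string" operator. */
  static boolean anyStringEnabled(int flags) {
    return (flags & ANYSTRING) != 0;
  }

  public static void main(String[] args) {
    int flags = withoutAnyString(ALL_OPTIONAL);
    System.out.println(anyStringEnabled(ALL_OPTIONAL));
    System.out.println(anyStringEnabled(flags));
    // With Lucene on the classpath, the mask would be passed to the
    // two-argument constructor, for example:
    //   new RegExp("user@example\\.com", RegExp.ALL & ~RegExp.ANYSTRING)
  }
}
```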



On Thu, Jan 21, 2021 at 9:21 PM Marcus Eagan  wrote:

> Hi All,
>
> In looking at the Java Docs, our Lucene team noticed that the `@` symbol
> is a reserved character in the Lucene regular expression syntax.
>
> In re-visiting the page in curiosity, I found that the symbol was
> [Optional] for "any string." This came at a surprise because there's a very
> common way to achieve "any string" in `.*`. Is there any compelling reason
> to preserve this tiny vector of complexity? I suspect there may be some
> differences in the constructions of the finite automata produced by `.*`
> and `@` but I am not sure.
>
> If insignificant or non-existent, I suggest we remove `@` from the regular
> expression syntax.
>
> --
> Marcus Eagan
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Old programmers do fade away

2020-12-31 Thread Gus Heck
Good luck and enjoy the welder. I did some welding many years ago, and it's
quite a cool thing to turn two bits of metal into a single piece. Hopefully
you've not had to learn about the UV emissions given off by the arc by
getting a good solid sunburn the way I did :).

The community will certainly miss your input. I've found reading your
responses to folks and your blog posts quite helpful.

As for squirrels, they are living proof that time on task matters... they
have 24/7 to consider how to get to that feeder/garden/whatever, and the
ultimate motivation. Most humans put a few hours a week into keeping them
out, hence the frequency with which they win.

Best of luck and enjoy!

-Gus

On Thu, Dec 31, 2020 at 9:04 AM Vincenzo D'Amore  wrote:

> Hi Erick, I want just to say thank you for your help.
> You have been one of the most present and reliable voices to listen in
> the Community.
> Thanks again for all your help and support, I wish you all the best.
>
> On Wed, Dec 30, 2020 at 3:09 PM Erick Erickson 
> wrote:
>
>> 40 years is enough. OK, it's only been 39 1/2 years. Dear Lord, has it
>> really been that long? Programming's been fun, I've gotten to solve puzzles
>> every day. The art and science of programming has changed over that time.
>> Let me tell you about the joys of debugging with a Z80 stack emulator that
>> required you to look on the stack for variables and trace function
>> calls by knowing how to follow frame pointers. Oh the tedium! Oh the (lack
>> of) speed! Not to mention that 64K of memory was all you had to work with.
>> I had a co-worker who could predict the number of bytes by which the
>> program would shrink based on extracting common code to functions. The
>> "good old days"...weren't...
>>
>> I'd been thinking that I'd treat Lucene/Solr as a hobby, doing occasional
>> work on it when I was bored over long winter nights. I've discovered,
>> though, that I've been increasingly reluctant to crack open the code. I
>> guess that after this much time, I'm ready to hang up my spurs. One major
>> factor is the realization that there's so much going on with Lucene/Solr
>> that simply being aware of the changes, much less trying to really
>> understand them, isn't something I can do casually.
>>
>> I bought a welder and find myself more interested in playing with that
>> than programming. Wait until you see the squirrel-proof garden enclosure
>> I'm building with it. If my initial plan doesn't work, next up is an
>> electric fence along the top. The laser-sighted automatic machine gun
>> emplacement will take more planning...Ahhh, probably won't be able to get a
>> permit from the township for that though. Do you think the police would
>> notice? Perhaps I should add that the local police station is two blocks
>> away and in the line of fire. But an infrared laser powerful enough to
>> "pre-cook" them wouldn't be as obvious would it?
>>
>> Why am I so fixated on squirrels? One of the joys of gardening is fresh
>> tomatoes rather than those red things they sell in the store. The squirrels
>> ATE EVERY ONE OF MY TOMATOES WHILE THEY WERE STILL GREEN LAST YEAR! And the
>> melons. In the words of B. Bunny: "Of course you realize this means war" (
>> https://www.youtube.com/watch?v=4XNr-BQgpd0)...
>>
>> Then there's working in the garden and landscaping, the desk I want to
>> build for my wife, travel as soon as I can, maybe seeing if some sailboats
>> need crew...you get the idea.
>>
>> It's been a privilege to work with this group, you're some of the best
>> and brightest. Many thanks to all who've generously given me their time and
>> guidance. It's been a constant source of amazement to me how willing people
>> are to take time out of their own life and work to help me when I've had
>> questions. I owe a lot of people beers ;)
>>
>> I'll be stopping my list subscriptions, Slack channels (dm me if you need
>> something), un-assigning any JIRAs and that kind of thing over the next
>> while. If anyone's interested in taking over the BadApple report, let me
>> know and I can put the code up somewhere. It takes about 10 minutes to do
>> each week. I won't disappear entirely, things like the code-reformatting
>> effort are nicely self-contained for instance and something I can do
>> casually.
>>
>> My e-mail address if you need to get in touch with me is: "
>> erick.erick...@gmail.com". There's a correlation between gmail addresses
>> that are just a name with no numbers and a person's age... A co-worker came
>> over to my desk in pre-historical times and said "there's this new mail
>> service you might want to sign up for"... Like I said, 40 years is enough.
>>
>> Best to all,
>> Erick
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
> --
> Vincenzo D'Amore
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com 

Re: Code reformatting

2020-12-24 Thread Gus Heck
+1 to always braces

On Thu, Dec 24, 2020 at 11:47 AM Michael Sokolov  wrote:

> The Google convention you cited says this, I think?
>
> >Braces are used with if, else, for, do and while statements, even
> when the body is empty or contains only a single statement.
>
> On Thu, Dec 24, 2020 at 8:00 AM Dawid Weiss  wrote:
> >
> > > Personally I would ban the non block conditional, but I think it's
> moot in this context since spotless just does what it does and is not
> configurable, as I understand it. I suppose we could manually "fix" all the
> conditionals though?
> >
> > I'm pretty sure you could do it automatically... But in many places
> > there is very little sense in doing that. That google format
> > convention [1] is fairly reasonable to me - strict in certain aspects
> > and relaxed elsewhere. I wouldn't enforce it. If you find a place that
> > could use more clarity with braces, correct it (and re-run the
> > formatting) then commit it back in.
> >
> > Dawid
> >
> > https://google.github.io/styleguide/javaguide.html
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Welcome Houston Putman to the PMC

2020-12-01 Thread Gus Heck
Congratulations :)

On Tue, Dec 1, 2020, 9:53 PM Martin Gainty  wrote:

> congrats Houston
> martin
>
> --
> *From:* Yonik Seeley 
> *Sent:* Tuesday, December 1, 2020 7:35 PM
> *To:* Solr/Lucene Dev 
> *Subject:* Re: Welcome Houston Putman to the PMC
>
> Congrats Houston!
> -Yonik
>
>
> On Tue, Dec 1, 2020 at 4:19 PM Mike Drob  wrote:
>
> I am pleased to announce that Houston Putman has accepted the PMC's
> invitation to join.
>
> Congratulations and welcome, Houston!
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Welcome Julie Tibshirani as Lucene/Solr committer

2020-11-18 Thread Gus Heck
Congratulations and welcome :)

On Wed, Nov 18, 2020 at 10:56 AM Houston Putman 
wrote:

> Congrats and welcome Julie!!
>
> - Houston
>
> On Wed, Nov 18, 2020 at 10:30 AM Eric Pugh <
> ep...@opensourceconnections.com> wrote:
>
>> I’ve seen all your contributions, really great stuff. Welcome!
>>
>>
>> On Nov 18, 2020, at 10:22 AM, Uwe Schindler  wrote:
>>
>> Welcome Julie!
>>
>> -
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> -Original Message-
>> From: Michael Sokolov 
>> Sent: Wednesday, November 18, 2020 4:07 PM
>> To: dev@lucene.apache.org
>> Subject: Welcome Julie Tibshirani as Lucene/Solr committer
>>
>> I'm pleased to announce that Julie Tibshirani has accepted the PMC's
>> invitation to become a committer.
>>
>> Julie, the tradition is that new committers introduce themselves with
>> a brief bio.
>>
>> I think we may still be sorting out the details of your Apache account
>> (julie@ may have been taken?), but as soon as that has been sorted out
>> and karma has been granted, you can use your new powers to add
>> yourself to the committers section of the Who We Are page on the
>> website: 
>>
>> Congratulations and welcome!
>>
>> Mike Sokolov
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>> ___
>> *Eric Pugh **| *Founder & CEO | OpenSource Connections, LLC | 434.466.1467
>> | http://www.opensourceconnections.com | My Free/Busy
>> 
>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>> 
>> This e-mail and all contents, including attachments, is considered to be
>> Company Confidential unless explicitly stated otherwise, regardless
>> of whether attachments are marked as such.
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Lucene Query Parser Syntax Specification

2020-11-12 Thread Gus Heck
I have had this thought regarding IDE support too. I've had expressions
that when formatted for legibility are over 100 lines long, and adding
something in the middle that changes indenting is truly painful at that
point. At the moment I've got several irons in the fire already and can't
possibly take that on. The current implementation
(org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParser) is
hand coded, and not generated from a grammar. So one would probably want to
correct that first so that syntax changes can be identified and adjusted in
downstream syntax highlighters relatively easily. Unfortunately, when I
looked at this briefly for IntelliJ, I found that IntelliJ favors antlr,
while javacc and jflex are what we tend to use in the solr codebase.

-Gus

On Thu, Nov 12, 2020 at 7:02 AM ufuk yılmaz 
wrote:

> I wish something like this existed for streaming expressions.
>
> To have highlighting and validation in an editor would be great!
>
>
>
> Sent from Mail  for
> Windows 10
>
>
>
> *From: *Scott Guthery 
> *Sent: *11 November 2020 23:54
> *To: *dev@lucene.apache.org
> *Subject: *Re: Lucene Query Parser Syntax Specification
>
>
>
> >> The source code is the de-facto specification
>
>
>
> Fair enough although it does beg the question of which parser source code,
> there being no shortage of Lucene/Solr/etc. query parsers, parser releases,
> and parser versions at github.  Anyway, below is my de jure yacc.  I think
> it covers everything in the 2012 specification and rounds out the special
> cases a little.
>
>
>
> Your comments are solicited and will be greatly appreciated.
>
>
>
> Cheers, Scott
>
>
>
> P.S.  yacc/bison can generate parsers in programming languages other than
> C including Java.
>
>
>
> query : query TOK_AND query
>       | query TOK_OR query
>       | TOK_NOT query
>       | '(' query ')'
>       | term
>       ;
>
> term  : TOK_ALPHA
>       | TOK_WILD
>       | TOK_ALPHA ':' TOK_ALPHA
>       | TOK_ALPHA ':' TOK_WILD
>       | TOK_ALPHA '~'
>       | TOK_ALPHA '~' TOK_NUM
>       | TOK_ALPHA '^' TOK_NUM
>       | TOK_ALPHA ':' TOK_ALPHA '~'
>       | TOK_ALPHA ':' TOK_ALPHA '~' TOK_NUM
>       | TOK_ALPHA ':' TOK_ALPHA '^' TOK_NUM
>       | '"' TOK_ALPHA TOK_ALPHA '"' '~' TOK_NUM
>       | TOK_ALPHA ':' '[' TOK_NUM TOK_TO TOK_NUM ']'
>       | TOK_ALPHA ':' '{' TOK_ALPHA TOK_TO TOK_ALPHA '}'
>       | '+' TOK_ALPHA
>       | '-' TOK_ALPHA
>       ;
>
>
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: 8.8 section in the changelog?

2020-10-18 Thread Gus Heck
That is only for master, because I was intending on merging back after the
branch so all aqp related features would stay together and to not drop it
in 8.7 with no time for it to get used/validated by the motivating use case.

On Fri, Oct 16, 2020, 1:07 PM Adrien Grand  wrote:

> Hello,
>
> I'm confused that master now has a non-empty 8.8 section in the Changelog
> while branch_8_7 has not been cut yet, was it done by mistake?
>
> --
> Adrien
>


Re: Index documents in async way

2020-10-13 Thread Gus Heck
This is interesting, though it opens a few cans of worms IMHO.

   1. Currently we guarantee that if solr sends you an OK response the
   document WILL eventually become searchable without further action.
   Maintaining that guarantee becomes impossible if we haven't verified that
   the data is formatted correctly (i.e. dates are in ISO format, etc).
   This may be an acceptable cost for those opting for async indexing, but it
   may be very hard for some folks to swallow if it became the only option.
   2. In the case of errors we need to hold the error message indefinitely
   for later discovery by the client, but this needs to not accumulate
   forever. Thus:
      1. We have a timed cleanup, leasing or some other self limiting
      pattern... possibly by indexing the failures in a TRA with autodelete so
      that clients can efficiently find the status of the particular
      document(s) they sent. Obviously there's at least an async id involved,
      probably the uniqueKey (where available) and timestamps for received
      and processed as well.
      2. We log more simply with a sequential id and let clients keep track
      of what they have seen... This can lead us down the path of re-inventing
      kafka, or making kafka a required dependency.
      3. We provide a push oriented connection (websocket? HTTP2?) that
      clients that care about failures can listen to and store nothing. A less
      appetizing variant is to publish errors to a message bus.
   3. If we have more than one thread picking up the submitted documents
   and writing them, we need a state machine that identifies in-progress
   documents to prevent multiple pickups and resets processing to new on
   startup to ensure we don't index the same document twice and don't lose
   things that were in-flight on power loss.
   4. Backpressure/throttling. If we're losing ground continuously on the
   submissions because indexing is heavier than accepting documents, we may
   fill up the disk. Of course the index itself can do that, but we need to
   think about whether this makes it worse.
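
To make the state machine in point 3 a bit more concrete, here is a minimal
in-memory sketch. Everything in it (the PendingDocTracker class, the state
names, the method names) is hypothetical and not existing Solr code, and a
real version would have to persist the states alongside the queued updates
rather than keep them in a map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: tracks which submitted documents are new, claimed by
// an indexing thread, or finished. Not part of Solr.
public class PendingDocTracker {
    public enum State { NEW, IN_PROGRESS, INDEXED, FAILED }

    private final Map<String, State> states = new ConcurrentHashMap<>();

    // Record a newly accepted document; a repeated submit does not reset it.
    public void submit(String id) {
        states.putIfAbsent(id, State.NEW);
    }

    // Atomically move NEW -> IN_PROGRESS; only one caller can win the claim.
    public boolean claim(String id) {
        return states.replace(id, State.NEW, State.IN_PROGRESS);
    }

    // Record the outcome for a document that was previously claimed.
    public void complete(String id, boolean success) {
        states.replace(id, State.IN_PROGRESS, success ? State.INDEXED : State.FAILED);
    }

    // On startup, anything still IN_PROGRESS was in flight at shutdown or
    // power loss and goes back to NEW so it gets picked up again.
    public void resetInFlight() {
        states.replaceAll((id, s) -> s == State.IN_PROGRESS ? State.NEW : s);
    }

    public State stateOf(String id) {
        return states.get(id);
    }

    public static void main(String[] args) {
        PendingDocTracker tracker = new PendingDocTracker();
        tracker.submit("doc-1");
        System.out.println("first claim:  " + tracker.claim("doc-1"));
        System.out.println("second claim: " + tracker.claim("doc-1"));
        tracker.resetInFlight();
        System.out.println("after reset:  " + tracker.stateOf("doc-1"));
    }
}
```

Note that resetting to NEW only avoids duplicates if re-indexing a
half-written document is idempotent (e.g. an overwrite by uniqueKey), which
is an assumption of this sketch.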

A big plus to this however is that batches with errors could optionally
just omit the (one or two?) errored document(s) and publish the error for
each errored document rather than failing the whole batch, meaning that the
indexing infrastructure submitting in batches doesn't have to leave several
hundred docs unprocessed, or alternately do a slow doc at a time resubmit
to weed out the offenders.

Certainly the involvement of kafka sounds interesting. If one persists to
an externally addressable location like a kafka queue one might leave the
option for the write-on-receipt queue to be different from the
read-to-actually-index queue and put a pipeline behind solr instead of
in front of it... possibly atomic updates could then be given identical
processing as initial indexing.

On Sat, Oct 10, 2020 at 12:41 AM David Smiley  wrote:

>
>
> On Thu, Oct 8, 2020 at 10:21 AM Cao Mạnh Đạt  wrote:
>
>> Hi guys,
>>
>> First of all it seems that I used the term async a lot recently :D.
>> Recently I have been thinking a lot about changing the current indexing
>> model of Solr from the current sync way (the user submits an update
>> request and waits for the response). What about changing it to an async
>> model, where nodes only persist the update into the tlog and then return
>> immediately, much like what the tlog is doing now? Then we have a dedicated
>> executor which reads from the tlog to do indexing (a producer-consumer
>> model with the tlog acting as the queue).
>>
>
> The biggest problem I have with this is that the client doesn't know about
> indexing problems without awkward callbacks later to see if something went
> wrong.  Even simple stuff like a schema problem (e.g. undefined field).
> It's a useful *option*, anyway.
>
>
>>
>> I do see several big benefits of this approach
>>
>>    - We can batch updates in a single call. Right now we do not use the
>>    writer.add(documents) api from lucene; batching updates is going to
>>    boost the performance of indexing
>>
> I'm a bit skeptical that would boost indexing performance.  Please
> understand the intent of that API is about transactionality (atomic add)
> and ensuring all docs go in the same segment.  Solr *does* use that API for
> nested / parent-child documents, and because it has to.  If that API were
> to get called for normal docs, I could see the configured indexing buffer
> RAM or doc limit could be exceeded substantially.  Perhaps not a big deal.
> You could test your performance theory on a hacked Solr without many
> modifications, I think?  Just buffer then send in bulk.
>
>>
>>    - One common problem with Solr now is that we have a lot of threads
>>    doing indexing, which can end up with many small segments. Using this
>>    model we can have bigger segments and so less merge cost
>>
> This is app/use-case dependent of course.  If you observe the segment
> count to be high, I think it's more 

Re: QueryParser - proposed change may break existing queries.

2020-09-17 Thread Gus Heck
And as I understand it, current behavior is the silent misinterpretation.
To me, the failure to require a space after the regex (and either not
become a regex in that case or complain about invalid regex) might be
considered a bug...
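
For illustration only, here is a rough sketch of the kind of check being
discussed, using java.util.regex rather than the real javacc grammar. The
RegexTokenCheck class, the isTerminatedRegex method and the choice of
allowed follow characters (end of input, whitespace, ')' or '^') are my
assumptions, picked to line up with the allowed/error examples quoted below;
this is not the LUCENE-9445 patch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not Lucene code: decides whether a query fragment that
// starts with '/' is a regex token followed by an acceptable separator.
public class RegexTokenCheck {
    // "/body/" with backslash escapes allowed in the body, an optional "i"
    // flag, then a lookahead for end of input, whitespace, ')' or '^'.
    private static final Pattern TERMINATED_REGEX =
        Pattern.compile("/((?:[^/\\\\]|\\\\.)*)/(i?)(?=$|[\\s)^])");

    public static boolean isTerminatedRegex(String fragment) {
        return TERMINATED_REGEX.matcher(fragment).lookingAt();
    }

    public static void main(String[] args) {
        String[] samples = {
            "/foo/i bar", "/foo/^2", "/foo/ OR /bar/)",
            "/foo/bar", "/foo/iphone", "/foo.com/index.html"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isTerminatedRegex(s));
        }
    }
}
```

Whether a fragment that fails this check should fall back to a plain term or
raise a parse error is exactly the open question in this thread; the sketch
only identifies the ambiguous cases.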

On Thu, Sep 17, 2020 at 9:30 AM Mark Harwood  wrote:

> I think the decision comes down to choosing between silent
> (mis)interpretations of ambiguous queries or noisy failures.
>
> On Thu, Sep 17, 2020 at 1:55 PM Uwe Schindler  wrote:
>
>> Hi,
>>
>>
>>
>> My idea would have been not to be too strict and instead only detect it
>> as a regex if it's separated. So /foo/bar and /foo/iphone would both go
>> through, ignoring the regex; only ‘/foo/ bar’ or ‘/foo/i phone’ would
>> interpret the first token as regex.
>>
>>
>>
>> That’s just my idea, not sure if it makes sense to have this relaxed
>> parsing. I was always very skeptical of adding the regexes, as it breaks
>> many queries. Now it’s even more.
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> Achterdiek 19, D-28357 Bremen
>>
>> https://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Mark Harwood 
>> *Sent:* Wednesday, September 16, 2020 6:45 PM
>> *To:* dev@lucene.apache.org
>> *Subject:* Re: QueryParser - proposed change may break existing queries.
>>
>>
>>
>> The strictness I was thinking of adding was to make all of the following
>> error:
>>
>>  /foo/bar
>>
>>  /foo//bar/
>>
>>  /foo/iphone
>>
>>  /foo/AND x
>>
>>
>>
>> These would be allowed:
>>
>>  /foo/i bar
>>
>>  (/foo/ OR /bar/)
>>
>>  (/foo/ OR /bar/i)
>>
>>  /foo/^2
>>
>>  /foo/i^2
>>
>>
>>
>>
>>
>>
>>
>> On 16 Sep 2020, at 12:00, Uwe Schindler  wrote:
>>
>> 
>>
>> In my opinion, the proposed syntax change should enforce having
>> whitespace or any other separator char after the regex “i” parameter.
>>
>>
>>
>> Uwe
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> Achterdiek 19, D-28357 Bremen
>>
>> https://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Mark Harwood 
>> *Sent:* Wednesday, September 16, 2020 11:04 AM
>> *To:* dev@lucene.apache.org
>> *Subject:* QueryParser - proposed change may break existing queries.
>>
>>
>>
>> In Lucene-9445 we'd like to add a case insensitive option to regex
>> queries in the query parser of the form:
>>
>>/Foo/i
>>
>>
>>
>> However, today people can search for :
>>
>>
>>
>>/foo.com/index.html
>>
>>
>>
>> and not get an error. The searcher may think this is a query for a URL
>> but it's actually parsed as a regex "foo.com" ORed with a term query.
>>
>>
>>
>> I'd like to draw attention to this proposed change in behaviour because I
>> think it could affect many existing systems. Arguably it may be a positive
>> in drawing attention to a number of existing silent failures (unescaped
>> searches for urls or file paths) but equally could be seen as a negative
>> breaking change by some.
>>
>>
>>
>> What is our BWC policy for changes to query parser?
>>
>> Do the benefits of the proposed new regex feature outweigh the costs of
>> the breakages in your view?
>>
>>
>>
>>
>> https://issues.apache.org/jira/browse/LUCENE-9445?focusedCommentId=17196793&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17196793
>>
>>
>>
>>
>>
>>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: 8.7 Release

2020-09-15 Thread Gus Heck
Unless it somehow got lost in a spam filter somewhere, I don't think we
have set a target date for the release yet? (roadmap says autumn 2020 which
technically doesn't begin until the equinox on the 22nd :) )

Hoping that I might still get the Advanced Query parser in first, but
that's a much bigger prospect than these two tickets.

-Gus

On Tue, Sep 15, 2020 at 5:29 PM Erik Hatcher  wrote:

> Unless there are objections, I'm gonna get
> https://issues.apache.org/jira/browse/SOLR-14799 into 8.7 as well.
>
> Erik
>
>
> On Sep 14, 2020, at 10:06 AM, Christine Poerschke (BLOOMBERG/ LONDON) <
> cpoersc...@bloomberg.net> wrote:
>
> With a view towards including it in the release, I'd appreciate input on
> the
>
> https://issues.apache.org/jira/browse/SOLR-14828
>
> solrj logging tweak if anyone has a moment?
>
> Thanks,
> Christine
>
> From: dev@lucene.apache.org At: 08/20/20 22:48:39
> To: dev@lucene.apache.org
> Subject: Re: 8.7 Release
>
> Also, we should try to respect the stuff we have put on the roadmap (Which
> includes me getting a patch up for SIP-9 much sooner rather than even a
> little later!)
>
> On Thu, Aug 20, 2020 at 5:18 PM Adrien Grand  wrote:
>
>> Thanks for the explanation Ishan.
>>
>> On Thu, Aug 20, 2020 at 10:33 PM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> Hi Adrien,
>>> I think I am mainly concerned about getting the configuration and
>>> modularity of this right before we release:
>>> https://issues.apache.org/jira/browse/SOLR-14588.
>>> If we aren't able to resolve it, we should revert that feature.
>>>
>>> There may be some other performance issues that may have been marked as
>>> blockers just to infuse a sense of urgency among those that need to fix it.
>>> But, I wouldn't consider them something that actually holds up a release.
>>> Regards,
>>> Ishan
>>>
>>> On Fri, Aug 21, 2020 at 1:56 AM Adrien Grand  wrote:
>>>
>>>> Noble, I'm curious what blockers you have in mind. I just checked JIRA,
>>>> and while I see a number of 9.0 blockers, I'm not counting many 8.7
>>>> blockers?
>>>>
>>>> On Thu, Aug 20, 2020 at 11:13 AM Noble Paul 
>>>> wrote:
>>>>
>>>>> There are a lot of blockers for 8.7. It's good to plan in advance
>>>>>
>>>>> On Thu, Aug 20, 2020 at 7:11 PM Ishan Chattopadhyaya
>>>>>  wrote:
>>>>> >
>>>>> > Hi devs,
>>>>> > A lot of changes are now in 8.7 or in-flight. I'd like to volunteer
>>>>> > for a 8.7 release in around a month from now (cutting the release branch
>>>>> > around 20 September) and RC shortly after. I feel this timeline will give
>>>>> > all of us ample time to wrap up the release blockers, other changes and
>>>>> > improvements.
>>>>> >
>>>>> > Does someone have any thoughts, concerns or objections?
>>>>> > Regards,
>>>>> > Ishan
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>> -
>>>>> Noble Paul
>>>>>
>>>>> -
>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>
>>
>> --
>> Adrien
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>
>
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Avoiding false-positives in multivalued field search with intervals?

2020-09-11 Thread Gus Heck
You're thinking of SurroundQuery parser for span queries I think...
https://lucene.apache.org/solr/guide/8_6/other-parsers.html#surround-query-parser
and the Advanced Query Parser will have a similar syntax

On Thu, Sep 10, 2020 at 4:40 PM Michael Sokolov  wrote:

> A slightly different but related topic is how to manage lots of fields
>
> I agree that sub-fields are a pain and that mashing everything
> together in an all-field is a mess, but for best performance with a
> large number of fields/sub-fields, it is the only workable option I
> can see? Expanding a query over numerous fields grows combinatorically
> in the number of fields (if I want my query to match when all terms
> match in *some* field), doesn't it?
>
> I would like to see a mechanism for defining sub-fields using
> positions. Together with an absolute positional query this would
> enable both match-any-field as well as field-specific matching with
> each token indexed only once (multi-values are possible within this
> with boundary tokens or big enough position ranges, as Alan
> suggested). It does mean that the sub-field boundaries have to be
> managed somehow. Without index support, you can set an arbitrary large
> size for your sub-field and insert position gaps at the boundaries,
> but maybe we could detect the largest sub-field at flush time and
> write that metadata somewhere in the index to enable smaller gaps?
> Another issue is differing analysis for the sub-fields, and properly
> updating the positions during analysis: at the boundaries you don't
> want to insert a gap, but rather advance to a fixed position, and you
> have to index sub-fields in order. Maybe we could make it less horrible by
> adding better support for it.
>
> Re: query parsing; wasn't there at one time an interval query parser?
> It had operators like w() and n() IIRC
>
> On Thu, Sep 10, 2020 at 4:20 PM Dawid Weiss  wrote:
> >
> > > Ok so the more general question is whether we need an interval query
> parser
> >
> > Oh, to this I'd say: yes, yes, yes.
> >
> > I didn't have much prior experience writing frontend apps on top of
> > Solr/Lucene but once I did have
> > to go that route it quickly turns out that several things that are
> > readily available from code-level
> > are so darn difficult to achieve and integrate from the outside.
> Specifically:
> >
> > - Field expansion in query parsers is a must (so that unqualified
> > terms are expanded over multiple fields).
> > Any query parser that doesn't support this is in my opinion of zero
> > use. The "default" copy-to sink field known
> > from Solr brings more problems than it solves.
> >
> > - Exact match-region hit highlighting is a strong expectation. I
> > solved this with matches API (see LUCENE-9461)
> > and flexible query parser's multifield expansion. Works like a charm.
> >
> > - Multivalued fields are common and sub-document handling is a pain.
> > The problem I raised here is a result of
> > direct user feedback. In real life multivalued fields are omnipresent
> > and searches over those fields can be complex.
> > Users see hits that just should not be there and are confused.
> >
> > - People do use complex queries. Maybe not all people but there are
> > people out there who do... Just recently I extended
> > flexible query parser with a handcrafted min-should-match operator
> > because it is otherwise not accessible in any Lucene
> > query parser (!). I can make this code available (it's not terribly
> > complex), although, since you asked, I think a query parser that
> > exposes all sorts of "higher level" functionality of intervals would
> > be very, very useful.
> >
> > It may end up that I'll have to write something for intervals anyway
> > so we can work on this together if you like.
> > Especially the syntax is an open question - should it be
> > operator-based (like the current boost of fuzzy operators) or
> > meta-function-based (so that pseudo-functions would be available). Or
> > maybe a mix of both? I don't know, really. :)
> >
> > Dawid
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


PayloadDecoder.FLOAT_DECODER

2020-09-04 Thread Gus Heck
In reviewing SOLR-14787 with the author verbally, he pointed out there's
some duplicated code where he needed to decode a payload into a float. We
have org.apache.lucene.analysis.payloads.PayloadHelper#decodeFloat(byte[],
int), but lucene/queries doesn't have lucene/analysis-common on the
classpath and he found no better option than to essentially cut and paste
those methods into his code. Neither of us likes that solution very much
because code duplication is not cool, so I went fishing for other options
or examples in the code base (on 8.x) and ran into
PayloadDecoder.FLOAT_DECODER... which looks quite broken, or at least
wildly different:

  PayloadDecoder FLOAT_DECODER =
      bytes -> bytes == null ? 1 : bytes.bytes[bytes.offset];

this coerces a single byte to a float, but analysis common is encoding via
PayloadHelper like this:

  public static byte[] encodeFloat(float payload, byte[] data, int offset) {
    return encodeInt(Float.floatToIntBits(payload), data, offset);
  }

Which can clearly write more than one byte... and decoding via

  public static final float decodeFloat(byte[] bytes, int offset) {
    return Float.intBitsToFloat(decodeInt(bytes, offset));
  }

  public static final int decodeInt(byte[] bytes, int offset) {
    return ((bytes[offset] & 0xFF) << 24) | ((bytes[offset + 1] & 0xFF) << 16)
        | ((bytes[offset + 2] & 0xFF) << 8) | (bytes[offset + 3] & 0xFF);
  }

Which is clearly expecting 4 bytes... and nothing like FLOAT_DECODER

The scary thing is PayloadDecoder.FLOAT_DECODER seems to be used in
BoostingTermBuilder... and a whole bunch of unit tests, which I presume are
passing on coincidence of handling floats that happen to qualify as small
value integers but I haven't tested that theory yet.

PayloadDecoder.FLOAT_DECODER has been around for a long time, so I thought
I'd solicit opinions before attempting to clean it up. My concept of a
cleanup here would include trying to find a home for the float decoding
logic (from PayloadHelper) that can be seen by both areas and unifying
encode/decode for float payloads to a single location, and also add a unit
test that round trips the encode/decode cycle (which I'm not seeing).
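
As a rough idea of what such a round-trip check could look like, here is a
standalone sketch with no Lucene dependency. The FloatPayloadRoundTrip class
is hypothetical; encodeInt is not quoted above, so the big-endian version
here is my assumption, written to mirror decodeInt; and firstByteAsFloat
only imitates what the FLOAT_DECODER lambda does with the first byte.

```java
// Hypothetical standalone sketch, not Lucene code.
public class FloatPayloadRoundTrip {
    // Assumed big-endian layout, chosen to be the inverse of decodeInt below.
    public static byte[] encodeInt(int payload, byte[] data, int offset) {
        data[offset] = (byte) (payload >> 24);
        data[offset + 1] = (byte) (payload >> 16);
        data[offset + 2] = (byte) (payload >> 8);
        data[offset + 3] = (byte) payload;
        return data;
    }

    public static byte[] encodeFloat(float payload) {
        return encodeInt(Float.floatToIntBits(payload), new byte[4], 0);
    }

    // Same bit arithmetic as the decodeInt quoted above.
    public static int decodeInt(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xFF) << 24) | ((bytes[offset + 1] & 0xFF) << 16)
            | ((bytes[offset + 2] & 0xFF) << 8) | (bytes[offset + 3] & 0xFF);
    }

    public static float decodeFloat(byte[] bytes, int offset) {
        return Float.intBitsToFloat(decodeInt(bytes, offset));
    }

    // Imitates the single-byte lambda: widens only the first byte to a float.
    public static float firstByteAsFloat(byte[] bytes, int offset) {
        return bytes[offset];
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, 1.0f, 2.5f, -3.75f, Float.MAX_VALUE};
        for (float f : samples) {
            byte[] encoded = encodeFloat(f);
            System.out.println(f + " -> four-byte decode: " + decodeFloat(encoded, 0)
                + ", first-byte decode: " + firstByteAsFloat(encoded, 0));
        }
    }
}
```

With this layout the four-byte decode returns the original value, while the
first-byte decode of an encoded 1.0f comes out as 63.0, since 1.0f is
0x3F800000 and only the leading 0x3F is read.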

-Gus


http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-01 Thread Gus Heck
A2, A1, D (binding)

On Tue, Sep 1, 2020 at 7:37 PM Varun Thacker  wrote:

> (non-binding)
> vote: A1, A2, D
>
> On Tue, Sep 1, 2020 at 1:21 PM Ryan Ernst  wrote:
>
>> Dear Lucene and Solr developers!
>>
>> Sorry for the multiple threads. This should be the last one.
>>
>> In February a contest was started to design a new logo for Lucene
>> [jira-issue]. The initial attempt [first-vote] to call a vote resulted in
>> some confusion on the rules, as well as the request for one additional
>> submission. The second attempt [second-vote] yesterday had incorrect links
>> for one of the submissions. I would like to call a new vote, now with more
>> explicit instructions on how to vote, and corrected links.
>>
>> *Please read the following rules carefully* before submitting your vote.
>>
>> *Who can vote?*
>>
>> Anyone is welcome to cast a vote in support of their favorite
>> submission(s). Note that only PMC members' votes are binding. If you are a
>> PMC member, please indicate with your vote that the vote is binding, to
>> ease collection of votes. In tallying the votes, I will attempt to verify
>> only those marked as binding.
>>
>>
>> *How do I vote?*
>> Votes can be cast simply by replying to this email. It is a ranked-choice
>> vote [rank-choice-voting]. Multiple selections may be made, where the order
>> of preference must be specified. If an entry gets more than half the votes,
>> it is the winner. Otherwise, the entry with the lowest number of votes is
>> removed, and the votes are retallied, taking into account the next
>> preferred entry for those whose first entry was removed. This process
>> repeats until there is a winner.
>>
>> The entries are broken up by variants, since some entries have multiple
>> color or style variations. The entry identifiers are first a capital
>> letter, followed by a variation id (described with each entry below), if
>> applicable. As an example, if you prefer variant 1 of entry A, followed by
>> variant 2 of entry A, variant 3 of entry C, entry D, and lastly variant 4e
>> of entry B, the following should be in your reply:
>>
>> (binding)
>> vote: A1, A2, C3, D, B4e
>>
>> *Entries*
>>
>> The entries are as follows:
>>
>> A. Submitted by Dustin Haver. This entry has two variants, A1 and A2.
>>
>> [A1]
>> https://issues.apache.org/jira/secure/attachment/12999548/Screen%20Shot%202020-04-10%20at%208.29.32%20AM.png
>> [A2]
>> https://issues.apache.org/jira/secure/attachment/12997172/LuceneLogo.png
>>
>> B. Submitted by Stamatis Zampetakis. This has several variants. Within
>> the linked entry there are 7 patterns and 7 color palettes. Any vote for B
>> should contain the pattern number followed by the lowercase letter of the
>> color palette. For example, B3e or B1a.
>>
>> [B]
>> https://issues.apache.org/jira/secure/attachment/12997768/zabetak-1-7.pdf
>>
>> C. Submitted by Baris Kazar. This entry has 8 variants.
>>
>> [C1]
>> https://issues.apache.org/jira/secure/attachment/13006392/lucene_logo1_full.pdf
>> [C2]
>> https://issues.apache.org/jira/secure/attachment/13006393/lucene_logo2_full.pdf
>> [C3]
>> https://issues.apache.org/jira/secure/attachment/13006394/lucene_logo3_full.pdf
>> [C4]
>> https://issues.apache.org/jira/secure/attachment/13006395/lucene_logo4_full.pdf
>> [C5]
>> https://issues.apache.org/jira/secure/attachment/13006396/lucene_logo5_full.pdf
>> [C6]
>> https://issues.apache.org/jira/secure/attachment/13006397/lucene_logo6_full.pdf
>> [C7]
>> https://issues.apache.org/jira/secure/attachment/13006398/lucene_logo7_full.pdf
>> [C8]
>> https://issues.apache.org/jira/secure/attachment/13006399/lucene_logo8_full.pdf
>>
>> D. The current Lucene logo.
>>
>> [D]
>> https://lucene.apache.org/theme/images/lucene/lucene_logo_green_300.png
>>
>> Please vote for one of the above choices. This vote will close about one
>> week from today, Mon, Sept 7, 2020 at 11:59PM.
>>
>> Thanks!
>>
>> [jira-issue] https://issues.apache.org/jira/browse/LUCENE-9221
>> [first-vote]
>> http://mail-archives.apache.org/mod_mbox/lucene-dev/202006.mbox/%3cCA+DiXd74Mz4H6o9SmUNLUuHQc6Q1-9mzUR7xfxR03ntGwo=d...@mail.gmail.com%3e
>> [second-vote]
>> http://mail-archives.apache.org/mod_mbox/lucene-dev/202009.mbox/%3cCA+DiXd7eBrQu5+aJQ3jKaUtUTJUqaG2U6o+kUZfNe-m=smn...@mail.gmail.com%3e
>> [rank-choice-voting] https://en.wikipedia.org/wiki/Instant-runoff_voting
>>
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)

