Re: 2020-03 Committer virtual meeting

2020-03-12 Thread David Smiley
I've chosen Friday March 20th at 11am US eastern time, which is when
everyone who responded to the Doodle poll said they could make it.  Closer
to the event I'll share a Google Hangout URL on Slack #lucene-dev.  This
gathering is virtual and so no coronavirus cancellation will happen ;-)

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Mar 10, 2020 at 5:27 PM David Smiley wrote:

> Hello fellow committers,
>
> I'd like to organize another virtual Lucene/Solr committer meeting this
> month.  I created a meeting notes page in confluence here:
> https://cwiki.apache.org/confluence/display/LUCENE/2020-03+Meeting+notes
> It has some topics I'd like to talk about and I'm hoping others might add
> to the tentative agenda as well.  Maybe we'll keep it to an hour this time;
> last time it was 95 minutes.
>
> When exactly is this?  Next week sometime. I'm using a "Doodle poll" to
> determine an optimal time slot.  For the link to the poll, go to the ASF
> Slack, #lucene-dev channel, and you will see it.  You could also email me
> directly for it.
>
> For this virtual committer meeting and future ones:
>
>    - This is in the spirit of committer meetings co-located with
>      conferences.  ASF policy says that no "decisions" can be made in such a
>      venue.  We make decisions on this dev list and indirectly via JIRA out in
>      the open and with the opportunity for anyone to comment.
>    - Who:  Committer-only or by invitation
>    - Video chat with option of audio dial-in.  This time I will use
>      Google Hangout.
>    - Recorded for those invited only.  I'll dispose of the recording a
>      week after.  The intention is for those who cannot be there due to a
>      scheduling conflict to see/hear what was said.  I have the ability to do
>      this recording via Salesforce's G-Suite subscription.
>    - Published notes:  I (or someone) will take written meeting notes
>      that are ultimately published for anyone to see (not restricted to those
>      invited).  They will be transmitted to the dev list.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>


Re: 8.5 release

2020-03-12 Thread Andrzej Białecki
Hi Alan,

Is there still time to merge a fix for SOLR-13264? It’s a bug that makes it
impossible to customise this trigger: you can only configure a trigger with
its default operations.

> On 12 Mar 2020, at 17:24, Alan Woodward wrote:
>
> While I wait for the smoke tester to finish, I’ve been working on release
> notes.  The ReleaseTodo still refers to the old wiki, and release notes are
> on CWiki now, so I’m flying slightly blind.  Looking at what was done for the
> previous release, I’ve created a draft note for Lucene which can be inspected
> and edited here:
>
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=148641343=d4d3acb9-0dd6-4d40-903c-b16f2bb68415=shareui=1584025014586
>
> For Solr, the 8.4 release note on CWiki points to a section on
> https://lucene.apache.org/solr/news.html but it’s not entirely clear where
> this section has come from or where it should be drafted.  Can anybody
> enlighten me?

RE: 8.5 release

2020-03-12 Thread Staley, Phil R - DCF
What is the ETA for the 8.5 download?

From: Alan Woodward 
Sent: Thursday, March 12, 2020 11:25 AM
To: dev@lucene.apache.org
Subject: Re: 8.5 release

While I wait for the smoke tester to finish, I’ve been working on release
notes.  The ReleaseTodo still refers to the old wiki, and release notes are on
CWiki now, so I’m flying slightly blind.  Looking at what was done for the
previous release, I’ve created a draft note for Lucene which can be inspected
and edited here:

https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=148641343=d4d3acb9-0dd6-4d40-903c-b16f2bb68415=shareui=1584025014586

For Solr, the 8.4 release note on CWiki points to a section on
https://lucene.apache.org/solr/news.html but it’s not entirely clear where
this section has come from or where it should be drafted.  Can anybody
enlighten me?


Re: 8.5 release

2020-03-12 Thread Alan Woodward
While I wait for the smoke tester to finish, I’ve been working on release
notes.  The ReleaseTodo still refers to the old wiki, and release notes are on
CWiki now, so I’m flying slightly blind.  Looking at what was done for the
previous release, I’ve created a draft note for Lucene which can be inspected
and edited here:

https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=148641343=d4d3acb9-0dd6-4d40-903c-b16f2bb68415=shareui=1584025014586

For Solr, the 8.4 release note on CWiki points to a section on
https://lucene.apache.org/solr/news.html but it’s not entirely clear where
this section has come from or where it should be drafted.  Can anybody
enlighten me?

> On 11 Mar 2020, at 09:20, Alan Woodward wrote:
>
> Sure, go ahead
>
>> On 10 Mar 2020, at 19:22, David Smiley wrote:
>> 
>> Can I assume it's no big deal to post a solr-ref-guide documentation 
>> improvement on the release branch irrespective of whenever you precisely do 
>> the RC?
>> 
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley 
>> 
>> 
>> On Tue, Mar 10, 2020 at 9:15 AM Joel Bernstein wrote:
>> I just updated solr/CHANGES.txt as I missed something. If you've already
>> created the RC then it will be there in case of a respin.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/ 
>> 
>> 
>> On Tue, Mar 10, 2020 at 5:45 AM Ignacio Vera wrote:
>> done. Thank you!
>>
>> On Tue, Mar 10, 2020 at 10:43 AM Alan Woodward wrote:
>> Go ahead, I’ll start the release build once it’s in.
>>
>>> On 10 Mar 2020, at 07:26, Ignacio Vera wrote:
>>>
>>> Hi Alan,
>>>
>>> Is it possible to backport
>>> https://issues.apache.org/jira/browse/LUCENE-9263
>>> for the 8.5 release? I pushed it yesterday and CI is happy.
>>>
>>> Thanks,
>>>
>>> On Tue, Mar 10, 2020 at 2:35 AM Joel Bernstein wrote:
>>>
>>> Finished the backport for https://issues.apache.org/jira/browse/SOLR-14073.
>>> 
>>> Thanks!
>>> 
>>> 
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/ 
>>> 
>>> 
>>> On Mon, Mar 9, 2020 at 8:44 AM Joel Bernstein wrote:
>>> Ok, I'll do the backport today. Thanks!
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>>
>>> On Mon, Mar 9, 2020 at 6:21 AM Alan Woodward wrote:
>>> Thanks Uwe!
>>> 
>>>> On 7 Mar 2020, at 10:06, Uwe Schindler wrote:
>>>>
>>>> Hi,
>>>>
>>>> FYI, I cleaned, renamed, and changed the Jenkins Jobs, so the 8.5 branch
>>>> is in the loop on ASF Jenkins and Policeman Jenkins.
>>>>
>>>> Uwe
>>>>
>>>> -
>>>> Uwe Schindler
>>>> Achterdiek 19, D-28357 Bremen
>>>> https://www.thetaphi.de
>>>> eMail: u...@thetaphi.de
>>>>
>>>> From: Alan Woodward <romseyg...@gmail.com>
>>>> Sent: Wednesday, March 4, 2020 5:35 PM
>>>> To: dev@lucene.apache.org
>>>> Subject: Re: 8.5 release
>>>>
>>>> I’ve created a branch for the 8.5 release (`branch_8_5`) and pushed it to
>>>> the apache repository.  We’re now at feature freeze, so only bug fixes
>>>> should be pushed to the branch.
>>>>
>>>> I can see from
>>>> https://issues.apache.org/jira/issues/?jql=project%20in%20(SOLR%2C%20LUCENE)%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20fixVersion%20%3D%208.5%20ORDER%20BY%20priority%20DESC
>>>> that we have 4 tickets marked as Blockers for this release.  I plan to
>>>> build a first release candidate next Monday, which gives us a few days to
>>>> resolve these.  If that’s not going to be long enough, please let me know.
>>>>
>>>> Uwe, Steve, can one of you start the Jenkins tasks for the new branch?
>>>>
>>>> Thanks, Alan
>>>>
>>>>> On 3 Mar 2020, at 14:50, Alan Woodward wrote:
>>>>>
>>>>> PSA: I’ve had to generate a new GPG key for this release, and it takes a
>>>>> while for it to get mirrored to the lucene KEYS file.  I’ll hold off
>>>>> cutting the branch until everything is ready, so it will probably now be
>>>>> tomorrow UK

Re: [jira] [Commented] (LUCENE-8929) Early Terminating CollectorManager

2020-03-12 Thread Erick Erickson
Brief comment (without reading all the comments, so FWIW) on segment merging
and sort order. The default TieredMergePolicy (TMP) will not guarantee
anything; the “cost” of a merge takes into account a number of things. I
_think_ the index will tend towards retaining the insertion order in an
insert-only index, but by no means is that guaranteed (or even checked).

That said, if it’s an insert-only index, extending TMP or creating an
“InsertOnlyMergePolicy” would be pretty easy. You’d want to borrow a few
concepts from TMP when it comes to merging “like sized” segments rather than
rewriting hugely dissimilar segments, and maybe make the segment max larger,
but it should be pretty straightforward. You’d also want to order the actual
merge once the list of segments to merge was determined, which I’m not
absolutely sure happens (I have no evidence one way or the other).

All this may be totally irrelevant, and for all I know there’s already one at
the Lucene level; I haven’t checked.

FWIW,
Erick
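
To make the "like sized", order-preserving idea above concrete, here is a
minimal sketch. It is a language-neutral model written in Python, not Lucene
code: it does not use or imitate the MergePolicy API, and the names
pick_adjacent_merge, apply_merge, merge_factor, max_merged_size and max_skew
are all hypothetical. It assumes segments are listed in insertion order and
only ever merges adjacent ones, so the merged segment can take the place of its
inputs without disturbing that order.

```python
from typing import List, Optional, Tuple


def pick_adjacent_merge(
    sizes: List[int],
    merge_factor: int = 3,         # hypothetical: segments merged at once
    max_merged_size: int = 5_000,  # hypothetical: cap on the merged segment
    max_skew: float = 4.0,         # hypothetical: largest/smallest ratio allowed
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the most evenly sized window of adjacent
    segments that may be merged, or None if no window qualifies."""
    best = None
    best_skew = None
    for start in range(len(sizes) - merge_factor + 1):
        window = sizes[start:start + merge_factor]
        if min(window) <= 0 or sum(window) > max_merged_size:
            continue  # skip empty segments and merges that would be too big
        skew = max(window) / min(window)
        if skew > max_skew:
            continue  # avoid rewriting hugely dissimilar segments
        if best_skew is None or skew < best_skew:
            best, best_skew = (start, start + merge_factor), skew
    return best


def apply_merge(sizes: List[int], window: Tuple[int, int]) -> List[int]:
    """Replace the chosen window with one merged segment, in place in the
    list order, so insertion order across segments is preserved."""
    start, end = window
    return sizes[:start] + [sum(sizes[start:end])] + sizes[end:]
```

With sizes [4000, 900, 100, 120, 110, 2000], only the three small neighbours
qualify, and merging them leaves the larger segments where they were. A real
policy would also have to handle deletes and concurrent merges, which this
sketch ignores.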

> On Mar 12, 2020, at 09:29, Michael Sokolov (Jira) wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057967#comment-17057967 ]
> 
> Michael Sokolov commented on LUCENE-8929:
> -
> 
> Thanks for the insightful comments, [~jim.ferenczi], you've given me a lot to 
> think about! I had not really considered sorting segments: that makes a lot 
> of sense when documents are at least roughly inserted in sort order. I would 
> have thought merges would interfere with that opto, but I guess for the most 
> part it works out? The performance improvements you saw are stunning. It 
> would be great if we could get the segment sorting ideas merged into the 
> Lucene code base, no? I wonder how we determine when they are applicable 
> though. In Elasticsearch is it done based on some a priori knowledge, or do
> you analyze the distribution and turn on the opto automatically? That would 
> be compelling I think. On the other hand, the use case inspiring this does 
> not tend to correlate index sort order and insertion order, so I don't think 
> it would benefit as much from segment sorting (except due to chance, or in 
> special cases), so I think these are really two separate optimizations and 
> issues. We should be sure to structure the code in such a way that can 
> accommodate them all and properly choose which one to apply. We don't have a
> formal query planner in Lucene, but I guess we are beginning to evolve one.
> 
> I think the idea of splitting collectors is a good one, to avoid overmuch 
> complexity in a single collector, but there is also a good deal of shared 
> code across these. I can give that a try and see what it looks like. 
> 
> By the way, I did also run a test using luceneutil's "modification timestamp" 
> field as the index sort and saw similar gains. I think that field is more 
> tightly correlated with insertion order, and also has much higher 
> cardinality, so it makes a good counterpoint: I'll post results here later 
> once I can do a workup.
> 
> I hear your concern about the non-determinism due to tie-breaking, but I
> *think* this is accounted for by including (global) docid in the comparison in
> MaxScoreTerminator.LeafState? I may be missing something though. It doesn't 
> seem we have a good unit test checking for this tiebreak. I'll add to 
> TestTopFieldCollector.testRandomMaxScoreTermination to make sure that case is 
> covered.
> 
> I'm not sure what to say about the `LeafFieldComparator` idea - it sounds 
> powerful, but I am also a bit leery of these complex Comparators - they make 
> other things more difficult since it becomes challenging to reason about the 
> sort order "from the outside". I had to resort to some "instanceof" hackery 
> to restrict consideration to cases where the comparator is numeric, and 
> extracting the sort value from the comparator is pretty messy too. We pay a 
> complexity cost here to handle some edge cases of more abstract comparators.  
> 
>> Early Terminating CollectorManager
>> --
>> 
>>                 Key: LUCENE-8929
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-8929
>>             Project: Lucene - Core
>>          Issue Type: Sub-task
>>            Reporter: Atri Sharma
>>            Priority: Major
>>          Time Spent: 7h 20m
>>  Remaining Estimate: 0h
>> 
>> We should have an early terminating collector manager which accurately 
>> tracks hits across all of its collectors and determines when there are 
>> enough hits, allowing all the collectors to abort.
>> The options are:
>> 1) Shared total count: a global "scoreboard" where all collectors update
>> their current hit count. At the end of each document's collection, a
>> collector checks whether the shared count N > threshold, and aborts if so
>> 2) State Reporting Collectors: Collectors report their total number of 
>> counts