[ANNOUNCE] Apache Lucene 8.11.3 released

2024-02-08 Thread Houston Putman
The Lucene PMC is pleased to announce the release of Apache Lucene 8.11.3.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements,
some of which are highlighted below. The release is available for immediate
download at:

  

### Lucene 8.11.3 Release Highlights:

 * Several bugs in polygon tessellation have been fixed.
 * GC load during indexing has been reduced by estimating the FST BytesStore
block size.
 * BKD trees can no longer overflow when more than 4 billion
points are added.
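The BKD highlight concerns point counts past 4 billion. The announcement does not describe the actual patch, so the following is only a general illustration of this class of bug, written in plain Java with no Lucene classes (`PointCountOverflow` and its methods are hypothetical names). A 32-bit `int` counter silently wraps past `Integer.MAX_VALUE` (about 2.1 billion), and narrowing a 64-bit count of 4.3 billion to `int` keeps only its low 32 bits; a `long` counter holds such counts, and `Math.addExact` fails loudly instead of wrapping.

```java
// Hypothetical illustration of 32-bit count overflow; not Lucene's BKD code.
final class PointCountOverflow {

  /** Adds to a 32-bit point count; silently wraps around on overflow. */
  static int addPointsInt(int current, int added) {
    return current + added;
  }

  /** Adds to a 64-bit point count; throws ArithmeticException only past Long.MAX_VALUE. */
  static long addPointsLong(long current, long added) {
    return Math.addExact(current, added);
  }

  /** Narrowing a 64-bit count to int keeps only the low 32 bits. */
  static int truncateToInt(long count) {
    return (int) count;
  }

  public static void main(String[] args) {
    System.out.println(addPointsInt(Integer.MAX_VALUE, 1));              // -2147483648
    long total = addPointsLong(4_000_000_000L, 300_000_000L);
    System.out.println(total);                                         // 4300000000
    System.out.println(truncateToInt(total));                          // 5032704
    try {
      Math.addExact(Integer.MAX_VALUE, 1); // int overload detects the overflow
    } catch (ArithmeticException e) {
      System.out.println("overflow detected: " + e.getMessage());
    }
  }
}
```

Using `long` for counts (or `Math.addExact` where `int` must be kept) turns a silent wrap-around into either a correct value or an immediate failure.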

Please read CHANGES.txt for a full list of changes:

  


[RESULT] [VOTE] Release Lucene/Solr 8.11.3 RC1

2024-02-08 Thread Houston Putman
It's been >72h since the vote was initiated and the result is:

+1  6  (6 binding)
 0  0
-1  0

This vote has PASSED
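For readers unfamiliar with the tally above: ASF release votes use majority approval, which means at least three binding +1 votes and more +1 than -1 votes, with the vote open for at least 72 hours. The sketch below encodes that rule. `ReleaseVote` and its methods are hypothetical names. It also assumes that the "23:23" timestamp on the call for votes, as quoted by a Norwegian mail client further down this thread, is Europe/Oslo local time.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Sketch of the ASF majority-approval rule for release votes; names are hypothetical.
final class ReleaseVote {

  static final Duration MIN_VOTING_PERIOD = Duration.ofHours(72);

  /** Majority approval: at least three binding +1 votes and more +1 than -1 votes. */
  static boolean passes(int bindingPlusOne, int bindingMinusOne) {
    return bindingPlusOne >= 3 && bindingPlusOne > bindingMinusOne;
  }

  /** Earliest instant at which a vote opened at {@code opened} may be closed. */
  static Instant earliestClose(Instant opened) {
    return opened.plus(MIN_VOTING_PERIOD);
  }

  public static void main(String[] args) {
    System.out.println(passes(6, 0)); // true: 6 binding +1, no -1
    // Assumption: the call for votes was stamped 23:23 Europe/Oslo time (UTC+1 in February).
    Instant opened = LocalDateTime.of(2024, 2, 5, 23, 23)
        .atZone(ZoneId.of("Europe/Oslo"))
        .toInstant();
    System.out.println(earliestClose(opened)); // 2024-02-08T22:23:00Z
  }
}
```

Under that assumption, the 72-hour minimum ends at 22:23 UTC on 8 February, just before the 23:00 UTC deadline stated in the call for votes.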


Re: [VOTE] Release Lucene/Solr 8.11.3 RC1

2024-02-08 Thread Houston Putman
It's been >72h since the vote was initiated and the result is:

+1  6  (6 binding)
 0  0
-1  0

This vote has PASSED

On Thu, Feb 8, 2024 at 1:52 PM Anshum Gupta  wrote:

> +1 (binding)
>
> SUCCESS! [1:20:38.669502]
>
> On Wed, Feb 7, 2024 at 10:27 AM Kevin Risden  wrote:
>
>> +1 (binding)
>>
>> SUCCESS! [1:05:24.985760]
>>
>> My issue was ANT_ARGS being set to use color output, fixed with `unset ANT_ARGS`.
>> ANT_ARGS was being set by oh-my-zsh
>> https://github.com/ohmyzsh/ohmyzsh/blob/master/plugins/ant/ant.plugin.zsh.
>> The colored output wouldn't match the regex for the backwards compat
>> testing.
>>
>> Kevin Risden
>>
>>
>> On Wed, Feb 7, 2024 at 8:24 AM Ishan Chattopadhyaya <
>> ichattopadhy...@gmail.com> wrote:
>>
>>> +1 (binding)
>>>
>>> SUCCESS! [1:18:24.494917]
>>>
>>> On Wed, 7 Feb 2024 at 18:24, Jan Høydahl  wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> SUCCESS! [1:18:11.930433]
>>>>
>>>> Only ran smoke tester. macOS, Temurin 1.8.0_402
>>>>
>>>> Jan
>>>>
>>>> On 5 Feb 2024, at 23:23, Houston Putman wrote:
>>>>
>>>> Please vote for release candidate 1 for Lucene/Solr 8.11.3
>>>>
>>>> The artifacts can be downloaded from:
>>>>
>>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
>>>>
>>>> You can run the smoke tester directly with this command:
>>>>
>>>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>>>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
>>>>
>>>> The vote will be open for at least 72 hours, i.e. until 2024-02-08 23:00
>>>> UTC.
>>>>
>>>> [ ] +1  approve
>>>> [ ] +0  no opinion
>>>> [ ] -1  disapprove (and reason why)
>>>>
>>>> Here is my +1
>>>>
>
> --
> Anshum Gupta
>
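Kevin's note in the thread above explains that oh-my-zsh set ANT_ARGS so that Ant produced colored output, and the colored lines no longer matched the regex used for the backwards-compatibility testing. The smoke tester's real regex is not shown in this thread, so the Java sketch below uses a made-up anchored pattern (`^BUILD SUCCESSFUL`). It shows how ANSI SGR color sequences (ESC, `[`, digits, `m`) wrapped around a line defeat an anchored match, and how stripping them restores it. `AnsiRegexDemo` and its methods are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration: the pattern below is NOT the smoke tester's real regex.
final class AnsiRegexDemo {

  /** Matches ANSI SGR color sequences such as ESC[32m or ESC[0m. */
  private static final Pattern ANSI_SGR = Pattern.compile("\u001B\\[[0-9;]*m");

  /** Made-up anchored check standing in for a regex applied to plain build output. */
  private static final Pattern BUILD_OK = Pattern.compile("^BUILD SUCCESSFUL");

  static boolean matches(String line) {
    return BUILD_OK.matcher(line).find();
  }

  static String stripAnsi(String line) {
    return ANSI_SGR.matcher(line).replaceAll("");
  }

  public static void main(String[] args) {
    String plain = "BUILD SUCCESSFUL";
    String colored = "\u001B[32m" + plain + "\u001B[0m"; // what a color logger might emit
    System.out.println(matches(plain));              // true
    System.out.println(matches(colored));            // false: the escape byte precedes 'B'
    System.out.println(matches(stripAnsi(colored))); // true
  }
}
```

Unsetting ANT_ARGS, as Kevin did, avoids producing the escape sequences in the first place. Stripping them before matching is the defensive alternative for tooling that parses build output.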


Re: [VOTE] Release Lucene/Solr 8.11.3 RC1

2024-02-08 Thread Anshum Gupta
+1 (binding)

SUCCESS! [1:20:38.669502]

-- 
Anshum Gupta


Re: Needs help reviewing on Lucene PostingsFormat memory improvement

2024-02-08 Thread Anh Dũng Bùi
Thanks Mike for the reply!

> Read-time for Lucene90BlockTreePostingsFormat was already off-heap?  And
> your PR changes write-time to do so as well?

Yeah, that's the idea. I changed only the terms writer to be off-heap.
Thanks, let's monitor it after the merge.

> Maybe building the synonyms FST (SynonymMap.Builder) would be a good
> place for off-heap writing too?

This is a good idea. I see there's an ongoing PR that already tackles
this: https://github.com/apache/lucene/pull/13054. I'm excited to see
the feature rolling out to different parts of Lucene.

> And this exciting PR  (still
> a work in progress) would likely strongly benefit from streaming FST
> building, since its FSTs will be much larger than the Lucene90BlockTree's,
> because it stores all terms (not just the sampled prefix/index) in a single
> FST for the segment.

I can try to fork this PR and convert it to off-heap writing as well.

Regards,
Anh Dung Bui

On Thu, Feb 8, 2024 at 7:43 AM Michael McCandless 
wrote:

> Hi Anh Dũng Bùi,
>
> Thank you for tackling these and being so gently patient/persistent!
> Sorry for the delay.  I will try to review them soon.  The off-heap
> (streaming?) building of FSTs is really a massive improvement to Lucene,
> inspired by Tantivy's FST implementation:
> https://blog.burntsushi.net/transducers/
>
> Read-time for Lucene90BlockTreePostingsFormat was already off-heap?  And
> your PR changes write-time to do so as well?  This will reduce RAM pressure
> during indexing which is great.  And some Lucene usages generate incredibly
> large FSTs (I'm looking at you HathiTrust!). I don't think we need to
> explicitly measure any performance impact before merging, but let's watch
> the nightly benchy to see whether there is any measurable impact.
>
> And, yes, Lucene90BlockTreePostingsFormat is the default. You can find the
> default codec via Codec.getDefault() and then trace downwards through all
> its sources.
>
> Maybe building the synonyms FST (SynonymMap.Builder) would be a good place
> for off-heap writing too?
>
> And this exciting PR  (still
> a work in progress) would likely strongly benefit from streaming FST
> building, since its FSTs will be much larger than the Lucene90BlockTree's,
> because it stores all terms (not just the sampled prefix/index) in a single
> FST for the segment.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Feb 1, 2024 at 10:40 PM Anh Dũng Bùi  wrote:
>
>> Hi Lucene devs!
>>
>> I have 2 PRs to optimize Lucene PostingsFormat
>> (Lucene90BlockTreePostingsFormat and FSTPostingsFormat) by utilizing a new
>> feature to stream the FST to IndexOutput directly, bypassing the on-heap
>> writing:
>> - https://github.com/apache/lucene/pull/12980
>> - https://github.com/apache/lucene/pull/12985
>>
>> It would be great if someone could help review them. I also have some
>> general questions:
>> - How do I measure the memory improvement impact in Lucene?
>> - Is Lucene90BlockTreePostingsFormat the main index format used in
>> Lucene? If not, what is the main format?
>> - Are there other places where the new streaming FST feature is worth using?
>>
>> Thank you!
>> Anh Dung Bui
>>
>
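The PRs in this thread stream the FST directly to the IndexOutput instead of first building it on-heap. Lucene's actual FST-building API is not shown here, so the following is only a conceptual sketch in plain Java. `StreamingWriteDemo`, `CountingOutput`, and both write methods are hypothetical, and `CountingOutput` stands in for an on-disk IndexOutput. It compares buffering every encoded node in a `ByteArrayOutputStream` before copying it out with writing each node as soon as it is produced, and it reports the largest number of encoded bytes the writer holds at once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;

// Conceptual sketch only: hypothetical classes, not Lucene's FST-building API.
final class StreamingWriteDemo {

  /** Destination that only counts bytes, standing in for an on-disk IndexOutput. */
  static final class CountingOutput extends OutputStream {
    long written;

    @Override
    public void write(int b) {
      written++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      written += len;
    }
  }

  /** Buffers all encoded nodes on-heap, then copies them out. Returns peak bytes held. */
  static long writeBuffered(List<byte[]> nodes, OutputStream out) {
    ByteArrayOutputStream heap = new ByteArrayOutputStream();
    for (byte[] node : nodes) {
      heap.write(node, 0, node.length); // buffer grows with the total output size
    }
    long peak = heap.size();
    try {
      heap.writeTo(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return peak;
  }

  /** Writes each encoded node as soon as it is produced. Returns peak bytes held. */
  static long writeStreaming(List<byte[]> nodes, OutputStream out) {
    long peak = 0;
    try {
      for (byte[] node : nodes) {
        peak = Math.max(peak, node.length); // only the node being written is held
        out.write(node);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return peak;
  }

  public static void main(String[] args) {
    List<byte[]> nodes = List.of(new byte[100], new byte[250], new byte[80]);
    CountingOutput buffered = new CountingOutput();
    CountingOutput streamed = new CountingOutput();
    System.out.println(writeBuffered(nodes, buffered));  // 430
    System.out.println(writeStreaming(nodes, streamed)); // 250
    System.out.println(buffered.written + " / " + streamed.written); // 430 / 430
  }
}
```

Both strategies write the same bytes. The difference is that the buffered writer's peak grows with the whole output, while the streaming writer's peak is bounded by the largest single node. This is the RAM-pressure reduction discussed above, in miniature.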


Re: Lucene 9.10

2024-02-08 Thread Uwe Schindler

Hi Adrien,

As discussed in the PR, I will merge the MMapDir and Panama Vector changes
for JDK 22 later today or tomorrow at the latest. I first need to download
the JDK 22 release candidate, which is going to be released today, and do
the usual API consistency checks (verifying that no late API changes
appeared).

So next Wednesday is perfectly fine.

Uwe

On 07.02.2024 at 15:57, Adrien Grand wrote:

> Hello all,
>
> It's been 2 months since we released 9.9 and we accumulated a good
> number of changes, so I'd like to propose that we release 9.10.0.
>
> If there are no objections, I volunteer to be the release manager and
> suggest cutting the branch next Monday (February 12th) and starting
> the release process on Wednesday, one week from now (February 14th).
>
> +Uwe Schindler  I remember that there are
> JDK22-related changes that you'd like to get into 9.10; feel free to
> let me know if this timeline doesn't work for you.
>
> --
> Adrien


--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:u...@thetaphi.de
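Uwe's plan ties JDK-22-specific MMapDir and Panama Vector code to checking the final JDK for late API changes. This thread does not describe how Lucene chooses between implementations at runtime. The following is only a generic sketch of gating code on the running JDK's feature release with `Runtime.version().feature()`. It falls back both below the minimum supported release and above the newest tested one, because untested releases may have changed APIs. `JdkGate`, `chooseImpl`, and the returned names are hypothetical.

```java
// Hypothetical sketch of gating an implementation on the running JDK feature release.
// Names are made up; this is not Lucene's actual selection logic.
final class JdkGate {

  /** Chooses an implementation name for the given JDK feature release. */
  static String chooseImpl(int featureRelease, int minSupported, int maxTested) {
    if (featureRelease < minSupported) {
      return "fallback"; // required APIs are missing
    }
    if (featureRelease > maxTested) {
      return "fallback"; // untested release: APIs may have changed late
    }
    return "jdk" + featureRelease + "-optimized";
  }

  public static void main(String[] args) {
    int running = Runtime.version().feature();
    System.out.println("JDK " + running + " -> " + chooseImpl(running, 21, 22));
  }
}
```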


Re: Lucene 9.10

2024-02-08 Thread Michael McCandless
+1 to release 9.10. Thank you for volunteering, Adrien!

Mike McCandless

http://blog.mikemccandless.com

