Re: Needs help reviewing on Lucene PostingsFormat memory improvement

2024-02-07 Thread Michael McCandless
Hi Anh Dũng Bùi,

Thank you for tackling these and for being so patient and persistent!  Sorry
for the delay.  I will try to review them soon.  The off-heap (streaming?)
building of FSTs is really a massive improvement to Lucene, inspired by
Tantivy's FST implementation: https://blog.burntsushi.net/transducers/

Read-time for Lucene90BlockTreePostingsFormat was already off-heap?  And
your PR changes write-time to do so as well?  This will reduce RAM pressure
during indexing which is great.  And some Lucene usages generate incredibly
large FSTs (I'm looking at you HathiTrust!).  I don't think we need to
explicitly measure any performance impact before merging, but let's watch
the nightly benchmarks to see if there is any measurable impact.

And, yes, Lucene90BlockTreePostingsFormat is the default.  You find the
default codec from Codec.getDefault() and then trace downwards to all its
sources.

Maybe building the synonyms FST (SynonymMap.Builder) would be a good place
for off-heap writing too?

And this exciting PR (still a
work in progress) would likely benefit strongly from streaming FST building,
since its FSTs will be much larger than Lucene90BlockTree's: it
stores all terms (not just the sampled prefix/index) in a single FST for
the segment.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Feb 1, 2024 at 10:40 PM Anh Dũng Bùi  wrote:

> Hi Lucene devs!
>
> I have 2 PRs to optimize Lucene PostingsFormat
> (Lucene90BlockTreePostingsFormat and FSTPostingsFormat) by utilizing a new
> feature to stream the FST to IndexOutput directly, bypassing the on-heap
> writing:
> - https://github.com/apache/lucene/pull/12980
> - https://github.com/apache/lucene/pull/12985
>
> It would be great if someone could help review them. I also have some general
> questions:
> - How do I measure the memory improvement impact in Lucene?
> - Is Lucene90BlockTreePostingsFormat the main index format used in Lucene?
> If not, what is the main format?
> - Are there other places worth using the new streaming FST feature?
>
> Thank you!
> Anh Dung Bui
>


[jira] [Commented] (PYLUCENE-65) Support the default java on debian in `setup.py`.

2024-02-07 Thread Andi Vajda (Jira)


[ 
https://issues.apache.org/jira/browse/PYLUCENE-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815420#comment-17815420
 ] 

Andi Vajda commented on PYLUCENE-65:


On my rather old Debian install, and also on Ubuntu 22.04, I get these packages 
when I search for default-java:

apt search default-java
Sorting... Done
Full Text Search... Done
libplexus-container-default-java/jammy 2.1.0-1 all
  Plexus Inversion-of-control Container

libplexus-container-default1.5-java/jammy 2.1.0-1 all
  Plexus Inversion-of-control Container (transitional package)

Which Java are you referring to?
Does the fix you propose apply only to Debian and its derivatives?

> Support the default java on debian in `setup.py`.
> -
>
> Key: PYLUCENE-65
> URL: https://issues.apache.org/jira/browse/PYLUCENE-65
> Project: PyLucene
> Issue Type: Improvement
> Reporter: A. Coady
> Priority: Major
>
> On debian, the `default-java` package does not have `jre/lib/amd64` in its 
> path, so it breaks the `linux/x86_64` build. The `temurin` flags have the 
> correct paths, so one easy fix would be to change the [temurin 
> check|https://svn.apache.org/viewvc/lucene/pylucene/trunk/jcc/setup.py?revision=1900087=markup#l194]
>  to:
> {code:python}
>     if 'temurin' in JDK['linux'] or 'default' in JDK['linux']:
> {code}
> That would also support `linux/aarch64` without any further changes.
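
A minimal sketch of the proposed condition in isolation, assuming (as the snippet in the issue implies) that `JDK['linux']` holds the JDK home path as a string. The helper name and the example paths are illustrative only and are not taken from jcc's setup.py:

```python
def matches_proposed_check(jdk_home):
    """Proposed test: plain substring matching on the JDK home path."""
    return 'temurin' in jdk_home or 'default' in jdk_home


# Illustrative paths only; real install locations vary by distribution.
for path in ('/usr/lib/jvm/temurin-17-jdk-amd64',
             '/usr/lib/jvm/default-java',
             '/opt/java/jdk1.8.0'):
    print(path, matches_proposed_check(path))
```

Because this is a substring test, any JDK path containing `default` would take the temurin branch, not only Debian's `default-java`.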



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PYLUCENE-69) Linking libjvm seems to prevent PyPi upload

2024-02-07 Thread Andi Vajda (Jira)


[ 
https://issues.apache.org/jira/browse/PYLUCENE-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815416#comment-17815416
 ] 

Andi Vajda commented on PYLUCENE-69:


Try with the latest but I doubt you'll be able to build a wheel that includes 
the JVM.

> Linking libjvm seems to prevent PyPi upload
> ---
>
> Key: PYLUCENE-69
> URL: https://issues.apache.org/jira/browse/PYLUCENE-69
> Project: PyLucene
> Issue Type: Question
> Reporter: Clément Jonglez
> Priority: Major
>
> As mentioned in 
> [https://issues.apache.org/jira/projects/PYLUCENE/issues/PYLUCENE-68] , I am 
> trying to package the Orekit Python wrapper from [~petrush] (which uses JCC) 
> to PyPi.
> I used the --wheel option to compile a wheel and not an egg, and I tried to 
> upload the wheel to PyPi.
> But PyPi refuses my wheel with the answer :
> {noformat}
> Binary wheel 'orekit-11.3.3-cp312-cp312-linux_x86_64.whl' has an unsupported 
> platform tag 'linux_x86_64'. {noformat}
> After reading about why this error occurs ( 
> [https://peps.python.org/pep-0513/#rationale] )  , I tried to convert the 
> wheel to a manylinux wheel by using:
>  
> {code:java}
> auditwheel repair dist/orekit-11.3.3-cp312-cp312-linux_x86_64.whl{code}
>  
> Which returned the following error:
>  
> {noformat}
> auditwheel: error: cannot repair 
> "dist/orekit-11.3.3-cp312-cp312-linux_x86_64.whl" to "manylinux_2_5_x86_64" 
> ABI because of the presence of too-recent versioned symbols. You'll need to 
> compile the wheel on an older toolchain.{noformat}
>  
> I then ran the following command to get more information about which symbols 
> are problematic in the wheel:
>  
> {code:java}
> auditwheel-symbols --manylinux 2_34 
> dist/orekit-11.3.3-cp312-cp312-linux_x86_64.whl{code}
>  
> Which returned:
>  
> {noformat}
> orekit/_orekit.cpython-312-x86_64-linux-gnu.so is not manylinux_2_34 
> compliant because it links the following forbidden libraries:
> libjvm.so{noformat}
>  
>  
> I tried removing -ljvm from the LFLAGS list in jcc's config.py, but as 
> expected the Python program then fails on starting the JVM:
> {noformat}
> Traceback (most recent call last):
>   File "/media/ssd/git/orekit_python_artifacts/test/AbstractDetectorTest.py", 
> line 3, in <module>
>     import orekit
>   File 
> "/home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/orekit-12.0-py3.12-linux-x86_64.egg/orekit/__init__.py",
>  line 7, in <module>
>     from . import _orekit
> ImportError: 
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/orekit-12.0-py3.12-linux-x86_64.egg/orekit/_orekit.cpython-312-x86_64-linux-gnu.so:
>  undefined symbol: JNI_GetDefaultJavaVMInitArgs{noformat}
>  
> I don't have any more clues... Alternatively, I could try to package the 
> Orekit Python wrapper as a source distribution (using 
> [https://issues.apache.org/jira/projects/PYLUCENE/issues/PYLUCENE-27] and 
> [https://issues.apache.org/jira/projects/PYLUCENE/issues/PYLUCENE-68] ), but 
> this would mean that a user would need to wait approx. 10 minutes for the 
> wheel to compile when running pip install...
>  
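
The PyPI rejection quoted in the issue depends only on the wheel's filename. A rough sketch of reading the platform tag out of it, assuming the standard wheel naming scheme in which the platform tag is the last dash-separated field; the function names are made up for illustration:

```python
def platform_tags(wheel_filename):
    """Return the platform tag(s) from a wheel filename.

    Wheel names end in -{python tag}-{abi tag}-{platform tag}.whl, and the
    platform field may hold several tags joined by dots.
    """
    if not wheel_filename.endswith('.whl'):
        raise ValueError('not a wheel filename: %r' % wheel_filename)
    stem = wheel_filename[:-len('.whl')]
    return stem.rsplit('-', 1)[1].split('.')


def has_bare_linux_tag(wheel_filename):
    """True if any platform tag is a plain linux_* tag, as in the PyPI error."""
    return any(tag.startswith('linux_')
               for tag in platform_tags(wheel_filename))


print(platform_tags('orekit-11.3.3-cp312-cp312-linux_x86_64.whl'))
print(has_bare_linux_tag('orekit-11.3.3-cp312-cp312-linux_x86_64.whl'))
```

This only inspects the name. Renaming a wheel does not make it manylinux compliant, which is why auditwheel checks the linked libraries as well.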





[jira] [Resolved] (PYLUCENE-68) setup.py install and easy_install command are deprecated

2024-02-07 Thread Andi Vajda (Jira)


 [ 
https://issues.apache.org/jira/browse/PYLUCENE-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andi Vajda resolved PYLUCENE-68.

Resolution: Fixed

Fixed in HEAD.

> setup.py install and easy_install command are deprecated
> 
>
> Key: PYLUCENE-68
> URL: https://issues.apache.org/jira/browse/PYLUCENE-68
> Project: PyLucene
> Issue Type: Improvement
> Reporter: Clément Jonglez
> Priority: Major
>
> When compiling with jcc to create a wheel, I get the following warnings (see 
> below).
> This is still working but eventually the `setup.py` installation and 
> `easy_install` command will be removed.
> It would be nice to adapt JCC to modern pypa build tools, but I don't know 
> enough about JCC to do these changes.
>  
> {noformat}
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/dist.py:947:
>  SetuptoolsDeprecationWarning: setup.py install is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` directly.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html 
> for details.
>         
> 
> !!
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:66:
>  SetuptoolsDeprecationWarning: setup.py install is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` directly.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html 
> for details.
>         
> 
> !!
>   self.initialize_options()
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:66:
>  EasyInstallDeprecationWarning: easy_install command is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` and ``easy_install``.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://github.com/pypa/setuptools/issues/917 for details.
>         
> 
> !!
>   self.initialize_options()
>  
> {noformat}
>  
>  





[jira] [Comment Edited] (PYLUCENE-68) setup.py install and easy_install command are deprecated

2024-02-07 Thread Andi Vajda (Jira)


[ 
https://issues.apache.org/jira/browse/PYLUCENE-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815414#comment-17815414
 ] 

Andi Vajda edited comment on PYLUCENE-68 at 2/7/24 7:51 PM:


I looked into this issue a bit more now and setup.py is _not_ deprecated, see 
this 
[doc|https://packaging.python.org/en/latest/discussions/setup-py-deprecated/] 
for more about this topic.
What is deprecated are _some_ of the commands usually passed to setup.py, such 
as install, which JCC uses, in some cases, when it invokes setup() directly (at 
the bottom of python.py).
I fixed this by adding support for a new --generate flag that supersedes 
--build and other such flags: instead of calling setup(), JCC produces a 
setup.py file for the extension. This setup.py file can then be used with 
modern python packaging tools such as build and pip.
The pylucene build is being switched to this and the process to build PyLucene 
is then:
  - build and install jcc using the modern tools:
    - python -m build
    - python -m pip install --force
  - build pylucene:
    - set MODERN_PACKAGING to true in Makefile
    - make all, which runs:
      - jcc is invoked with --generate
      - python -m build -nw
      - python -m pip install --force
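
The two packaging commands in the steps above could be scripted roughly as follows. This is only a sketch: the helper names and the `wheel_path` argument are hypothetical, the comment does not say what is passed to pip, and the jcc --generate invocation is left out because its arguments are not given here.

```python
import subprocess
import sys


def modern_build_commands(wheel_path=None):
    """Return the two packaging commands from the comment as argv lists.

    wheel_path is a hypothetical argument: nothing is appended to the pip
    command unless the caller supplies it.
    """
    # -n is --no-isolation and -w is --wheel in pypa/build.
    build = [sys.executable, '-m', 'build', '-nw']
    # Flag spelled as in the comment.
    install = [sys.executable, '-m', 'pip', 'install', '--force']
    if wheel_path is not None:
        install.append(wheel_path)
    return [build, install]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(' '.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run_all(modern_build_commands())
```

With the default `dry_run=True` nothing is executed; the commands are only printed.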





> setup.py install and easy_install command are deprecated
> 
>
> Key: PYLUCENE-68
> URL: https://issues.apache.org/jira/browse/PYLUCENE-68
> Project: PyLucene
> Issue Type: Improvement
> Reporter: Clément Jonglez
> Priority: Major
>
> When compiling with jcc to create a wheel, I get the following warnings (see 
> below).
> This is still working but eventually the `setup.py` installation and 
> `easy_install` command will be removed.
> It would be nice to adapt JCC to modern pypa build tools, but I don't know 
> enough about JCC to do these changes.
>  
> {noformat}
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/dist.py:947:
>  SetuptoolsDeprecationWarning: setup.py install is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` directly.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html 
> for details.
>         
> 
> !!
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:66:
>  SetuptoolsDeprecationWarning: setup.py install is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` directly.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html 
> for details.
>         
> 
> !!
>   self.initialize_options()
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:66:
>  EasyInstallDeprecationWarning: easy_install command is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` and ``easy_install``.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://github.com/pypa/setuptools/issues/917 for details.
>         
> 

[jira] [Commented] (PYLUCENE-68) setup.py install and easy_install command are deprecated

2024-02-07 Thread Andi Vajda (Jira)


[ 
https://issues.apache.org/jira/browse/PYLUCENE-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815414#comment-17815414
 ] 

Andi Vajda commented on PYLUCENE-68:


I looked into this issue a bit more now and setup.py is _not_ deprecated, see 
this 
[doc|https://packaging.python.org/en/latest/discussions/setup-py-deprecated/] 
for more about this topic.
What is deprecated are _some_ of the commands usually passed to setup.py, such 
as install, which JCC uses, in some cases, when it invokes setup() directly (at 
the bottom of python.py).
I fixed this by adding support for a new --generate flag that supersedes 
--build and other such flags: instead of calling setup(), JCC produces a 
setup.py file for the extension. This setup.py file can then be used with 
modern python packaging tools such as build and pip.
The pylucene build is being switched to this and the process to build PyLucene 
is then:
  - build and install jcc using the modern tools:
    - python -m build
    - python -m pip install --force
  - build pylucene:
    - set MODERN_PACKAGING to true in Makefile
    - make all, which runs:
      - jcc is invoked with --generate
      - python -m build -nw
      - python -m pip install --force


> setup.py install and easy_install command are deprecated
> 
>
> Key: PYLUCENE-68
> URL: https://issues.apache.org/jira/browse/PYLUCENE-68
> Project: PyLucene
> Issue Type: Improvement
> Reporter: Clément Jonglez
> Priority: Major
>
> When compiling with jcc to create a wheel, I get the following warnings (see 
> below).
> This is still working but eventually the `setup.py` installation and 
> `easy_install` command will be removed.
> It would be nice to adapt JCC to modern pypa build tools, but I don't know 
> enough about JCC to do these changes.
>  
> {noformat}
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/dist.py:947:
>  SetuptoolsDeprecationWarning: setup.py install is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` directly.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html 
> for details.
>         
> 
> !!
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:66:
>  SetuptoolsDeprecationWarning: setup.py install is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` directly.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html 
> for details.
>         
> 
> !!
>   self.initialize_options()
> /home/yzokras/Documents/orekit-pip/orekit312/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:66:
>  EasyInstallDeprecationWarning: easy_install command is deprecated.
> !!
>         
> 
>         Please avoid running ``setup.py`` and ``easy_install``.
>         Instead, use pypa/build, pypa/installer or other
>         standards-based tools.
>         See https://github.com/pypa/setuptools/issues/917 for details.
>         
> 
> !!
>   self.initialize_options()
>  
> {noformat}
>  
>  





Re: [VOTE] Release Lucene/Solr 8.11.3 RC1

2024-02-07 Thread Kevin Risden
+1 (binding)

SUCCESS! [1:05:24.985760]

My issue was ANT_ARGS being set to enable colored output; `unset ANT_ARGS`
fixed it. ANT_ARGS was being set by the oh-my-zsh ant plugin:
https://github.com/ohmyzsh/ohmyzsh/blob/master/plugins/ant/ant.plugin.zsh
The colored output wouldn't match the regex used in the backwards-compat
testing.
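
To illustrate the failure mode, here is a small sketch of how ANSI color codes can defeat a regex that expects plain output. The pattern and the sample line are hypothetical; the smoke tester's actual regex is not shown in this thread.

```python
import re

# SGR color sequences such as ESC[32m (green) and ESC[0m (reset).
ANSI_COLOR = re.compile(r'\x1b\[[0-9;]*m')

# Hypothetical pattern; the smoke tester's actual regex is not shown here.
EXPECTED = re.compile(r'^BUILD SUCCESSFUL$', re.MULTILINE)


def strip_colors(text):
    """Remove ANSI color escape sequences from text."""
    return ANSI_COLOR.sub('', text)


plain = 'BUILD SUCCESSFUL'
colored = '\x1b[32mBUILD SUCCESSFUL\x1b[0m'

print(bool(EXPECTED.search(plain)))
print(bool(EXPECTED.search(colored)))
print(bool(EXPECTED.search(strip_colors(colored))))
```

The colored line only matches once the escape sequences are stripped, which is why unsetting ANT_ARGS was enough.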

Kevin Risden


On Wed, Feb 7, 2024 at 8:24 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> +1 (binding)
>
> SUCCESS! [1:18:24.494917]
>
> On Wed, 7 Feb 2024 at 18:24, Jan Høydahl  wrote:
>
>> +1 (binding)
>>
>> SUCCESS! [1:18:11.930433]
>>
>> Only ran smoke tester. macOS, Temurin 1.8.0_402
>>
>> Jan
>>
>> On 5 Feb 2024, at 23:23, Houston Putman wrote:
>>
>> Please vote for release candidate 1 for Lucene/Solr 8.11.3
>>
>> The artifacts can be downloaded from:
>>
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
>>
>> You can run the smoke tester directly with this command:
>>
>> python3 -u dev-tools/scripts/smokeTestRelease.py \
>> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
>>
>> The vote will be open for at least 72 hours i.e. until 2024-02-08 23:00
>> UTC.
>>
>> [ ] +1  approve
>> [ ] +0  no opinion
>> [ ] -1  disapprove (and reason why)
>>
>> Here is my +1
>>
>>
>>


Lucene 9.10

2024-02-07 Thread Adrien Grand
Hello all,

It's been 2 months since we released 9.9 and we accumulated a good number
of changes, so I'd like to propose that we release 9.10.0.

If there are no objections, I volunteer to be the release manager and
suggest cutting the branch next Monday (February 12th) and starting the
release process on Wednesday, one week from now (February 14th).

+Uwe Schindler, I remember that there are JDK22-related
changes that you'd like to get into 9.10; feel free to let me know if this
timeline doesn't work for you.

-- 
Adrien


Re: [VOTE] Release Lucene/Solr 8.11.3 RC1

2024-02-07 Thread Ishan Chattopadhyaya
+1 (binding)

SUCCESS! [1:18:24.494917]

On Wed, 7 Feb 2024 at 18:24, Jan Høydahl  wrote:

> +1 (binding)
>
> SUCCESS! [1:18:11.930433]
>
> Only ran smoke tester. macOS, Temurin 1.8.0_402
>
> Jan
>
> On 5 Feb 2024, at 23:23, Houston Putman wrote:
>
> Please vote for release candidate 1 for Lucene/Solr 8.11.3
>
> The artifacts can be downloaded from:
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
>
> You can run the smoke tester directly with this command:
>
> python3 -u dev-tools/scripts/smokeTestRelease.py \
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
>
> The vote will be open for at least 72 hours i.e. until 2024-02-08 23:00
> UTC.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Here is my +1
>
>
>


Re: [VOTE] Release Lucene/Solr 8.11.3 RC1

2024-02-07 Thread Jan Høydahl
+1 (binding)

SUCCESS! [1:18:11.930433]

Only ran smoke tester. macOS, Temurin 1.8.0_402

Jan

> On 5 Feb 2024, at 23:23, Houston Putman wrote:
> 
> Please vote for release candidate 1 for Lucene/Solr 8.11.3
> 
> The artifacts can be downloaded from:
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
> 
> You can run the smoke tester directly with this command:
> 
> python3 -u dev-tools/scripts/smokeTestRelease.py \
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.11.3-RC1-revbaa7c80af4278cc8951a344d8e9320386588d12d
> 
> The vote will be open for at least 72 hours i.e. until 2024-02-08 23:00 UTC.
> 
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
> 
> Here is my +1