Re: The future of the PyLucene project
On Feb 28, 2024, at 2:29 PM, Andi Vajda wrote:

> Of course anyone can vote !
> Anyone interested in this project can and should vote !
> If no one does, how do we know anyone cares ?

+0.5. I’m still maintaining a docker image (coady/pylucene:rc), a homebrew formula, and a dependent project (lupyne). But the state of that project is much the same - I don’t know how much interest there still is in it. I feel like Lucene should have python bindings in principle, but I don’t personally have a use case anymore.

Thanks for your work on this, whatever you decide.
Re: [VOTE] Release PyLucene 8.1.1 (rc2)
+1. rc builds available:
- docker pull coady/pylucene:rc
- brew install --devel coady/tap/pylucene

> On Jun 22, 2019, at 5:17 PM, Andi Vajda wrote:
>
> The PyLucene 8.1.1 (rc2) release tracking the recent release of
> Apache Lucene 8.1.1 is ready.
>
> A release candidate is available from:
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/8.1.1-rc2/
>
> PyLucene 8.1.1 is built with JCC 3.6, included in these release artifacts.
>
> JCC 3.6 supports Python 3.3+ (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
>
> Please vote to release these artifacts as PyLucene 8.1.1.
> Anyone interested in this release can and should vote !
>
> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>
> pps: here is my +1
Re: [VOTE] Release PyLucene 7.4.0 (rc1)
On Aug 28, 2018, at 11:05 AM, Andi Vajda wrote:

> The PyLucene 7.4.0 (rc1) release tracking the recent release of
> Apache Lucene 7.4.0 is ready.
>
> A release candidate is available from:
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/7.4.0-rc1/
>
> PyLucene 7.4.0 is built with JCC 3.2 included in these release artifacts.
>
> JCC 3.2 supports Python 3.3+ (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
>
> Please vote to release these artifacts as PyLucene 7.4.0.
> Anyone interested in this release can and should vote !

+1. Release candidate builds also available for docker and homebrew:

$ docker pull coady/pylucene:rc
$ brew install coady/tap/pylucene

> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>
> pps: here is my +1
Re: [VOTE] Release PyLucene 7.2.0 (rc1)
On Dec 21, 2017, at 3:50 AM, Andi Vajda wrote:

> The PyLucene 7.2.0 (rc1) release tracking the upcoming release of
> Apache Lucene 7.2.0 is ready.
>
> A release candidate is available from:
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/7.2.0-rc1/
>
> PyLucene 7.2.0 is built with JCC 3.1 included in these release artifacts.
>
> JCC 3.1 supports Python 3.3+ (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
>
> Please vote to release these artifacts as PyLucene 7.2.0.
> Anyone interested in this release can and should vote !

+1. Release candidate builds also available for docker and homebrew:

$ docker pull coady/pylucene:rc
$ brew install coady/core/pylucene

> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>
> pps: here is my +1
Re: [VOTE] Release PyLucene 6.2.0 (rc1)
+1. I’ve created a docker image and homebrew formula for rc2; will update them on release.

$ docker pull coady/pylucene:6
$ brew install coady/core/pylucene

> On Sep 8, 2016, at 7:07 AM, Andi Vajda wrote:
>
> After an almost two year hiatus, a new PyLucene version is ready for release.
> The PyLucene 6.2.0 (rc1) release tracking the recent release of Apache Lucene
> 6.2.0 is ready.
>
> A release candidate is available from:
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/6.2.0-rc1/
>
> PyLucene 6.2.0 is built with JCC 2.22 included in these release artifacts.
>
> Please vote to release these artifacts as PyLucene 6.2.0.
> Anyone interested in this release can and should vote !
>
> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>
> pps: here is my +1
Re: [VOTE] Release PyLucene 4.10.1-0
On Oct 1, 2014, at 11:49 AM, Andi Vajda va...@apache.org wrote:

> On Tue, 30 Sep 2014, Andi Vajda wrote:
>
> On Tue, 30 Sep 2014, Aric Coady wrote:
>
> I’ve found a regression involving Python* classes. If the overridden methods raise an error, it’s causing a crash instead of propagating the error. Here’s a simple example:
>
>     from org.apache.pylucene.search import PythonFilter
>
>     class Filter(PythonFilter):
>         """Broken filter to test errors are raised."""
>         def getDocIdSet(self, *args):
>             assert False
>
> I added the same 'assert False' line at line 69 in test/test_FilteredQuery.py and this test fails (as expected) but I get no crash.
>
> In other words (I should have been clearer), can you please help me reproduce this by sending in a self-contained piece of code that causes the crash.

The crash reproduced in the (modified) test for me. And it is only reproducing with jcc 2.21 / lucene 4.10. So that means it is related to the change, but also almost certainly another symptom of mismatched compilers (re the other recent thread about linking errors). I gave up on shared builds a while ago because of Xcode tools updating the compiler. It reproduces with a NO_SHARED build, with both the system python and homebrew's python, which as you can see below were built with older compiler versions.

$ /usr/bin/python (system)
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin

$ /usr/local/bin/python (homebrew)
Python 2.7.8 (default, Aug 24 2014, 21:26:19)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin

$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

> Or could it be that you're running a mixture of JCCs ? The area of your crash, Python error reporting, did change between JCC 2.20 and JCC 2.21.
>
> Andi..
>
> Andi..
Run any search using an instance of that filter and it should reproduce. On Sep 29, 2014, at 7:05 PM, Andi Vajda va...@apache.org wrote: The PyLucene 4.10.1-0 release tracking today's release of Apache Lucene 4.10.1 is ready. *** ATTENTION *** Starting with release 4.8.0, Lucene now requires Java 1.7 at the minimum. Using Java 1.6 with Lucene 4.8.0 and newer is not supported. On Mac OS X, Java 6 is still a common default, please upgrade if you haven't done so already. A common upgrade is Oracle Java 1.7 for Mac OS X: http://docs.oracle.com/javase/7/docs/webnotes/install/mac/mac-jdk.html On Mac OS X, once installed, a way to make Java 1.7 the default in your bash shell is: $ export JAVA_HOME=`/usr/libexec/java_home` Be sure to verify that this JAVA_HOME value is correct. On any system, if you're upgrading your Java installation, please rebuild JCC as well. You must use the same version of Java for both JCC and PyLucene. *** /ATTENTION *** A release candidate is available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_10/CHANGES PyLucene 4.10.1 is built with JCC 2.21 included in these release artifacts. A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_1/lucene/CHANGES.txt Please vote to release these artifacts as PyLucene 4.10.1-0. Anyone interested in this release can and should vote ! Thanks ! Andi.. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
Re: [VOTE] Release PyLucene 4.10.1-1
On Oct 1, 2014, at 4:13 PM, Andi Vajda va...@apache.org wrote: The PyLucene 4.10.1-1 release tracking the recent release of Apache Lucene 4.10.1 is ready. This release candidate fixes the regression found in the previous one, 4.10.1-0, and is available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_10/CHANGES PyLucene 4.10.1 is built with JCC 2.21 included in these release artifacts. A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_1/lucene/CHANGES.txt Please vote to release these artifacts as PyLucene 4.10.1-1. Anyone interested in this release can and should vote ! +1. No issues found. And I’ll update the homebrew formula when it’s released. Thanks ! Andi.. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
Re: [VOTE] Release PyLucene 4.10.1-0
I’ve found a regression involving Python* classes. If the overridden methods raise an error, it’s causing a crash instead of propagating the error. Here’s a simple example:

    from org.apache.pylucene.search import PythonFilter

    class Filter(PythonFilter):
        """Broken filter to test errors are raised."""
        def getDocIdSet(self, *args):
            assert False

Run any search using an instance of that filter and it should reproduce.

On Sep 29, 2014, at 7:05 PM, Andi Vajda va...@apache.org wrote:

> The PyLucene 4.10.1-0 release tracking today's release of Apache Lucene 4.10.1 is ready.
>
> *** ATTENTION ***
>
> Starting with release 4.8.0, Lucene now requires Java 1.7 at the minimum. Using Java 1.6 with Lucene 4.8.0 and newer is not supported.
>
> On Mac OS X, Java 6 is still a common default, please upgrade if you haven't done so already. A common upgrade is Oracle Java 1.7 for Mac OS X:
> http://docs.oracle.com/javase/7/docs/webnotes/install/mac/mac-jdk.html
>
> On Mac OS X, once installed, a way to make Java 1.7 the default in your bash shell is:
> $ export JAVA_HOME=`/usr/libexec/java_home`
> Be sure to verify that this JAVA_HOME value is correct.
>
> On any system, if you're upgrading your Java installation, please rebuild JCC as well. You must use the same version of Java for both JCC and PyLucene.
>
> *** /ATTENTION ***
>
> A release candidate is available from:
> http://people.apache.org/~vajda/staging_area/
>
> A list of changes in this release can be seen at:
> http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_10/CHANGES
>
> PyLucene 4.10.1 is built with JCC 2.21 included in these release artifacts.
>
> A list of Lucene Java changes can be seen at:
> http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_1/lucene/CHANGES.txt
>
> Please vote to release these artifacts as PyLucene 4.10.1-0.
> Anyone interested in this release can and should vote !
>
> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
> http://people.apache.org/~vajda/staging_area/KEYS
>
> pps: here is my +1
Re: Can't build on Mavericks (different issue)
On Sep 29, 2014, at 4:00 PM, Andi Vajda va...@apache.org wrote:

> On Sat, 27 Sep 2014, Andi Vajda wrote:
>
> On Sat, 27 Sep 2014, Mattmann, Chris A (3980) wrote:
>
> Guys was there ever a fix to this? I'm having the exact same issue :(
>
> Some notes:
> Mac OS 10.9.4
> Trying to build JCC 2.19
> Patching with instructions here https://github.com/chrismattmann/etllib/ to deal with JAVAFRAMEWORKS hard coding, etc.
> Built collective.python, my own Python buildout, and am using VirtualEnv
>
> Python 2.7.8 (default, Sep 27 2014, 11:46:04)
> [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>
> Still getting the darned linker error (I am using home brew and installing GCC right now in hopes that fixes it). This really sucks..
>
> Ok, help me help you here. What is it you're trying to do ? Yes, I see you say you're trying to build JCC 2.19 on Mac OS 10.9.4. This requires no patching. So there's got to be something else going here ? In particular, what's this patching you're referring to ? I did follow the link, I couldn't find relevant information for someone out of context like myself.
>
> While I don't know what the linker error you're reporting below exactly means,
>
>     ld: internal error: atom not found in symbolIndex(__ZN7JNIEnv_13CallIntMethodEP8_jobjectP10_jmethodIDz) for architecture x86_64
>     clang: error: linker command failed with exit code 1 (use -v to see invocation)
>
> it suspiciously looks like a mismatch between your C++ compiler and your C++ linker somehow as it can't find this particular decorated signature of

Yes, and the common cause is using current mavericks Xcode tools and the system python, which was built with an older compiler. You can build a current python, as Andi is suggesting. Or conversely build jcc with the same compiler version (4.2?). A third option is to disable shared and skip the whole issue. That’s what the homebrew formula does by default (brew install pylucene).

CallIntMethod(), a C++ method provided by your JDK's JNI library. For example, you could be compiling against one version of the JDK and linking against another. Or you could be compiling with Clang and linking with GCC or using a python compiled with GCC and using Clang or any of the many other ways to mismatch things. I see your Python invokes clang as the C++ compiler, mine invokes gcc. Maybe there is a bug with compiling JCC with clang ? I don't think I've ever tried this myself... Maybe I should try Clang with a more recent Python... I've now tried this (on Mac OS X 10.9.5): - downloaded python 2.7.8 sources - configured it with CC=clang and MACOSX_DEPLOYMENT_TARGET=10.9 - built, installed - installed virtualenv 1.11.6 (which also brings in a recent version of setuptools) - created a virtualenv with this 2.7.8 build - built jcc 2.20 in it without any errors In other words, I can't reproduce the error you reported. Andi.. For reference, here is my env and the log of my compilation: Mac OS X: 10.9.2 Python: Python 2.7 (r27:82500, Jul 7 2010, 03:32:44) [GCC 4.2.1 (Apple Inc. build 5659)] on darwin Type help, copyright, credits or license for more information. Java: jdk1.8.0_05 GCC: gcc --version Configured with: --prefix=/Volumes/Yuzu/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn) Target: x86_64-apple-darwin13.1.0 Thread model: posix JCC 2.20 Build Log (looks very similar to JCC 2.19): yuzu:vajda ../_install/bin/python setup.py build found JAVAHOME = /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework Loading source files for package org.apache.jcc... Constructing Javadoc information... Standard Doclet version 1.6.0_65 Building tree for all the packages and classes... Generating javadoc/org/apache/jcc//PythonException.html... Generating javadoc/org/apache/jcc//PythonVM.html... 
Generating javadoc/org/apache/jcc//package-frame.html... Generating javadoc/org/apache/jcc//package-summary.html... Generating javadoc/org/apache/jcc//package-tree.html... Generating javadoc/constant-values.html... Generating javadoc/serialized-form.html... Building index for all the packages and classes... Generating javadoc/overview-tree.html... Generating javadoc/index-all.html... Generating javadoc/deprecated-list.html... Building index for all classes... Generating javadoc/allclasses-frame.html... Generating javadoc/allclasses-noframe.html... Generating javadoc/index.html... Generating javadoc/help-doc.html... Generating javadoc/stylesheet.css... running build running build_py writing /Users/vajda/apache/pylucene/jcc/jcc/config.py creating build creating build/lib.macosx-10.6-x86_64-2.7 creating
Re: [VOTE] Release PyLucene 4.9.0-0
On Jul 7, 2014, at 8:14 AM, Andi Vajda va...@apache.org wrote: The PyLucene 4.9.0-0 release tracking the recent release of Apache Lucene 4.9.0 is ready. *** ATTENTION *** Starting with release 4.8.0, Lucene now requires Java 1.7 at the minimum. Using Java 1.6 with Lucene 4.8.0 and newer is not supported. On Mac OS X, Java 6 is still a common default, please upgrade if you haven't done so already. A common upgrade is Oracle Java 1.7 for Mac OS X: http://docs.oracle.com/javase/7/docs/webnotes/install/mac/mac-jdk.html On Mac OS X, once installed, a way to make Java 1.7 the default in your bash shell is: $ export JAVA_HOME=`/usr/libexec/java_home` Be sure to verify that this JAVA_HOME value is correct. On any system, if you're upgrading your Java installation, please rebuild JCC as well. You must use the same version of Java for both JCC and PyLucene. *** /ATTENTION *** A release candidate is available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_9/CHANGES PyLucene 4.9.0 is built with JCC 2.20 included in these release artifacts. A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_9_0/lucene/CHANGES.txt Please vote to release these artifacts as PyLucene 4.9.0-0. Anyone interested in this release can and should vote ! +1. No issue found. And I’ll update the homebrew formula when it’s released. Thanks ! Andi.. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
Re: Getting term vectors/computing cosine similarity
On May 28, 2014, at 12:03 AM, Michael O'Leary mich...@moz.com wrote:

> Hi Andi, Thanks for the help. I just tried to import TVTermsEnum so I could try casting my iter, and I don't see how to do it since TVTermsEnum is a private class with fully qualified name org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum. I tried

Cast the TermsEnum object with BytesRefIterator.cast_. Then it will have a next method, and be python-iterable. Here’s an example that outputs the term vectors as a generator. Look at the vector method just above:
https://pythonhosted.org/lupyne/_modules/lupyne/engine/indexers.html#IndexReader.termvector

>     from org.apache.lucene.codecs.compressing import CompressingTermVectorsReader$TVTermsEnum
>     from org.apache.lucene.codecs.compressing import TVTermsEnum
>
> and
>
>     import org.apache.lucene.codecs.compressing
>
> but none of them provided access to TVTermsEnum (the first two raised exceptions). After running import org.apache.lucene.codecs.compressing, I could do dir(org.apache.lucene.codecs.compressing) and see the contents of that module. CompressingTermVectorsReader was listed, but TVTermsEnum wasn't. TVTermsEnum also wasn't listed in the output of dir(org.apache.lucene.codecs.compressing.CompressingTermVectorsReader). So it looks like my first problem is how to get access to TVTermsEnum.
>
> Mike
>
> On Tue, May 27, 2014 at 11:10 PM, Andi Vajda va...@apache.org wrote:
>
> On May 27, 2014, at 19:17, Michael O'Leary mich...@moz.com wrote:
>
> *tl;dnr*: a next() method is defined for the Java class TVTermsEnum in Lucene 4.8.1, but it looks like there is no next() method available for an object that looks like it is an instance of the Python class TVTermsEnum in PyLucene 4.8.1.
>
> If there is a next() method, there is a good chance the object is even iterable (in the python sense).
You may need to cast it first, though, as the api that returned it to you may not be defined to return TVTermsEnum: TVTermsEnum.cast_(obj) A good place for PyLucene code examples is its suite of unit tests. It also has a few samples - way less than in 3.x releases because the APIs changed too much. I'm pretty sure there is a test involving TermsEnum in the tests directory. Andi.. I have a set of documents that I would like to cluster. These documents share a vocabulary of only about 3,000 unique terms, but there are about 15,000,000 documents. One way I thought of doing this would be to index the documents using PyLucene (Python is the preferred programming language at work), obtain term vectors for the documents using PyLucene API functions, and calculate cosine similarities between pairs of term vectors in order to determine which documents are close to each other. I found some sample Java code on the web that various people have posted showing ways to do this with older versions of Lucene. I downloaded PyLucene 4.8.1 and compared its API functions with the ones used in the code samples, and saw that this is an area of Lucene that has changed quite a bit. I can send an email to the lucene-user mailing group to ask what would be a good way of doing this using version 4.8.1, but the question I have for this mailing group has to do with some Java API functions that it looks like are not exposed in Python, unless I have to go about accessing them in a different way. If I obtain the term vector for the field cat_ids in a document with id doc_id_1 doc_1_tfv = reader.getTermVector(doc_id_1, cat_ids) then doc_1_tfv is displayed as this object: Terms: org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTerms@32c46396 In some of the sample code I looked at, the terms in doc_1_tfv could be obtained with doc_1_tfv.getTerms(), but it looks like getTerms is not a member function of Terms or its subclasses any more. 
In another code sample, an iterator for the term vector is obtained via tfv_iter = doc_1_tfv.iterator(None) and then the terms are obtained one by one with calls to tfv_iter.next(). This is where I get stuck. tfv_iter has this value: TermsEnum: org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum@1cca2369 and there is a next() function defined for the TVTermsEnum class, but this object doesn't list next() as one of its member functions and an exception is raised if it is called. It looks like the object only supports the member functions defined for the TermsEnum class, and next() is not one of them. Is this the case, or is there a way have it support all of the TVTermsEnum member functions, including next()? TVTermsEnum is a private class in CompressingTermVectorsReader.java. So I am wondering if there is a way to obtain term vectors in this way and that I am just not treating doc_1_tfv and tfv_iter in the right way, or if there is a different, better way to get term vectors for documents in a PyLucene index, or if this isn't something that
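For the clustering question in this thread, the cosine similarity step itself needs no Lucene calls once each document's term vector has been read out as term/frequency pairs. A minimal sketch in plain Python, assuming each vector is a dict mapping term to frequency; the function name `cosine_similarity` and the sample data are illustrative only and are not part of PyLucene:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as {term: frequency} dicts."""
    # Iterate over the smaller dict; only shared terms contribute to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(freq * b.get(term, 0) for term, freq in a.items())
    norm_a = math.sqrt(sum(f * f for f in a.values()))
    norm_b = math.sqrt(sum(f * f for f in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

# Made-up term vectors for two documents.
doc_1 = {"cat_12": 2, "cat_40": 1}
doc_2 = {"cat_12": 1, "cat_99": 3}
similarity = cosine_similarity(doc_1, doc_2)
```

This is only the per-pair building block. With about 15,000,000 documents, comparing every pair would be on the order of 10^14 comparisons, so some way of narrowing the candidate pairs would still be needed.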
Re: release votes
On Apr 24, 2014, at 11:40 AM, Andi Vajda va...@apache.org wrote: On Thu, 24 Apr 2014, Thomas Koch wrote: I don't agree that it is unimportant to make PyLucene releases. Without a ready-to-run software package the hurdles to use PyLucene are raised. It is already not quite simple (for beginners) to install PyLucene on the various platforms. Having a packaged release that is tested by some users provides a benefit to the community in my opinion. I agree with you that making releases is important. However, when votes are called to actually make them, it's been hard to get voters to respond. Anyone can vote. Anyone with an interest should vote. Three PMC votes are required to make a release happen, though. But any vote for or against is important, PMC or not. Lately, it's been hard to get the TWO extra PMC votes needed to make a release happen (since mine is cast when I cut the release candidate). I think this is in part _because_ no one else is showing an interest in the release and casting a vote either. Oh, well I for one had no idea votes from the community at large were encouraged. In that case… +1. I tested 4.7.2 against my downstream project. No issues. However I can understand your arguments - there has been little feedback on your release announcements on the list recently. On the other hand there are frequent discussions about PyLucene on the list so I don't think the interest has declined. Did you check the number of downloads of the PyLucene distributions (if this is possible at all - due to the distributed releases on the apache mirrors ...)? This would be a more accurate indicator from my point of view. I have no idea about the number of downloads of PyLucene. JCC, however, has gotten over 2700 downloads in the past month: https://pypi.python.org/pypi/JCC/2.19 I must also admit that I did never understand the voting process in detail - i.e. who are the PMC members and what impact have votes of non PMC users. 
Maybe some more transparency and another call for action would help to raise awareness in the community. There are at least three classes in the Apache meritocracy: - users, developers, contributors but not committers - committers, ie developers who can commit patches to the project - PMC members, ie project committers that sit on the PMC (project management committee) For more information, please see: https://www.apache.org/foundation/how-it-works.html By the rules guiding the release of Apache projects, three PMC votes are necessary to release a tarball to the world. The list of Lucene committers is visible here: http://lucene.apache.org/whoweare.html Scroll down that list for the PMC membership. Andi..
installation succeeds but errors anyway
Any idea what’s happening here at the end of make install? Even though lucene was actually installed, make exits with an error intermittently. As if it’s trying to download it from pypi afterwards?

…
Adding lucene 4.7.2 to easy-install.pth file

Installed .../lib/python2.7/site-packages/lucene-4.7.2-py2.7-macosx-10.9-x86_64.egg
Processing dependencies for lucene==4.7.2
Searching for lucene==4.7.2
Reading http://pypi.python.org/simple/lucene/
Couldn't find index page for 'lucene' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for lucene==4.7.2
error: Could not find suitable distribution for Requirement.parse('lucene==4.7.2')
make: *** [install] Error 1
Error: pylucene 4.7.2-1 did not build
Re: setAllowLeadingWildcard and PythonMultiFieldQueryParser
On Aug 18, 2010, at 10:13 PM, Andi Vajda wrote:

> On Wed, 18 Aug 2010, Aric Coady wrote:
>
>     #query = queryParser.parse(queryString)
>     query = queryParser.parse(Version.LUCENE_CURRENT, queryString, fields, [BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD], analyzer)
>
> Whenever there is a name conflict between a static and non-static method detected by JCC, the static method wrapper is renamed to be suffixed with a '_' and a warning is emitted by JCC. Does changing the code to use a parse_() method instead solve the problem ? (it's late here and I haven't tried it myself)

Ah, so there are a couple of different things going on here. MultiFieldQueryParser has only static parse methods, except that it also inherits QueryParser.parse. Perhaps that's why JCC isn't supplying a parse_ method.

    >>> lucene.MultiFieldQueryParser.parse
    <built-in method parse of type object at 0x10171d800>
    >>> lucene.MultiFieldQueryParser.parse_
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: type object 'MultiFieldQueryParser' has no attribute 'parse_'
    >>> lucene.QueryParser.parse
    <method 'parse' of 'QueryParser' objects>

This gotcha has come up before:
http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/201007.mbox/%3caanlktinkhxsiqp7jljz1q0cy6cv03y5umyzvg8a5d...@mail.gmail.com%3e
But as known limitations go, it's an easy workaround. Just call QueryParser.parse with the parser object as the first argument.

As for the wildcard issue, I was trying to point out that I don't think it's a pylucene problem at all. The example given was calling the static MultiFieldQueryParser.parse with a parser object, incorrectly expecting settings on the parser object to have an effect. The fact that calling queryParser.parse(queryString) raises a TypeError is technically unrelated, although probably adding to the confusion.
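The workaround described in this message is ordinary Python: an instance method that is shadowed on a subclass can still be reached through the base class by passing the instance explicitly. A toy sketch in plain Python, runnable without PyLucene; `BaseParser`, `MultiParser` and `allow_leading_wildcard` are made-up stand-ins, not the JCC-generated classes or their attributes:

```python
class BaseParser(object):
    """Stand-in for a parser whose parse() is an instance method."""

    def __init__(self):
        self.allow_leading_wildcard = False

    def parse(self, text):
        # Instance method: sees settings stored on this parser object.
        return (text, self.allow_leading_wildcard)


class MultiParser(BaseParser):
    """Stand-in for a subclass whose static parse() shadows the inherited one."""

    @staticmethod
    def parse(text, fields):
        # Static method: there is no parser object here, so settings made
        # on an instance are never consulted.
        return (text, tuple(fields))


parser = MultiParser()
parser.allow_leading_wildcard = True

# Attribute lookup on the instance finds the subclass's static method first.
static_result = parser.parse("*foo", ["title", "body"])

# Going through the base class with an explicit instance reaches the
# inherited instance method, which does honour the instance's settings.
instance_result = BaseParser.parse(parser, "*foo")
```

With the real classes, the equivalent call per the message would be QueryParser.parse(parser, queryString).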
Re: API changes between 2.9.2 and 2.9.3
On Jul 21, 2010, at 12:18 AM, Thomas Koch wrote:

> The question remains if it's feasible to support 2.x *and* 3.x - as Bill mentioned ... I'd like to make it work on both. - me too. I did fear that this makes things much more complicated and you end up with code if lucene.VERSION.split('.')[0]2: ... else ... - we did that some time ago during GCJ and JCC based versions of PyLucene, but at that time it was merely a matter of different imports and init stuff (initVM).
>
> But I understand now that as long as you remove deprecated code from 2.9 it *should* work with 2.9 and 3.0 as well! Right?

It's certainly possible, but there are some gotchas. I've been maintaining 2.4, 2.9, and 3.0 for my project (http://code.google.com/p/lupyne/), and just recently dropped 2.4 support. The conditional checks that are still left involve the Python* overrides. There are several in 2.9 that still wrap the deprecated method or class, and of course they're missing in 3.0. The ones I remember are PythonHitCollector, PythonFilter.bits, and PythonTokenFilter iteration.
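One way to keep such version conditionals readable is to parse the version string into a tuple of integers once and compare tuples, since comparing the string pieces directly would sort '4.10' before '4.9'. A small sketch; `version_tuple` is a hypothetical helper, and version strings are passed in explicitly so it runs without PyLucene (in real code the input would presumably be lucene.VERSION):

```python
import re

def version_tuple(version):
    """Turn a version string such as '2.9.3' into a tuple of ints, e.g. (2, 9, 3)."""
    # Keep only the leading digits of each dot-separated part, so a suffix
    # like the '-1' in '4.10.1-1' is ignored instead of raising ValueError.
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

# Tuples compare numerically, element by element.
is_3x = version_tuple('3.0.2') >= (3, 0)
is_29 = (2, 9) <= version_tuple('2.9.3') < (3, 0)
```

A feature check such as hasattr(lucene, 'PythonHitCollector') is another option for the Python* overrides mentioned in the message, since those classes are simply absent in 3.0.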
Re: Problem getting tokens for document
Hey, Herb. There is a memory leak in the string array in pylucene 2.4. In this case it would be the iteration of tfvP.getTerms(). The fix made it into 2.9, more history here:
http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/200907.mbox/%3calpine.osx.2.01.0907301553230.5...@yuzu%3e

On Apr 14, 2010, at 10:21 AM, Herbert Roitblat wrote:

> Hi, folks. I am using PyLucene and doing a lot of get tokens. lucene.py reports version 2.4.0. It is rpath linux with 8GB of memory. Python is 2.4. The system indexes 116,000 documents just fine. Maxheap is '2048m', 64 bit environment.
>
> Then I need to get the tokens from these documents and near the end, I run into:
>
>     java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> The heap is apparently filling up with each document retrieved and never getting cleared. I was expecting that it would give me the information for one document, then clear that and give me the info for another, etc. I've looked at it with jhat.
>
> I have tried deleting the Python objects that receive any information from Lucene--no effect.
> I have tried reusing the Python objects that receive any information from Lucene--no effect.
> I have tried running the Python garbage collector (it slowed the program slightly, but generally no effect).
>
> Is there anything else I can do to get the tokens for a document and make sure that this does not fill up the heap? I need to be able to run a million or more documents through this and get their tokens.
>
> Here is a code snippet.
>     reader = self.index.getReader()
>     lReader = reader.get()
>     searcher = self.index.getSearcher()
>     lSearcher = searcher.get()
>     query = lucene.TermQuery(lucene.Term(OTDocument.UID_FIELD_ID, uid))
>     hits = list(lSearcher.search(query))
>     if hits:
>         hit = lucene.Hit.cast_(hits[0])
>         tfvs = lReader.getTermFreqVectors(hit.id)
>         if tfvs is not None:  # this happens if the vector is not stored
>             for tfv in tfvs:  # There's one for each field that has a TermFreqVector
>                 tfvP = lucene.TermFreqVector.cast_(tfv)
>                 if returnAllFields or tfvP.field in termFields:  # add only asked fields
>                     tFields[tfvP.field] = dict([(t,f) for (t,f) in zip(tfvP.getTerms(),tfvP.getTermFrequencies()) if f >= minFreq])
>     else:
>         # This shouldn't happen, but we just log the error and march on
>         self.log.error("Unable to fetch doc %s from index"%(uid))
>     lReader.close()
>     lSearcher.close()
>
> lReader is really: lucene.IndexReader.open(self._store)
>
> I've tried the Lucene list, but no one there has yet come up with a solution. If filling the heap is a Lucene problem (is it a bug), I need to look for a way to circumvent that bug.
>
> Thanks,
> Herb
Re: PyLucene 2.9.0 sources available for testing
On Sep 29, 2009, at 3:10 PM, Andi Vajda wrote:

> With the recent release of Java Lucene 2.9.0, a PyLucene 2.9.0 release is in the works. I just completed the first rev of this and checked it into svn trunk.
>
> So far, I've only tested it on Mac OS X 10.6 with 64-bit Python. All unit tests pass as run with 'make test'.
>
> If you're on a different platform and have some spare cycles, I'd be curious to see if all unit tests pass on your platform.

Unit tests passed for me (also on OS X though). I did run into a problem with the MemoryIndex type missing. It was a top level class in 2.4, and apparently the highlighter uses it.

    import lucene
    lucene.initVM(lucene.CLASSPATH)
    print lucene.VERSION
    print lucene.MemoryIndex  # AttributeError in 2.9
Re: initVM() crash and web.py
On Jun 15, 2009, at 12:12 PM, Neha Gupta wrote:

> The problem I am having is that when I send a few requests one after the other then the server crashes. I tried to put initVM() right after the import lucene statement at the top of the program but the crash still happens.
>
> I also read this post:
> http://lists.osafoundation.org/pipermail/pylucene-dev/2008-April/002634.html
>
> The problem seems similar to mine, however, I am not sure where I should put initVM(). Should I also be calling env.attachCurrentThread() somewhere?

initVM must be called exactly once, so at the top of the program is a fine place for it. attachCurrentThread must be called at least once per thread. It is idempotent (and fast), so you can put it at the top of your request handler.

I prefer putting it in the threading code anyway, instead of cluttering the app. My lupyne project has an example of that: look for WorkerThread in http://code.google.com/p/lupyne/source/browse/trunk/lupyne/server.py . LuPyne uses cherrypy, and web.py uses cherrypy's wsgiserver, so the code will translate.
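The "attach in the threading code" approach can be sketched as a Thread subclass that runs an attach hook inside the new thread before its target. This is a minimal illustration and not the lupyne WorkerThread; `AttachedThread` and its `attach` parameter are names invented here. In a PyLucene program the hook would presumably be lucene.getVMEnv().attachCurrentThread, after lucene.initVM() has been called once at startup; the stand-in hook below only records which thread called it, so the sketch runs without a JVM.

```python
import threading

class AttachedThread(threading.Thread):
    """Thread that calls an attach hook in the new thread before running its target."""

    def __init__(self, attach, *args, **kwargs):
        threading.Thread.__init__(self, *args, **kwargs)
        self._attach_hook = attach

    def run(self):
        self._attach_hook()           # executes inside the new thread, not its creator
        threading.Thread.run(self)    # then the usual target(*args, **kwargs)


events = []

def fake_attach():
    # Stand-in for the real attach call; records the calling thread's name.
    events.append(("attach", threading.current_thread().name))

def handle_request():
    # Stand-in for request-handling code that would touch Lucene objects.
    events.append(("work", threading.current_thread().name))

worker = AttachedThread(fake_attach, target=handle_request, name="worker-1")
worker.start()
worker.join()
```

Because the hook runs at the start of run(), each thread attaches itself exactly once and the request handlers need no Lucene-specific setup. When the server creates its own threads and cannot be handed a Thread subclass, calling the attach function at the top of the request handler, as the message suggests, gives the same guarantee.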