Re: The future of the PyLucene project

2024-02-28 Thread Aric Coady
On Feb 28, 2024, at 2:29 PM, Andi Vajda  wrote:
> Of course anyone can vote !
> Anyone interested in this project can and should vote !
> If no one does, how do we know anyone cares ?

+0.5. I’m still maintaining a docker image (coady/pylucene:rc), a homebrew 
formula, and a dependent project (lupyne). But the state of that project is 
much the same - I don’t know how much interest there still is in it.

I feel like Lucene should have python bindings in principle, but I don’t 
personally have a use case anymore. Thanks for your work on this, whatever you 
decide.



Re: [VOTE] Release PyLucene 8.1.1 (rc2)

2019-06-23 Thread Aric Coady
+1.  rc builds available:

- docker pull coady/pylucene:rc
- brew install --devel coady/tap/pylucene

> On Jun 22, 2019, at 5:17 PM, Andi Vajda  wrote:
> 
> 
> The PyLucene 8.1.1 (rc2) release tracking the recent release of
> Apache Lucene 8.1.1 is ready.
> 
> A release candidate is available from:
>  https://dist.apache.org/repos/dist/dev/lucene/pylucene/8.1.1-rc2/
> 
> PyLucene 8.1.1 is built with JCC 3.6, included in these release artifacts.
> 
> JCC 3.6 supports Python 3.3+ (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
> 
> Please vote to release these artifacts as PyLucene 8.1.1.
> Anyone interested in this release can and should vote !
> 
> Thanks !
> 
> Andi..
> 
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
> 
> pps: here is my +1



Re: [VOTE] Release PyLucene 7.4.0 (rc1)

2018-08-29 Thread Aric Coady
On Aug 28, 2018, at 11:05 AM, Andi Vajda  wrote:
> 
> 
> The PyLucene 7.4.0 (rc1) release tracking the recent release of
> Apache Lucene 7.4.0 is ready.
> 
> A release candidate is available from:
>  https://dist.apache.org/repos/dist/dev/lucene/pylucene/7.4.0-rc1/
> 
> PyLucene 7.4.0 is built with JCC 3.2 included in these release artifacts.
> 
> JCC 3.2 supports Python 3.3+ (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
> 
> Please vote to release these artifacts as PyLucene 7.4.0.
> Anyone interested in this release can and should vote !

+1.  Release candidate builds also available for docker and homebrew:
$ docker pull coady/pylucene:rc
$ brew install coady/tap/pylucene

> Thanks !
> 
> Andi..
> 
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
> 
> pps: here is my +1



Re: [VOTE] Release PyLucene 7.2.0 (rc1)

2017-12-29 Thread Aric Coady
On Dec 21, 2017, at 3:50 AM, Andi Vajda  wrote:
> 
> 
> The PyLucene 7.2.0 (rc1) release tracking the upcoming release of
> Apache Lucene 7.2.0 is ready.
> 
> A release candidate is available from:
>  https://dist.apache.org/repos/dist/dev/lucene/pylucene/7.2.0-rc1/
> 
> PyLucene 7.2.0 is built with JCC 3.1 included in these release artifacts.
> 
> JCC 3.1 supports Python 3.3+ (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
> 
> Please vote to release these artifacts as PyLucene 7.2.0.
> Anyone interested in this release can and should vote !

+1.  Release candidate builds also available for docker and homebrew:
$ docker pull coady/pylucene:rc
$ brew install coady/core/pylucene

> Thanks !
> 
> Andi..
> 
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
> 
> pps: here is my +1



Re: [VOTE] Release PyLucene 6.2.0 (rc1)

2016-09-11 Thread Aric Coady
+1.  I’ve created a docker image and homebrew formula for rc2;  will update 
them on release.

$ docker pull coady/pylucene:6
$ brew install coady/core/pylucene

> On Sep 8, 2016, at 7:07 AM, Andi Vajda  wrote:
> 
> 
> After an almost two year hiatus, a new PyLucene version is ready for release. 
> The PyLucene 6.2.0 (rc1) release tracking the recent release of Apache Lucene 
> 6.2.0 is ready.
> 
> A release candidate is available from:
>  https://dist.apache.org/repos/dist/dev/lucene/pylucene/6.2.0-rc1/
> 
> PyLucene 6.2.0 is built with JCC 2.22 included in these release artifacts.
> 
> Please vote to release these artifacts as PyLucene 6.2.0.
> Anyone interested in this release can and should vote !
> 
> Thanks !
> 
> Andi..
> 
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
> 
> pps: here is my +1
> 



Re: [VOTE] Release PyLucene 4.10.1-0

2014-10-01 Thread Aric Coady
On Oct 1, 2014, at 11:49 AM, Andi Vajda va...@apache.org wrote:
 On Tue, 30 Sep 2014, Andi Vajda wrote:
 
 
 On Tue, 30 Sep 2014, Aric Coady wrote:
 
 I’ve found a regression involving Python* classes.  If the overridden 
 methods raise an error, it’s causing a crash instead of propagating the 
 error.  Here’s a simple example:
 from org.apache.pylucene.search import PythonFilter
 class Filter(PythonFilter):
     "Broken filter to test errors are raised."
     def getDocIdSet(self, *args):
         assert False
 
 I added the same 'assert False' line at line 69 in 
 test/test_FilteredQuery.py and this test fails (as expected) but I get no 
 crash.
 
 In other words (I should have been clearer), can you please help me reproduce 
 this by sending in a self-contained piece of code that causes the crash.

The crash reproduced in the (modified) test for me, and it only reproduces 
with JCC 2.21 / Lucene 4.10.  So it is related to the change, but it is also 
almost certainly another symptom of mismatched compilers (see the other recent 
thread about linking errors).  I gave up on shared builds a while ago because 
Xcode tools updates kept changing the compiler.  It reproduces with a NO_SHARED 
build, with both the system Python and Homebrew's Python, which, as you can see 
below, were built with older compiler versions.

$ /usr/bin/python (system)
Python 2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin

$ /usr/local/bin/python (homebrew)
Python 2.7.8 (default, Aug 24 2014, 21:26:19) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin

$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.51) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

 Or could it be that you're running a mixture of JCCs ?
 The area of your crash, Python error reporting, did change between JCC 2.20 
 and JCC 2.21.
 
 Andi..
 
 
 Andi..
 
 Run any search using an instance of that filter and it should reproduce.
 On Sep 29, 2014, at 7:05 PM, Andi Vajda va...@apache.org wrote:
 The PyLucene 4.10.1-0 release tracking today's release of Apache Lucene 
 4.10.1 is ready.
 *** ATTENTION ***
 Starting with release 4.8.0, Lucene now requires Java 1.7 at the minimum.
 Using Java 1.6 with Lucene 4.8.0 and newer is not supported.
 On Mac OS X, Java 6 is still a common default, please upgrade if you 
 haven't done so already. A common upgrade is Oracle Java 1.7 for Mac OS X:
 http://docs.oracle.com/javase/7/docs/webnotes/install/mac/mac-jdk.html
 On Mac OS X, once installed, a way to make Java 1.7 the default in your 
 bash shell is:
 $ export JAVA_HOME=`/usr/libexec/java_home`
 Be sure to verify that this JAVA_HOME value is correct.
 On any system, if you're upgrading your Java installation, please rebuild
 JCC as well. You must use the same version of Java for both JCC and 
 PyLucene.
 *** /ATTENTION ***
 A release candidate is available from:
 http://people.apache.org/~vajda/staging_area/
 A list of changes in this release can be seen at:
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_10/CHANGES
 PyLucene 4.10.1 is built with JCC 2.21 included in these release artifacts.
 A list of Lucene Java changes can be seen at:
 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_1/lucene/CHANGES.txt
 Please vote to release these artifacts as PyLucene 4.10.1-0.
 Anyone interested in this release can and should vote !
 Thanks !
 Andi..
 ps: the KEYS file for PyLucene release signing is at:
 http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
 http://people.apache.org/~vajda/staging_area/KEYS
 pps: here is my +1
 



Re: [VOTE] Release PyLucene 4.10.1-1

2014-10-01 Thread Aric Coady
On Oct 1, 2014, at 4:13 PM, Andi Vajda va...@apache.org wrote:
 The PyLucene 4.10.1-1 release tracking the recent release of Apache Lucene 
 4.10.1 is ready.
 
 This release candidate fixes the regression found in the previous one, 
 4.10.1-0, and is available from:
 http://people.apache.org/~vajda/staging_area/
 
 A list of changes in this release can be seen at:
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_10/CHANGES
 
 PyLucene 4.10.1 is built with JCC 2.21 included in these release artifacts.
 
 A list of Lucene Java changes can be seen at:
 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_1/lucene/CHANGES.txt
 
 Please vote to release these artifacts as PyLucene 4.10.1-1.
 Anyone interested in this release can and should vote !

+1.  No issues found.  And I’ll update the homebrew formula when it’s released.

 Thanks !
 
 Andi..
 
 ps: the KEYS file for PyLucene release signing is at:
 http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
 http://people.apache.org/~vajda/staging_area/KEYS
 
 pps: here is my +1



Re: [VOTE] Release PyLucene 4.10.1-0

2014-09-30 Thread Aric Coady
I’ve found a regression involving Python* classes.  If the overridden methods 
raise an error, it’s causing a crash instead of propagating the error.  Here’s 
a simple example:

from org.apache.pylucene.search import PythonFilter

class Filter(PythonFilter):
    "Broken filter to test errors are raised."
    def getDocIdSet(self, *args):
        assert False

Run any search using an instance of that filter and it should reproduce.

On Sep 29, 2014, at 7:05 PM, Andi Vajda va...@apache.org wrote:

 
 The PyLucene 4.10.1-0 release tracking today's release of Apache Lucene 
 4.10.1 is ready.
 
 *** ATTENTION ***
 
 Starting with release 4.8.0, Lucene now requires Java 1.7 at the minimum.
 Using Java 1.6 with Lucene 4.8.0 and newer is not supported.
 
 On Mac OS X, Java 6 is still a common default, please upgrade if you haven't 
 done so already. A common upgrade is Oracle Java 1.7 for Mac OS X:
  http://docs.oracle.com/javase/7/docs/webnotes/install/mac/mac-jdk.html
 
 On Mac OS X, once installed, a way to make Java 1.7 the default in your bash 
 shell is:
  $ export JAVA_HOME=`/usr/libexec/java_home`
 Be sure to verify that this JAVA_HOME value is correct.
 
 On any system, if you're upgrading your Java installation, please rebuild
 JCC as well. You must use the same version of Java for both JCC and PyLucene.
 
 *** /ATTENTION ***
 
 
 A release candidate is available from:
 http://people.apache.org/~vajda/staging_area/
 
 A list of changes in this release can be seen at:
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_10/CHANGES
 
 PyLucene 4.10.1 is built with JCC 2.21 included in these release artifacts.
 
 A list of Lucene Java changes can be seen at:
 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_10_1/lucene/CHANGES.txt
 
 Please vote to release these artifacts as PyLucene 4.10.1-0.
 Anyone interested in this release can and should vote !
 
 Thanks !
 
 Andi..
 
 ps: the KEYS file for PyLucene release signing is at:
 http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
 http://people.apache.org/~vajda/staging_area/KEYS
 
 pps: here is my +1



Re: Can't build on Mavericks (different issue)

2014-09-29 Thread Aric Coady

On Sep 29, 2014, at 4:00 PM, Andi Vajda va...@apache.org wrote:

 
 On Sat, 27 Sep 2014, Andi Vajda wrote:
 
 
 On Sat, 27 Sep 2014, Mattmann, Chris A (3980) wrote:
 
 Guys, was there ever a fix to this? I'm having the exact same issue :(
 Some notes:
 Mac OS 10.9.4
 Trying to build JCC 2.19
 Patching with instructions here https://github.com/chrismattmann/etllib/
 to deal with JAVAFRAMEWORKS hard coding, etc.
 Built collective.python, my own Python buildout, and am using VirtualEnv
 Python 2.7.8 (default, Sep 27 2014, 11:46:04)
 [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 Still getting the darned linker error (I am using home brew and installing
 GCC right now in hopes that fixes it).
 This really sucks..
 
 Ok, help me help you here.
 What is it you're trying to do ?
 Yes, I see you say you're trying to build JCC 2.19 on Mac OS 10.9.4.
 This requires no patching. So there's got to be something else going on here ?
 
 In particular, what's this patching you're referring to ?
 I did follow the link, I couldn't find relevant information for someone out 
 of context like myself.
 
 While I don't know what the linker error you're reporting below exactly 
 means,
   ld: internal error: atom not found in
   symbolIndex(__ZN7JNIEnv_13CallIntMethodEP8_jobjectP10_jmethodIDz) for
   architecture x86_64
   clang: error: linker command failed with exit code 1 (use -v to see
   invocation)
 it suspiciously looks like a mismatch between your C++ compiler and your C++
 linker somehow as it can't find this particular decorated signature of

Yes, and the common cause is using the current Mavericks Xcode tools with the 
system Python, which was built with an older compiler.

You can build a current Python, as Andi is suggesting.  Or, conversely, build 
JCC with the same compiler version as the system Python (4.2?).  A third option 
is to disable shared mode and skip the whole issue.  That’s what the Homebrew 
formula does by default (brew install pylucene).
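To compare the compiler a given Python was built with against what `gcc -v` reports for the current Xcode tools, the standard library can report it directly. This is a minimal sketch using `platform.python_compiler()` (the same string shown in brackets in the interpreter banner) and the `CC` build variable from `sysconfig`, which distutils/setuptools extension builds use by default on Unix-like systems and which may be `None` on other platforms. The helper name is hypothetical.

```python
import platform
import sysconfig


def build_compiler_info():
    """Report the compiler this Python interpreter was built with (hypothetical helper)."""
    return {
        # Same text as the banner, e.g. "GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)"
        "python_compiler": platform.python_compiler(),
        # C compiler command recorded when Python was built; may be None on some platforms
        "CC": sysconfig.get_config_var("CC"),
    }


if __name__ == "__main__":
    for key, value in build_compiler_info().items():
        print("%s: %s" % (key, value))
```

If the version reported here differs from the compiler that will build JCC, that is the kind of mismatch described above.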

 CallIntMethod(), a C++ method provided by your JDK's JNI library.
 For example, you could be compiling against one version of the JDK and 
 linking
 against another. Or you could be compiling with Clang and linking with GCC or
 using a python compiled with GCC and using Clang or any of the many other
 ways to mismatch things. I see your Python invokes clang as the C++ compiler,
 mine invokes gcc. Maybe there is a bug with compiling JCC with clang ?
 
 I don't think I've ever tried this myself...
 Maybe I should try Clang with a more recent Python...
 
 I've now tried this (on Mac OS X 10.9.5):
  - downloaded python 2.7.8 sources
  - configured it with CC=clang and MACOSX_DEPLOYMENT_TARGET=10.9
  - built, installed
  - installed virtualenv 1.11.6 (which also brings in a recent version of
 setuptools)
  - created a virtualenv with this 2.7.8 build
  - built jcc 2.20 in it without any errors
 
 In other words, I can't reproduce the error you reported.
 
 Andi..
 
 
 For reference, here is my env and the log of my compilation:
 
 Mac OS X:
 10.9.2
 
 Python:
 Python 2.7 (r27:82500, Jul  7 2010, 03:32:44)
 [GCC 4.2.1 (Apple Inc. build 5659)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 
 Java:
 jdk1.8.0_05
 
 GCC:
 gcc --version
 Configured with: 
 --prefix=/Volumes/Yuzu/Applications/Xcode.app/Contents/Developer/usr 
 --with-gxx-include-dir=/usr/include/c++/4.2.1
 Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
 Target: x86_64-apple-darwin13.1.0
 Thread model: posix
 
 JCC 2.20 Build Log (looks very similar to JCC 2.19):
 
 yuzu:vajda ../_install/bin/python setup.py build
 found JAVAHOME = 
 /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home
 found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
 Loading source files for package org.apache.jcc...
 Constructing Javadoc information...
 Standard Doclet version 1.6.0_65
 Building tree for all the packages and classes...
 Generating javadoc/org/apache/jcc//PythonException.html...
 Generating javadoc/org/apache/jcc//PythonVM.html...
 Generating javadoc/org/apache/jcc//package-frame.html...
 Generating javadoc/org/apache/jcc//package-summary.html...
 Generating javadoc/org/apache/jcc//package-tree.html...
 Generating javadoc/constant-values.html...
 Generating javadoc/serialized-form.html...
 Building index for all the packages and classes...
 Generating javadoc/overview-tree.html...
 Generating javadoc/index-all.html...
 Generating javadoc/deprecated-list.html...
 Building index for all classes...
 Generating javadoc/allclasses-frame.html...
 Generating javadoc/allclasses-noframe.html...
 Generating javadoc/index.html...
 Generating javadoc/help-doc.html...
 Generating javadoc/stylesheet.css...
 running build
 running build_py
 writing /Users/vajda/apache/pylucene/jcc/jcc/config.py
 creating build
 creating build/lib.macosx-10.6-x86_64-2.7
 creating 

Re: [VOTE] Release PyLucene 4.9.0-0

2014-07-09 Thread Aric Coady

On Jul 7, 2014, at 8:14 AM, Andi Vajda va...@apache.org wrote:

 
 The PyLucene 4.9.0-0 release tracking the recent release of Apache Lucene 
 4.9.0 is ready.
 
 
 *** ATTENTION ***
 
 Starting with release 4.8.0, Lucene now requires Java 1.7 at the minimum.
 Using Java 1.6 with Lucene 4.8.0 and newer is not supported.
 
 On Mac OS X, Java 6 is still a common default, please upgrade if you haven't 
 done so already. A common upgrade is Oracle Java 1.7 for Mac OS X:
  http://docs.oracle.com/javase/7/docs/webnotes/install/mac/mac-jdk.html
 
 On Mac OS X, once installed, a way to make Java 1.7 the default in your bash 
 shell is:
  $ export JAVA_HOME=`/usr/libexec/java_home`
 Be sure to verify that this JAVA_HOME value is correct.
 
 On any system, if you're upgrading your Java installation, please rebuild
 JCC as well. You must use the same version of Java for both JCC and PyLucene.
 
 *** /ATTENTION ***
 
 
 A release candidate is available from:
 http://people.apache.org/~vajda/staging_area/
 
 A list of changes in this release can be seen at:
 http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_9/CHANGES
 
 PyLucene 4.9.0 is built with JCC 2.20 included in these release artifacts.
 
 A list of Lucene Java changes can be seen at:
 http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_9_0/lucene/CHANGES.txt
 
 Please vote to release these artifacts as PyLucene 4.9.0-0.
 Anyone interested in this release can and should vote !

+1.  No issue found.  And I’ll update the homebrew formula when it’s released.

 Thanks !
 
 Andi..
 
 ps: the KEYS file for PyLucene release signing is at:
 http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
 http://people.apache.org/~vajda/staging_area/KEYS
 
 pps: here is my +1



Re: Getting term vectors/computing cosine similarity

2014-05-28 Thread Aric Coady
On May 28, 2014, at 12:03 AM, Michael O'Leary mich...@moz.com wrote:
 Hi Andi,
 Thanks for the help. I just tried to import TVTermsEnum so I could try
 casting my iter, and I don't see how to do it since TVTermsEnum is a
 private class with fully qualified
 name 
 org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum.
 I tried

Cast the TermsEnum object with BytesRefIterator.cast_.  Then it will have a 
next method, and be python-iterable.

Here’s an example that outputs the term vectors as a generator.  Look at the 
vector method just above:
https://pythonhosted.org/lupyne/_modules/lupyne/engine/indexers.html#IndexReader.termvector
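As a sketch of why a `next()` method is enough to make such an object Python-iterable: `BytesRefIterator.next()` returns null (None in Python) once exhausted, so the two-argument form of the built-in `iter()` can wrap it. `FakeTermsEnum` is a hypothetical stand-in so the sketch runs without PyLucene; with a real index you would pass the object returned by `BytesRefIterator.cast_(...)`, and each item would be a `BytesRef` (convertible with `utf8ToString()`) rather than a plain string.

```python
class FakeTermsEnum:
    """Hypothetical stand-in for a cast TermsEnum: next() returns None when done."""

    def __init__(self, terms):
        self._terms = list(terms)
        self._pos = 0

    def next(self):
        # Mirrors BytesRefIterator.next(): the next term, or None at the end.
        if self._pos >= len(self._terms):
            return None
        term = self._terms[self._pos]
        self._pos += 1
        return term


def iter_terms(terms_enum):
    """Yield terms until next() returns the None sentinel."""
    # iter(callable, sentinel) keeps calling terms_enum.next() until it returns None.
    return iter(terms_enum.next, None)


for term in iter_terms(FakeTermsEnum(["apple", "banana", "cherry"])):
    print(term)
```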

 from org.apache.lucene.codecs.compressing import
 CompressingTermVectorsReader$TVTermsEnum
 from org.apache.lucene.codecs.compressing import TVTermsEnum
 and
 import org.apache.lucene.codecs.compressing
 
 but none of them provided access to TVTermsEnum (the first two raised
 exceptions). After running import org.apache.lucene.codecs.compressing, I
 could do dir(org.apache.lucene.codecs.compressing) and see the contents of
 that module. CompressingTermVectorsReader was listed, but TVTermsEnum
 wasn't. TVTermsEnum also wasn't listed in the output of
 dir(org.apache.lucene.codecs.compressing.CompressingTermVectorsReader). So
 it looks like my first problem is how to get access to TVTermsEnum.
 Mike
 
 
 On Tue, May 27, 2014 at 11:10 PM, Andi Vajda va...@apache.org wrote:
 
 
 On May 27, 2014, at 19:17, Michael O'Leary mich...@moz.com wrote:
 
 *tl;dnr*: a next() method is defined for the Java class TVTermsEnum in
 Lucene 4.8.1, but it looks like there is no next() method available for
 an
 object that looks like it is an instance of the Python class TVTermsEnum
 in
 PyLucene 4.8.1.
 
 If there is a next() method, there is a good chance the object is even
 iterable (in the python sense). You may need to cast it first, though, as
 the api that returned it to you may not be defined to return TVTermsEnum:
  TVTermsEnum.cast_(obj)
 
 A good place for PyLucene code examples is its suite of unit tests. It
 also has a few samples - way less than in 3.x releases because the APIs
 changed too much.
 I'm pretty sure there is a test involving TermsEnum in the tests directory.
 
 Andi..
 
 I have a set of documents that I would like to cluster. These documents
 share a vocabulary of only about 3,000 unique terms, but there are about
 15,000,000 documents. One way I thought of doing this would be to index
 the
 documents using PyLucene (Python is the preferred programming language at
 work), obtain term vectors for the documents using PyLucene API
 functions,
 and calculate cosine similarities between pairs of term vectors in order
 to
 determine which documents are close to each other.
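For the similarity step alone (this is not a PyLucene API), a minimal sketch assuming each document's term vector has already been read into a plain dict mapping term to frequency: cosine similarity is the dot product of the two vectors divided by the product of their norms. The function name and sample data are hypothetical.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse term-frequency vectors (dict: term -> freq)."""
    # Iterate over the smaller dict; only shared terms contribute to the dot product.
    if len(vec_a) > len(vec_b):
        vec_a, vec_b = vec_b, vec_a
    dot = sum(freq * vec_b.get(term, 0) for term, freq in vec_a.items())
    norm_a = math.sqrt(sum(f * f for f in vec_a.values()))
    norm_b = math.sqrt(sum(f * f for f in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty vector as dissimilar to everything
    return dot / (norm_a * norm_b)


doc1 = {"cat_a": 2, "cat_b": 1}
doc2 = {"cat_a": 1, "cat_c": 3}
print(round(cosine_similarity(doc1, doc2), 4))  # 0.2828
```

Note that comparing every pair among 15,000,000 documents is quadratic in the number of documents, so some candidate-selection or clustering step would likely be needed on top of this.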
 
 I found some sample Java code on the web that various people have posted
 showing ways to do this with older versions of Lucene. I downloaded
 PyLucene 4.8.1 and compared its API functions with the ones used in the
 code samples, and saw that this is an area of Lucene that has changed
 quite
 a bit. I can send an email to the lucene-user mailing group to ask what
 would be a good way of doing this using version 4.8.1, but the question I
 have for this mailing group has to do with some Java API functions that
 it
 looks like are not exposed in Python, unless I have to go about accessing
 them in a different way.
 
 If I obtain the term vector for the field cat_ids in a document with id
 doc_id_1
 
 doc_1_tfv = reader.getTermVector(doc_id_1, cat_ids)
 
 then doc_1_tfv is displayed as this object:
 
 <Terms: org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTerms@32c46396>
 
 In some of the sample code I looked at, the terms in doc_1_tfv could be
 obtained with doc_1_tfv.getTerms(), but it looks like getTerms is not a
 member function of Terms or its subclasses any more. In another code
 sample, an iterator for the term vector is obtained via tfv_iter =
 doc_1_tfv.iterator(None) and then the terms are obtained one by one with
 calls to tfv_iter.next(). This is where I get stuck. tfv_iter has this
 value:
 
 <TermsEnum: org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum@1cca2369>
 
 and there is a next() function defined for the TVTermsEnum class, but
 this
 object doesn't list next() as one of its member functions and an
 exception
 is raised if it is called. It looks like the object only supports the
 member functions defined for the TermsEnum class, and next() is not one
 of
 them. Is this the case, or is there a way to have it support all of the
 TVTermsEnum member functions, including next()? TVTermsEnum is a private
 class in CompressingTermVectorsReader.java.
 
 So I am wondering if there is a way to obtain term vectors in this way
 and
 that I am just not treating doc_1_tfv and tfv_iter in the right way, or
 if
 there is a different, better way to get term vectors for documents in a
 PyLucene index, or if this isn't something that 

Re: release votes

2014-04-24 Thread Aric Coady
On Apr 24, 2014, at 11:40 AM, Andi Vajda va...@apache.org wrote:
 On Thu, 24 Apr 2014, Thomas Koch wrote:
 I don't agree that it is unimportant to make PyLucene releases. Without a
 ready-to-run software package the hurdles to use PyLucene are raised. It is
 already not quite simple (for beginners) to install PyLucene on the various
 platforms. Having a packaged release that is tested by some users provides a
 benefit to the community in my opinion.
 
 I agree with you that making releases is important. However, when votes are 
 called to actually make them, it's been hard to get voters to respond.
 
 Anyone can vote. Anyone with an interest should vote. Three PMC votes are 
 required to make a release happen, though. But any vote for or against is 
 important, PMC or not. Lately, it's been hard to get the TWO extra PMC votes 
 needed to make a release happen (since mine is cast when I cut the release 
 candidate). I think this is in part _because_ no one else is showing an 
 interest in the release and casting a vote either.

Oh, well I for one had no idea votes from the community at large were 
encouraged.  In that case…

+1.  I tested 4.7.2 against my downstream project.  No issues.

 However I can understand your arguments - there has been little feedback on
 your release announcements on the list recently. On the other hand there are
 frequent discussions about PyLucene on the list so I don't think the
 interest has declined. Did you check the number of downloads of the PyLucene
 distributions (if this is possible at all - due to the distributed releases
 on the apache mirrors ...)? This would be a more accurate indicator from my
 point of view.
 
 I have no idea about the number of downloads of PyLucene. JCC, however, has 
 gotten over 2700 downloads in the past month:
  https://pypi.python.org/pypi/JCC/2.19
 
 I must also admit that I never understood the voting process in detail,
 i.e. who the PMC members are and what impact the votes of non-PMC users have.
 Maybe some more transparency and another call for action would help to
 raise awareness in the community.
 
 There are at least three classes in the Apache meritocracy:
  - users, developers, contributors but not committers
  - committers, ie developers who can commit patches to the project
  - PMC members, ie project committers that sit on the PMC (project management committee)
 For more information, please see:
  https://www.apache.org/foundation/how-it-works.html
 
 By the rules guiding the release of Apache projects, three PMC votes are 
 necessary to release a tarball to the world.
 The list of Lucene committers is visible here:
  http://lucene.apache.org/whoweare.html
 Scroll down that list for the PMC membership.
 
 Andi..



installation succeeds but errors anyway

2014-04-16 Thread Aric Coady
Any idea what’s happening here at the end of make install?  Even though lucene 
was actually installed, make exits with an error intermittently.  As if it’s 
trying to download it from pypi afterwards?

…
Adding lucene 4.7.2 to easy-install.pth file

Installed 
.../lib/python2.7/site-packages/lucene-4.7.2-py2.7-macosx-10.9-x86_64.egg
Processing dependencies for lucene==4.7.2
Searching for lucene==4.7.2
Reading http://pypi.python.org/simple/lucene/
Couldn't find index page for 'lucene' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for lucene==4.7.2
error: Could not find suitable distribution for 
Requirement.parse('lucene==4.7.2')
make: *** [install] Error 1
Error: pylucene 4.7.2-1 did not build



Re: setAllowLeadingWildcard and PythonMultiFieldQueryParser

2010-08-19 Thread Aric Coady
On Aug 18, 2010, at 10:13 PM, Andi Vajda wrote:
 On Wed, 18 Aug 2010, Aric Coady wrote:
 #query = queryParser.parse(queryString)
 query = queryParser.parse(Version.LUCENE_CURRENT, queryString, fields,
 [BooleanClause.Occur.SHOULD, 
 BooleanClause.Occur.SHOULD],
 analyzer)
 
 Whenever there is a name conflict between a static and non-static method 
 detected by JCC, the static method wrapper is renamed to be suffixed with a 
 '_' and a warning is emitted by JCC.
 
 Does changing the code to use a parse_() method instead solve the problem ?
 (it's late here and I haven't tried it myself)

Ah, so there are a couple of different things going on here.  
MultiFieldQueryParser has only static parse methods, except that it also 
inherits QueryParser.parse.  Perhaps that's why JCC isn't supplying a parse_ method.

>>> lucene.MultiFieldQueryParser.parse
<built-in method parse of type object at 0x10171d800>
>>> lucene.MultiFieldQueryParser.parse_
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'MultiFieldQueryParser' has no attribute 'parse_'
>>> lucene.QueryParser.parse
<method 'parse' of 'QueryParser' objects>

This gotcha has come up before:  
http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/201007.mbox/%3caanlktinkhxsiqp7jljz1q0cy6cv03y5umyzvg8a5d...@mail.gmail.com%3e.
  But as known limitations go, it's an easy workaround.  Just call 
QueryParser.parse with the parser object as the first argument.

As for the wildcard issue, I was trying to point out that I don't think it's a 
pylucene problem at all.  The example given was calling the static 
MultiFieldQueryParser.parse with a parser object, incorrectly expecting 
settings on the parser object to have an effect.  The fact that calling 
queryParser.parse(queryString) raises a TypeError is technically unrelated, 
although probably adding to the confusion.
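The same shape can be reproduced in plain Python, which may make both points clearer: when a subclass shadows an inherited instance method with a static method of the same name, calling it on an instance resolves to the static one (which cannot see instance settings), while the inherited version is still reachable through the base class with the instance as the first argument. In PyLucene terms that is `QueryParser.parse(queryParser, queryString)`. The classes below are hypothetical stand-ins, not the PyLucene wrappers.

```python
class BaseParser:
    """Stand-in for QueryParser: parse() is an instance method that uses instance settings."""

    def __init__(self, allow_leading_wildcard=False):
        self.allow_leading_wildcard = allow_leading_wildcard

    def parse(self, query_string):
        if query_string.startswith("*") and not self.allow_leading_wildcard:
            raise ValueError("leading wildcard not allowed")
        return "parsed(%s)" % query_string


class MultiFieldParser(BaseParser):
    """Stand-in for MultiFieldQueryParser: a static parse() shadows the inherited one."""

    @staticmethod
    def parse(query_string, fields):
        # Static: there is no parser instance here, so instance settings cannot apply.
        return "static(%s, %s)" % (query_string, ",".join(fields))


parser = MultiFieldParser(allow_leading_wildcard=True)
# parser.parse("*foo") resolves to the static method and raises a TypeError
# (the fields argument is missing).
# Calling the base-class method with the instance as first argument works:
print(BaseParser.parse(parser, "*foo"))  # parsed(*foo)
```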



Re: API changes between 2.9.2 and 2.9.3

2010-07-21 Thread Aric Coady
On Jul 21, 2010, at 12:18 AM, Thomas Koch wrote:
 The question remains if it's feasible to support 2.x *and* 3.x  - as Bill
 mentioned ... I'd like to make it work on both. - me too.  I did fear that
 this makes things much more complicated and you end up with code if
 lucene.VERSION.split('.')[0]2: ... else ... - we did that some time ago
 during GCJ and JCC based versions of PyLucene, but at that time it was
 merely a matter of different imports and init stuff (initVM).
 
 But I understand now that as long as you remove deprecated code from 2.9 it
 *should* work with 2.9 and 3.0 as well! Right?

It's certainly possible, but there are some gotchas.  I've been maintaining 
2.4, 2.9, and 3.0 for my project (http://code.google.com/p/lupyne/), and just 
recently dropped 2.4 support.

The conditional checks that are still left involve the Python* overrides.  
There are several in 2.9 that still wrap the deprecated method or class, and of 
course they're missing in 3.0.  The ones I remember are PythonHitCollector, 
PythonFilter.bits, and PythonTokenFilter iteration.
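One way to keep such conditional checks in a single place is sketched below, assuming the version string looks like `lucene.VERSION` (e.g. "2.9.3" or "3.0.2"). Converting the major version to an integer avoids string-comparison surprises (as strings, "10" sorts before "9"). The helper names are hypothetical, and the feature keys are only labels for the overrides named above.

```python
def lucene_major(version):
    """Major version number from a version string such as '2.9.3'."""
    return int(version.split(".")[0])


def python_overrides(version):
    """Hypothetical helper: which deprecated Python* overrides are available for this version."""
    legacy = lucene_major(version) < 3
    return {
        # These wrap APIs that are deprecated in 2.9 and missing in 3.0.
        "PythonHitCollector": legacy,
        "PythonFilter.bits": legacy,
        "PythonTokenFilter iteration": legacy,
    }


print(python_overrides("2.9.3"))
print(python_overrides("3.0.2"))
```

With lucene imported, the argument would be `lucene.VERSION`.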



Re: Problem getting tokens for document

2010-04-14 Thread Aric Coady
Hey, Herb.

There is a memory leak in the string array in pylucene 2.4.  In this case it 
would be the iteration of tfvP.getTerms().  The fix made it into 2.9, more 
history here:
http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/200907.mbox/%3calpine.osx.2.01.0907301553230.5...@yuzu%3e

On Apr 14, 2010, at 10:21 AM, Herbert Roitblat wrote:

 Hi, folks.
 I am using PyLucene and doing a lot of get tokens.  lucene.py reports
 version 2.4.0.  It is rpath linux with 8GB of memory.  Python is 2.4.
 
 The system indexes 116,000 documents just fine.  
 
 Maxheap is '2048m', 64 bit environment.
 
 Then I need to get the tokens from these documents and near the end, I run
 into:
 
 java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 The heap is apparently filling up with each document retrieved and never 
 getting cleared.  I was expecting that it would give me the information for 
 one document, then clear that and give me the info for another, etc.  I've 
 looked at it with jhat.
 
 I have tried deleting the Python objects that receive any information from 
 Lucene--no effect.
 I have tried reusing the Python objects that receive any information from 
 Lucene--no effect.
 I have tried running the Python garbage collector (it slowed the program 
 slightly, but generally no effect).
 
 Is there anything else I can do to get the tokens for a document and make 
 sure that this does not fill up the heap?  I need to be able to run a million 
 or more documents through this and get their tokens.
 
 
 Here is a code snippet.
 
reader = self.index.getReader()
lReader = reader.get()
searcher = self.index.getSearcher()
lSearcher = searcher.get()
query = lucene.TermQuery(lucene.Term(OTDocument.UID_FIELD_ID, uid))
hits = list(lSearcher.search(query))
if hits:
    hit = lucene.Hit.cast_(hits[0])
    tfvs = lReader.getTermFreqVectors(hit.id)

    if tfvs is not None:  # tfvs is None if term vectors were not stored
        for tfv in tfvs:  # there's one for each field that has a TermFreqVector
            tfvP = lucene.TermFreqVector.cast_(tfv)
            if returnAllFields or tfvP.field in termFields:  # add only asked fields
                tFields[tfvP.field] = dict([(t, f) for (t, f) in
                    zip(tfvP.getTerms(), tfvP.getTermFrequencies()) if f >= minFreq])
else:
    # This shouldn't happen, but we just log the error and march on
    self.log.error("Unable to fetch doc %s from index" % uid)

lReader.close()
lSearcher.close()
 
 lReader is really:
 lucene.IndexReader.open(self._store)
 
 I've tried the Lucene list, but no one there has yet come up with a solution. 
  If filling the heap is a Lucene problem (is it a bug), I need to look for a 
 way to circumvent that bug.  
 
 Thanks, 
 
 Herb



Re: PyLucene 2.9.0 sources available for testing

2009-09-30 Thread Aric Coady

On Sep 29, 2009, at 3:10 PM, Andi Vajda wrote:
With the recent release of Java Lucene 2.9.0, a PyLucene 2.9.0  
release is in the works. I just completed the first rev of this and  
checked it into svn trunk. So far, I've only tested it on Mac OS X  
10.6 with 64-bit Python.


All unit tests pass as run with 'make test'.

If you're on a different platform and have some spare cycles, I'd be  
curious to see if all unit tests pass on your platform.


Unit tests passed for me (also on OS X though).  I did run into a  
problem with the MemoryIndex type missing.  It was a top level class  
in 2.4, and apparently the highlighter uses it.


import lucene
lucene.initVM(lucene.CLASSPATH)
print lucene.VERSION
print lucene.MemoryIndex  # AttributeError in 2.9



Re: initVM() crash and web.py

2009-06-15 Thread Aric Coady

On Jun 15, 2009, at 12:12 PM, Neha Gupta wrote:
The problem I am having is that when I send a few requests one after the other,
the server crashes. I tried to put initVM() right after the import lucene
statement at the top of the program, but the crash still happens. I also read
this post:
http://lists.osafoundation.org/pipermail/pylucene-dev/2008-April/002634.html

The problem seems similar to mine; however, I am not sure where I should put
initVM(). Should I also be calling env.attachCurrentThread() somewhere?


initVM must be called exactly once, so at the top of the program is a  
fine place for it.  attachCurrentThread must be called at least once  
per thread.  It is idempotent (and fast), so you can put it at the top  
of your request handler.
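A sketch of the "top of the request handler" placement: the decorator below calls an attach function before every request, which is harmless because attaching is idempotent. To keep it runnable without PyLucene, `attach_current_thread` is a hypothetical stand-in that only records which threads were attached; in a real app it would call `env.attachCurrentThread()` on the env returned by `lucene.initVM()`, which itself runs once at program start.

```python
import functools
import threading

attached_threads = set()


def attach_current_thread():
    """Hypothetical stand-in for env.attachCurrentThread(); records the calling thread."""
    attached_threads.add(threading.get_ident())


def with_vm_thread(handler):
    """Decorator: attach the current thread before running a request handler."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        attach_current_thread()  # idempotent, so safe to call on every request
        return handler(*args, **kwargs)
    return wrapper


@with_vm_thread
def handle_search(query):
    # Lucene calls would go here, on a thread already attached to the VM.
    return "results for %s" % query


workers = [threading.Thread(target=handle_search, args=("q%d" % i,)) for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```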


I prefer putting it in the threading code anyway, instead of  
cluttering the app.  My lupyne project has an example of that:  look  
for WorkerThread in http://code.google.com/p/lupyne/source/browse/trunk/lupyne/server.py 
.  LuPyne uses cherrypy, and web.py uses cherrypy's wsgiserver, so the  
code will translate.