Re: Fwd: Pylucene jvm CharArraySet Error

Andi Vajda Sun, 09 Nov 2014 17:46:39 -0800


On Sun, 9 Nov 2014, Alexander Alex wrote:

Traceback (most recent call last):
 File "C:\index1.py", line 94, in <module>
   IndexFiles(sys.argv[1], os.path.join(base_dir, INDEX_DIR), EnglishLemmaAnalyzer("english-bidirectional-distsim.tagger"))
 File "C:\index1.py", line 48, in __init__
   self.indexDocs(root, writer)
 File "C:\index1.py", line 81, in indexDocs
   writer.addDocument(doc)
JavaError: org.apache.jcc.PythonException: ('while calling', 'tokenStream', <class '__main__.EnglishLemmaTokenizer'>)
TypeError: ('while calling', 'tokenStream', <class '__main__.EnglishLemmaTokenizer'>)

   Java stacktrace:
org.apache.jcc.PythonException: ('while calling', 'tokenStream', <class '__main__.EnglishLemmaTokenizer'>)
TypeError: ('while calling', 'tokenStream', <class '__main__.EnglishLemmaTokenizer'>)

   at org.apache.pylucene.analysis.PythonAnalyzer.tokenStream(Native Method)
   at org.apache.lucene.analysis.Analyzer.reusableTokenStream(Analyzer.java:80)
   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:137)
   at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)


Then I tried changing the return object and ran it as index2.py; again I
got the following errors:


Traceback (most recent call last):
 File "C:\newIndexfiles.py", line 94, in <module>
   IndexFiles(sys.argv[1], os.path.join(base_dir, INDEX_DIR), EnglishLemmaAnalyzer("english-bidirectional-distsim.tagger"))
 File "C:\newIndexfiles.py", line 48, in __init__
   self.indexDocs(root, writer)
 File "C:\newIndexfiles.py", line 81, in indexDocs
   writer.addDocument(doc)
JavaError: java.lang.NullPointerException
   Java stacktrace:
java.lang.NullPointerException

   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:141)
   at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
   at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)


I cannot figure out the issues here. Thanks

Me neither. If you wrote new Java code, you need to make sure it runs before you wrap it with JCC. It's a lot easier to debug this way.


Andi..




On Sat, Oct 18, 2014 at 10:11 PM, Alexander Alex <greatalexander4r...@gmail.com> wrote:

Thanks Andi. I am going to try these suggestions out.

On Sat, Oct 18, 2014 at 9:55 PM, Andi Vajda <va...@apache.org> wrote:


On Sat, 18 Oct 2014, Alexander Alex wrote:

 The init file in the PyLucene egg. Here it is:



import os, sys

if sys.platform == 'win32':
 from jcc.windows import add_jvm_dll_directory_to_path
 add_jvm_dll_directory_to_path()
 import jcc, _lucene
else:
 import _lucene

__dir__ = os.path.abspath(os.path.dirname(__file__))

class JavaError(Exception):
 def getJavaException(self):
   return self.args[0]
 def __str__(self):
   writer = StringWriter()
   self.getJavaException().printStackTrace(PrintWriter(writer))
   return "\n".join((super(JavaError, self).__str__(), "    Java stacktrace:", str(writer)))

class InvalidArgsError(Exception):
 pass

_lucene._set_exception_types(JavaError, InvalidArgsError)

VERSION = "3.6.2"
CLASSPATH = [os.path.join(__dir__, "lucene-core-3.6.2.jar"),
             os.path.join(__dir__, "lucene-analyzers-3.6.2.jar"),
             os.path.join(__dir__, "lucene-memory-3.6.2.jar"),
             os.path.join(__dir__, "lucene-highlighter-3.6.2.jar"),
             os.path.join(__dir__, "extensions.jar"),
             os.path.join(__dir__, "lucene-queries-3.6.2.jar"),
             os.path.join(__dir__, "lucene-grouping-3.6.2.jar"),
             os.path.join(__dir__, "lucene-join-3.6.2.jar"),
             os.path.join(__dir__, "lucene-facet-3.6.2.jar"),
             os.path.join(__dir__, "lucene-spellchecker-3.6.2.jar")]
CLASSPATH = os.pathsep.join(CLASSPATH)
_lucene.CLASSPATH = CLASSPATH
_lucene._set_function_self(_lucene.initVM, _lucene)

from _lucene import *


Thanks. This looks like the vanilla __init__.py file in the pylucene egg.
I see none of the modifications you described as, I quote, "path of the
dependencies to classpath in the init.py file".
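
As an aside on the classpath question: a quick sanity check is to confirm that every entry on the classpath exists on disk before the JVM is started. The sketch below is not part of PyLucene; missing_classpath_entries is a hypothetical helper name, and it assumes the classpath is a single string joined with os.pathsep, as in the file above.

```python
import os


def missing_classpath_entries(classpath):
    """Return the classpath entries that do not exist on disk.

    Hypothetical helper, not a PyLucene API. `classpath` is assumed to be
    one string whose entries are separated by os.pathsep.
    """
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    return [entry for entry in entries if not os.path.exists(entry)]
```

It could be called as missing_classpath_entries(lucene.CLASSPATH) before lucene.initVM(). An empty result only shows that the files are there, not that they contain the expected classes.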

To be sure there is no misunderstanding here, this is what I understand
from you so far:
  - you downloaded, built and installed PyLucene 3.6.2
    (with what Python version and what Java version ?)
  - you then compiled a new class and added it to two JAR files,
    lucene-core-3.6.2.jar and lucene-analyzers-3.6.2.jar
    (with what Java version ?, why did you modify two JAR files ?
     why not create your own JAR file with your extra stuff ?)
  - you then edited __init__.py to reflect this change but I don't see
    any change in the file you pasted nor why the change is needed if you
    just modified existing JAR files (in the right location, inside the
    PyLucene egg, right ?)
  - you did not rebuild PyLucene itself after making any of these changes

If this mental picture is correct then this is not the right way to go
about it. The proper way to modify Lucene Core and then PyLucene is to:
  - compile and build your new classes using the same version of Java (and
    Lucene)
  - create a new JAR file containing your extra stuff
  - test that it all works with a simple Java program that uses Lucene core
    and your new code together
  - _then_ rebuild PyLucene including your new JAR file either by:
     - adding it to the list of JAR files being wrapped by JCC via --jar
       in the PyLucene Makefile
     - OR passing it to JCC via --include instead so that it just becomes
       part of the new PyLucene egg (ensuring it is inside the egg and on
       the classpath but no Python wrappers for it are generated)

To get command line argument help from JCC run python -m jcc --help (or
whatever the correct invocation is for your version of Python).
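
The two rebuild options above can be sketched as a JCC command line. This is only an illustration under assumptions: jcc_command and the JAR name my-analyzers.jar are hypothetical, the real PyLucene Makefile passes many more options than shown, and the "python -m jcc" form may differ for your Python version. Only --jar and --include come from this message; --python, --version and --build are other JCC options listed by --help.

```python
import sys


def jcc_command(wrapped_jars, included_jars, module_name="lucene",
                version="3.6.2"):
    """Build (but do not run) a JCC argument list.

    Hypothetical helper. JARs in `wrapped_jars` get Python wrappers (--jar);
    JARs in `included_jars` are only shipped in the egg and put on the
    classpath (--include).
    """
    cmd = [sys.executable, "-m", "jcc"]
    for jar in wrapped_jars:
        cmd += ["--jar", jar]
    for jar in included_jars:
        cmd += ["--include", jar]
    cmd += ["--python", module_name, "--version", version, "--build"]
    return cmd


# Option 1: wrap the extra JAR as well (my-analyzers.jar is a made-up name).
wrap_cmd = jcc_command(["lucene-core-3.6.2.jar", "my-analyzers.jar"], [])

# Option 2: ship the extra JAR without generating wrappers for it.
include_cmd = jcc_command(["lucene-core-3.6.2.jar"], ["my-analyzers.jar"])
```

Either list could then be passed to subprocess.call() from the directory holding the JAR files. In practice it is probably simpler to edit the PyLucene Makefile, since it already carries the full set of options.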

Andi..


 On Sat, Oct 18, 2014 at 12:29 AM, Andi Vajda <va...@apache.org> wrote:

On Sat, 18 Oct 2014, Alexander Alex wrote:

 ok. I built the class files for the java files attached herein and added
them to lucene-core-3.6.2.jar at org.apache.lucene.analysis and
lucene-analyzers-3.6.2.jar at org.apache.lucene.analysis. I then added the
path of the dependencies to classpath in the init.py file.

What init.py file ?
Can you paste the contents of that file here, please ?

Andi..


 I ran the typical index file using this customized analyzer through
PythonAnalyzer and got the above error. Meanwhile, I had earlier run the
index file using the standard analyzer before adding the classes, and it
worked. After the run with the customized analyzer failed, I tried again
with the standard analyzer, which had worked before I added the classes,
but this time it failed with the same error message as above. I guess the
problem has to do with array compatibility between Java and Python, but I
don't really know. Thanks.



On Fri, Oct 17, 2014 at 7:23 PM, Andi Vajda <va...@apache.org> wrote:


 On Fri, 17 Oct 2014, Alexander Alex wrote:


 Meanwhile, I am using Lucene version 3.6.2. The problem is JVM
instantiation from any Python code using Lucene, caused by the classes I
added to Lucene core.

---------- Forwarded message ----------

I added a customized Lucene analyzer class to Lucene core in PyLucene.


 Please explain in _detail_ the steps you followed to accomplish this.
A log of all the commands you ran would be ideal.

Thanks !

Andi..


 This class has Google Guava as a dependency because of the array handling
function available in com.google.common.collect.Iterables in Guava. When
I tried to index using this analyzer, I got the following error:

Traceback (most recent call last):
 File "C:\IndexFiles.py", line 78, in <module>
   lucene.initVM()
JavaError: java.lang.NoClassDefFoundError: org/apache/lucene/analysis/CharArraySet
   Java stacktrace:
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/CharArraySet
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.CharArraySet
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Even the example indexing code from Lucene in Action, which worked when I
tried it earlier, returns the same error above when I retry it after
adding this class. I am not too familiar with the CharArraySet class, but
as far as I can see the problem comes from it. How do I handle this?
Attached are the Java files whose classes were added to Lucene core in
PyLucene. Thanks
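
One way to check whether a modified JAR still holds the class named in the error is to list its entries. This is a minimal sketch, assuming the JAR is an ordinary zip archive; jar_contains_class is a hypothetical helper, not a Lucene or PyLucene API.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the JAR has a .class entry for `class_name`.

    Hypothetical helper. `class_name` is a dotted Java name such as
    "org.apache.lucene.analysis.CharArraySet".
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, jar_contains_class(path_to_jar, "org.apache.lucene.analysis.CharArraySet") could be run against the lucene-core-3.6.2.jar inside the PyLucene egg, with path_to_jar adjusted to wherever the egg is installed.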
