[pylucene-dev] Re: seqfault in KeywordAnalyzerTest with jcc-enabled PyLucene

Andi Vajda Fri, 30 Nov 2007 08:51:41 -0800


On Fri, 30 Nov 2007, Felix Schwarz wrote:

Andi Vajda wrote:
The segfault seems to be in testSimpleKeywordAnalyzer() before:
self.assertEqual(ts.next().termText(), input)
The program terminates immediately after ts.next().
Could it be that there is a mismatch in unicode char width between the python you compiled PyLucene with and the python you're running it with (which should be the same, really)?
How can I check this?
I'm just using the Python which comes with CentOS 5 and did not modify
anything in PyLucene (besides some Makefile/setup.py stuff).
From the name of the function on the stack 'PyUnicodeUCS4_FromUnicode', it
could imply this.
To debug this, use gdb. You can recompile PyLucene with DEBUG=1 to disable optimizations and get a better gdb experience.


Edit JCCEnv.cpp and add:

printf("sizeof(Py_UNICODE) == sizeof(jchar): %d\n",
       sizeof(Py_UNICODE) == sizeof(jchar));

to the top of the JCCEnv::fromJString function and rebuild.
If it says '1' I suspect a problem because, unless I'm mistaken, PyUnicodeUCS4_FromUnicode expects 4-byte unicode chars yet Java's jchar is 2-byte. There are two flavors of unicode chars in Python: 2-byte wide and 4-byte wide.
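
As for how to check which flavor a given python uses without rebuilding anything, sys.maxunicode should tell: as far as I know, a 2-byte (UCS2) build reports 0xFFFF and a 4-byte (UCS4) build reports 0x10FFFF. A minimal sketch, where the helper name unicode_char_width is made up for illustration and is not part of PyLucene or JCC:

```python
import sys


def unicode_char_width(maxunicode=None):
    """Return the Py_UNICODE width in bytes implied by sys.maxunicode.

    Hypothetical helper for illustration; not part of PyLucene or JCC.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF, wide (UCS4) builds report 0x10FFFF.
    # Note: Python 3.3 and later always report 0x10FFFF.
    return 2 if maxunicode <= 0xFFFF else 4


if __name__ == "__main__":
    print("sys.maxunicode = %#x" % sys.maxunicode)
    print("unicode char width: %d bytes" % unicode_char_width())
```

Run this with both the python used to build PyLucene and the python used to run the tests. If the two report different widths, that would point to the mismatch described above.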

Of course, I could be completely wrong and misleading you. Only stepping through gdb can actually tell.


Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
