On Fri, 30 Nov 2007, Felix Schwarz wrote:

Andi Vajda wrote:
The segfault seems to be in testSimpleKeywordAnalyzer() before:
self.assertEqual(ts.next().termText(), input)
The program terminates immediately after ts.next().

Could it be that there is a mismatch in unicode char width between the python you compiled PyLucene with and the python you're running it with (which should be the same, really)?

How can I check this?
I'm just using the Python which comes with CentOS 5 and did not modify
anything in PyLucene (besides some Makefile/setup.py stuff).

The name of the function on the stack, 'PyUnicodeUCS4_FromUnicode',
could imply this.

To debug this, use gdb. You can recompile PyLucene with DEBUG=1 to disable optimizations and get a better gdb experience.

Edit JCCEnv.cpp and add:

printf("sizeof(Py_UNICODE) == sizeof(jchar): %d\n",
       sizeof(Py_UNICODE) == sizeof(jchar));

to the top of the JCCEnv::fromJString function and rebuild.
If it says '1' I suspect a problem because, unless I'm mistaken,
PyUnicodeUCS4_FromUnicode expects 4-byte unicode chars yet Java's jchar is 2-byte. There are two flavors of unicode chars in python: 2-byte wide and 4-byte wide.
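
A quicker check that needs no rebuild is to ask the interpreter itself which flavor it was built with. The following is only a minimal sketch: the helper name unicode_width_bytes is made up for illustration and is not part of PyLucene or JCC. It relies on sys.maxunicode, which is 0xFFFF on a 2-byte (UCS2) build and 0x10FFFF on a 4-byte (UCS4) build. Note that Python 3.3 and later always report 0x10FFFF, so the check is only meaningful on older interpreters such as the one shipped with CentOS 5.

```python
import sys


def unicode_width_bytes(maxunicode=None):
    """Return the unicode char width in bytes implied by sys.maxunicode.

    Hypothetical helper for illustration; not part of PyLucene or JCC.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF, wide (UCS4) builds report 0x10FFFF.
    if maxunicode == 0xFFFF:
        return 2
    return 4


if __name__ == "__main__":
    print("sys.maxunicode = %#x" % sys.maxunicode)
    print("unicode char width: %d bytes" % unicode_width_bytes())
```

Run it with the same python binary that runs the tests. This only reports the width of the running interpreter; the printf above is still needed to see which width the PyLucene build itself was compiled against.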

Of course, I could be completely wrong and misleading you. Only stepping through gdb can actually tell.

Andi..
