On Tue, 17 Jun 2008, Felix Schwarz wrote:

Just a side note: I built RPMs for CentOS/Fedora which compile on CentOS 5 (i386, x86_64) and Fedora 9 using OpenJDK.

These RPMs do work well for me and I plan to submit them to Fedora but currently the rpath issue is the main blocker for inclusion. Currently I don't have the time to patch JCC to use only dlopen (which is the only accepted method besides waiting ~ 1/2 year so that OpenJDK /may/ get fixed):
http://thread.gmane.org/gmane.linux.redhat.fedora.devel/85542/focus=85548

Thank you for posting this thread, it was interesting reading.

The 'linking against the JRE' issue has been a mess. It's different on every platform (client vs server VM), every OS (only Mac OS X has it right at this time: linking against the system's JavaVM framework is done the same way on each and every OS X Mac), and even every Linux distro. No two OSs have the same path names or layouts for their JRE distro. The variety is quite impressive; it's almost as if this was done on purpose :)

Because of that, I've basically punted on automating the discovery of the right paths for each and every OS or Linux distro under the sun.

Python's distutils/setuptools is already doing a _very_ good job at invoking the C++ compiler and linker with all the right flags. One just has to get the paths right (and the flags right, when doing a new port, such as for Solaris most recently).

It's been my experience so far that, since moving off of GCJ, most issues with building JCC-PyLucene have been with putting the right include and link paths into JCC's setup.py file. This requires an understanding of C++ compilers, linkers and shared libraries which is typically not needed or expected by software developers doing only Python or only Java development.
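To make the 'right include and link paths' point concrete, here is a minimal sketch of the kind of per-platform table one ends up maintaining. It is not JCC's actual setup.py: the `LAYOUTS` table, the platform keys and the `jni_build_paths` helper are hypothetical names made up for illustration, and the relative paths are only examples of common Sun JDK/OpenJDK layouts on Linux, which you should check against your own JDK. The returned keys are chosen to match keyword arguments accepted by setuptools' `Extension`.

```python
import os

# Hypothetical layout table: relative paths under the JDK home for a couple
# of Linux layouts. These are illustrative assumptions, not an exhaustive or
# authoritative list; other distros and OSs lay things out differently.
LAYOUTS = {
    "linux-i386": {
        "include": ["include", "include/linux"],
        "lib": ["jre/lib/i386", "jre/lib/i386/client"],
    },
    "linux-x86_64": {
        "include": ["include", "include/linux"],
        "lib": ["jre/lib/amd64", "jre/lib/amd64/server"],
    },
}


def jni_build_paths(java_home, platform_key):
    """Return Extension-style keyword arguments for building against a JDK.

    java_home and platform_key are supplied by whoever does the build;
    nothing is auto-detected here.
    """
    try:
        layout = LAYOUTS[platform_key]
    except KeyError:
        raise ValueError("no known JDK layout for %r" % (platform_key,))

    def under_home(rel):
        # Turn a '/'-separated relative path into a native path under java_home.
        return os.path.join(java_home, *rel.split("/"))

    library_dirs = [under_home(rel) for rel in layout["lib"]]
    return {
        "include_dirs": [under_home(rel) for rel in layout["include"]],
        "library_dirs": library_dirs,
        "libraries": ["java", "jvm"],
        # Embedding these directories as an rpath is what lets the extension
        # find libjvm at run time without LD_LIBRARY_PATH; it is also the
        # part that distro packaging policies tend to object to.
        "runtime_library_dirs": list(library_dirs),
    }
```

Assuming a JDK installed under some directory such as /usr/lib/jvm/java, the result of `jni_build_paths("/usr/lib/jvm/java", "linux-x86_64")` could be passed along as `Extension("name", sources, **kwargs)`. Leaving `runtime_library_dirs` out avoids the rpath, at the price of having to make libjvm findable some other way at run time.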

My sincere apologies to these developers.

That being said, I'd like to point out that PyLucene has come a long way since its beginnings in 2004. It's become vastly easier to build and use. It's also gotten quite stable.

PyLucene started as a bet that something like this could be done with GCJ and SWIG on Mac OS X, Linux and Windows. The bet proved workable for a while until SWIG and GCJ kept making it harder and harder.

SWIG, for instance, does not understand the concept of Java-masquerading-as-C++ that GCJ offers via CNI (which is so much better than JNI and sorely missed). It became increasingly hard to keep up with its changes. The kludges I had to implement to get PyLucene to work had to change after every SWIG release. Eventually, I gave up and rewrote all the wrappers by hand (the same was done for PyICU).

The GCJ part of the bet worked longer. Even though GCJ was never able to compile Java Lucene from sources without heavy patching of the Lucene Java sources, and therefore required a JDK to build them, I was able to use GCJ for quite some years. Two things eventually convinced me to give up on GCJ/CNI as well: the announcement of OpenJDK, with the corresponding drop in activity on the GCJ project (measured by the drop in traffic on the GCJ developer mailing list), and the worsening support for GCJ 4.x on non-Linux (even more specifically, non-Red Hat) platforms. I moved to a plain C++/JNI/JRE solution by writing a code generator that more or less replaces what is offered by GCJ's CNI.

I say 'more' because the generated C++ code and JCC take care of managing the memory of the objects returned by Java - CNI does not - and I say 'less' because CNI makes true C++ classes out of Java classes - JCC only generates C++ wrapper proxies. Last but not least, JCC has no knowledge of Java Lucene; it can be used with any Java code, just like CNI.

About the "use Jython instead, JCC is terrifying" point made in the thread you linked to:

  - There is nothing terrifying about JCC. It's just a C++ wrapper around
    a JVM. Yes, that may sound tricky but JCC is a code generator and the
    code it generates is quite legible, much more so than SWIG's, for
    example.
    I'd also argue that JCC is less terrifying than GCJ, especially on
    non-Linux platforms. Ever built GCJ for Windows :) ?

  - The Java GC and Python GC are aware of each other insofar as
    neither will free an object held by the other runtime, even when in a
    circular reference situation as happens when one 'extends' a Java class
    with a Python class. Python's ref counting, JCC's proxies and Java's
    weak refs collaborate to make this work.

  - There is a big trade-off with using Jython, apart from the usual
    keeping-up-with-CPython point: when using Jython, you're a prisoner of
    the Java island again. Java very much wants you to use Java and Java
    alone. While calling out of a Java VM via JNI is possible, it's not
    part of the culture, or the way of doing things in the Java world. If
    everything you need for your project is available from the Java
    and Jython ecosystems then using Jython can be a very good choice. If
    that is not the case, the CPython/JCC/C++/JNI/JRE solution gives you
    the best of both worlds, you are not a prisoner of either.

    On a related note, in the next few weeks I intend to improve JCC so that
    it can work "in reverse". In other words, generate the code necessary to
    start from a Java process (instead of a CPython process) and invoke
    CPython code. This would make it possible to use CPython from Hadoop,
    for example. Adding support for Java 1.5 generics is also planned.

Further reading:
https://bugzilla.redhat.com/show_bug.cgi?id=449360
http://article.gmane.org/gmane.linux.redhat.fedora.devel/85542

Just in case anyone is interested, I can publish my RPMs somewhere.

If you send a message to this list detailing what distro(s) your RPMs can be used with, what version of Python, Lucene, Java, JCC and PyLucene they're built from and where they're hosted, I'll add a link to that message on PyLucene's homepage.

Thanks !

Andi..
_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
