So, Helmut, PyLucene seems alive and kicking! I liked the GCJ approach because of its:
- speed: although not a tenfold increase, it is significant to us
- ABI: 10-100 fold faster access to the Lucene API, from C++ that is.
- memory use: just less, and integrated, no separate heap etc.

I do appreciate PyLucene's goal to support all of Lucene, and I agree that maintaining all the proxies/stubs was painful. In fact, they caused most of the build/install trouble: I was astonished to learn that gcj is able to compile lucene.jar with one simple instruction.

However, our goal here is almost the opposite: use the smallest possible core of Lucene, the part it is good at, and no more. Lucene is darn good at ranked full-text/zone-based search, so we use that. For faceted search, clustering, range search, n-grams etc. we use other software, and here comes the integration issue: the need for speed.

Generally speaking, crossing VM boundaries is extremely expensive, mainly because of call dispatching and data conversions. Going from Python to C++ and then to Java crosses a VM boundary twice, first using Python's C-API and then the JNI. We avoid much of the C-API by creating higher-level C++ interfaces that just do more in a single call, for example perform a query and retrieve the results. Next, we avoided the JNI by using the ABI (formerly used by PyLucene as well). This yielded a huge performance improvement: multiple orders of magnitude.

As a last remark, we do not use generated code for proxies. Instead, we use ctypes to interface to our C++ code, which then uses the ABI to interface to Lucene. This keeps our build process as simple as a few single-line compile statements.

Oh, before I forget: we don't use threads.

So we probably used PyLucene for a specific feature of it that was not expected, but I hope I made clear why we stick to GCJ. And I really think that if PyLucene is to cover all of Lucene, the JCC approach is a good one, and I am glad to hear that it is stable. I will have to face the GCJ trouble on my own, if and when it appears, I am afraid.
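
To make the "do more in a single call" point a bit more concrete, here is a minimal sketch of what such a coarse-grained ctypes interface could look like. It is an illustration only, not the actual code: the Hit struct, the search signature and the library name in the comments are all hypothetical, and a small Python stand-in plays the role of the C++ function so that the sketch runs without any shared library.

```python
import ctypes


class Hit(ctypes.Structure):
    """One search result as a plain C struct (hypothetical layout)."""
    _fields_ = [("doc_id", ctypes.c_int32), ("score", ctypes.c_float)]


# Hypothetical C signature of the coarse-grained entry point:
#   int search(const char *query, Hit *out, int max_hits);
# With a real shared library the wiring would look roughly like this:
#   lib = ctypes.CDLL("./libsearch.so")  # hypothetical library name
#   lib.search.argtypes = [ctypes.c_char_p, ctypes.POINTER(Hit), ctypes.c_int]
#   lib.search.restype = ctypes.c_int


def fake_search(query, out, max_hits):
    """Python stand-in for the C++ function: fills `out`, returns the hit count."""
    docs = {1: b"lucene ranked search", 2: b"faceted search", 3: b"n-grams"}
    n = 0
    for doc_id, text in sorted(docs.items()):
        if query in text and n < max_hits:
            out[n].doc_id = doc_id
            out[n].score = 1.0 / doc_id  # made-up score, for illustration only
            n += 1
    return n


def search(query, max_hits=10, _impl=fake_search):
    """Run a query and fetch all hits with one call across the boundary."""
    buf = (Hit * max_hits)()  # caller-allocated result buffer
    n = _impl(query.encode("utf-8"), buf, max_hits)
    return [(buf[i].doc_id, buf[i].score) for i in range(n)]
```

The point of the shape is that Python crosses the boundary once per query, instead of once per hit or per field; the results come back in a flat buffer that needs no per-object conversion on the C++ side.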

Erik

On Thu, Jan 15, 2009 at 9:12 PM, TJ Ninneman <t...@twopeasinabucket.com> wrote:
>
> On Jan 15, 2009, at 11:02 AM, Bill Janssen wrote:
>
>> Erik Groeneveld <e...@cq2.nl> wrote:
>>
>>> But I admit that after the major
>>> strategy change that involved using JCC instead of GCJ, I am switching
>>> to a different GCJ solution. Probably other do so as well?
>
> What solution?
>
>> Nope. I dislike the JVM, particularly its handling of memory, so I
>> share your pain,
>
> Agreed, my memory consumption went up by almost a full order of
> magnitude.
>
> With that being said, the new JCC based one just rocks in almost every
> way. Even when I would develop a pure python, multi-threaded server
> with GCJ PyLucene I invariably would have constant problems. Now I
> can run my code within a rock solid mod_wsgi Apache server and I never
> have issues.
>
> It's a beautiful thing...RAM is cheap, downtime isn't.
>
> TJ
>
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev@osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>

--
E.J. Groeneveld
Seek You Too
twitter, skype: ejgroene
mobiel: 0624 584 029