So, Helmut, PyLucene seems alive and kicking!

I liked the GCJ approach because of its:

- speed: although not a tenfold increase, it is significant to us
- ABI: 10- to 100-fold faster access to the Lucene API (from C++, that is)
- memory use: simply lower, and integrated: no separate heap, etc.

I do appreciate PyLucene's goal to support all of Lucene, and I agree
that maintaining all the proxies/stubs was painful.  In fact, they
caused most of the build/install trouble; by contrast, I was astonished
to learn that gcj can compile lucene.jar with one simple command.
However, our goal here is almost the opposite: use the smallest
possible core of Lucene, the part it is good at, and no more.  Lucene
is darn good at ranked full-text and zone-based search, so we use it
for that.  For faceted search, clustering, range search, n-grams, etc.,
we use other software, and that is where the integration issue comes
in: the need for speed.

Generally speaking, crossing VM boundaries is extremely expensive,
mainly because of call dispatching and data conversion.  Going from
Python to C++ and then to Java means crossing a VM boundary twice,
first through Python's C-API and then through JNI.  We avoid much of
the C-API overhead by creating higher-level C++ interfaces that simply
do more in a single call, for example perform a query and retrieve the
results.  Next, we avoided JNI by using the ABI (which PyLucene
formerly used as well).  This yielded a huge performance improvement:
multiple orders of magnitude.
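To illustrate why "doing more in a single call" pays off, here is a minimal sketch in plain Python.  The Boundary class, the toy document dict and both search functions are hypothetical stand-ins: nothing here touches the real C-API, JNI or Lucene; the class merely counts how often a call "crosses" the boundary.

```python
class Boundary:
    """Hypothetical stand-in for a Python/C++/Java boundary; it only counts crossings."""

    def __init__(self, index):
        self.index = index      # hypothetical toy "index": doc id -> stored text
        self.crossings = 0

    def call(self, fn, *args):
        # Every call routed through here represents one (expensive) boundary crossing.
        self.crossings += 1
        return fn(*args)


def search_chatty(b, term):
    # One crossing for the query itself...
    ids = b.call(lambda t: [d for d, text in b.index.items() if t in text], term)
    # ...then one more crossing per hit to fetch its stored text.
    return [b.call(lambda d: b.index[d], d) for d in ids]


def search_chunky(b, term):
    # Query and result retrieval bundled into a single crossing.
    return b.call(lambda t: [text for text in b.index.values() if t in text], term)


if __name__ == "__main__":
    docs = {1: "lucene in action", 2: "python", 3: "lucene with gcj"}
    chatty, chunky = Boundary(docs), Boundary(docs)
    print(search_chatty(chatty, "lucene"), chatty.crossings)  # 2 hits, 3 crossings
    print(search_chunky(chunky, "lucene"), chunky.crossings)  # 2 hits, 1 crossing
```

With n hits the chatty version crosses n + 1 times, while the coarse-grained one always crosses once; when each crossing carries dispatch and conversion costs, that difference dominates.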

As a last remark, we do not use generated code for proxies.  Instead,
we use ctypes to interface to our C++ code, which then uses the ABI to
talk to Lucene.  This keeps our build process as simple as a few
single-line compile commands.
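For readers unfamiliar with ctypes, a minimal sketch of the idea follows, assuming a POSIX system.  Our actual C++ library is not shown here, so libc's strlen stands in for it; a C++ library called this way would typically export its functions with extern "C" linkage so ctypes can find them by plain, unmangled names.  The c_strlen wrapper is a hypothetical illustration.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system.  find_library("c") locates the C library; if it
# returns None, CDLL(None) falls back to symbols already loaded in the process.
# In the setup described above this would be our own C++ shared library instead.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the foreign signature by hand: no generated proxy/stub code needed.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(s: str) -> int:
    """Hypothetical thin wrapper: encode to bytes and make one foreign call."""
    return libc.strlen(s.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("lucene"))
```

The only "build step" on the Python side is declaring argtypes and restype; everything else is an ordinary compile of the native library.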

Oh, before I forget: we don't use threads.

So we probably used PyLucene for a specific feature that was never
anticipated, but I hope I have made clear why we are sticking with GCJ.
And I really think that if PyLucene is to cover all of Lucene, the JCC
approach is a good one, and I am glad to hear that it is stable.  I am
afraid I will have to face any GCJ trouble on my own, if and when it
appears.

Erik


On Thu, Jan 15, 2009 at 9:12 PM, TJ Ninneman <t...@twopeasinabucket.com> wrote:
>
> On Jan 15, 2009, at 11:02 AM, Bill Janssen wrote:
>
>> Erik Groeneveld <e...@cq2.nl> wrote:
>>
>>> But I admit that after the major
>>> strategy change that involved using JCC instead of GCJ, I am
>>> switching to a different GCJ solution. Probably others do so as well?
>
> What solution?
>
>> Nope.  I dislike the JVM, particularly its handling of memory, so I
>> share your pain,
>
> Agreed, my memory consumption went up by almost a full order of
> magnitude.
>
> With that being said, the new JCC based one just rocks in almost every
> way.  Even when I would develop a pure python, multi-threaded server
> with GCJ PyLucene I invariably would have constant problems.  Now I
> can run my code within a rock solid mod_wsgi Apache server and I never
> have issues.
>
> It's a beautiful thing...RAM is cheap, downtime isn't.
>
> TJ
>
> _______________________________________________
> pylucene-dev mailing list
> pylucene-dev@osafoundation.org
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
>



-- 
E.J. Groeneveld
Seek You Too
twitter, skype: ejgroene
mobiel: 0624 584 029