Re: [pylucene-dev] HitCollector in PyLucene extremely slow

Andi Vajda Mon, 03 Sep 2007 20:49:43 -0700


On Mon, 3 Sep 2007, John Kleven wrote:

When using a HitCollector via PyLucene (i.e.,
overriding the collect() API) has anybody else noticed
a massive slowdown?

Even if I set my collector to return immediately in
the collect(doc_id, score) callback, so not even
touching any of the ids or scores, on a collection of
540,000 documents, I get an avg search time of 0.11
seconds.

If I go through the standard IndexSearcher.search,
which still uses a hit collector on the Java backside
(TopDocCollector.java if interested), I get avg search
times of 0.0104 -- and it is actually doing something
(namely, tracking the highest scored docs in a
priority queue up to size 100).

Is this order-of-magnitude slowdown something that I
can expect just because of the java->python callback
via the collect() function?

To get this up to speed, is my only option to code my
collector in Java, add in the hooks, then compile a
custom (gulp) PyLucene version?

I don't know enough about what you're trying to do to have much of an opinion. It seems to me though, that you're comparing apples and oranges. In the python case you're using a HitCollector python customization that returns nothing and in the Java case you're using a TopDocCollector that actually does something.
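For a like-for-like comparison, the python collector would have to do roughly the same work as the Java one, that is, track the highest scored docs in a priority queue up to size 100. A minimal sketch of that bookkeeping follows, assuming a plain python class: the name TopNCollector and the top_docs() method are made up for illustration and are not part of PyLucene, and hooking the class into a searcher is left out.

```python
import heapq


class TopNCollector:
    """Hypothetical plain-python stand-in for a top-N hit collector."""

    def __init__(self, n=100):
        self.n = n
        # Min-heap of (score, doc_id); the lowest kept score sits at index 0.
        self._heap = []

    def collect(self, doc_id, score):
        if len(self._heap) < self.n:
            # Still room: keep every hit.
            heapq.heappush(self._heap, (score, doc_id))
        elif score > self._heap[0][0]:
            # Better than the worst kept hit: swap it in.
            heapq.heapreplace(self._heap, (score, doc_id))

    def top_docs(self):
        # Highest score first.
        return sorted(self._heap, reverse=True)
```

Timing a search with a collector like this against the standard IndexSearcher.search would compare two collectors that both do the same job.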

If indeed it turns out that calling into Python from Java is the culprit, then your best bet is what you're suggesting. I doubt it, though. The only possibly expensive call apart from your python code is the acquiring of the python GIL (Global Interpreter Lock). If there is no contention for the GIL, it should be really fast acquiring it.

The rest of the Java->Python boundary crossing is the marshalling of Java objects into Python ones, the call to your method itself and the reverse marshalling of the return value.
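One way to separate these costs is to first time the method call on its own, with no Java involved. The sketch below does that, assuming a no-op collector driven from a plain python loop; NoOpCollector and time_callbacks are hypothetical names, and the figure of 540,000 calls assumes one callback per document in the collection, which the actual number of hits per query may not match. Whatever the real search takes beyond this is what is left for the GIL, the marshalling and Lucene itself.

```python
import time


class NoOpCollector:
    """Hypothetical collector whose callback returns immediately."""

    def collect(self, doc_id, score):
        return None


def time_callbacks(collector, n_calls):
    """Return the seconds spent making n_calls pure-python collect() calls."""
    start = time.perf_counter()
    for doc_id in range(n_calls):
        collector.collect(doc_id, 1.0)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 540000  # assumption: one callback per document in the collection
    elapsed = time_callbacks(NoOpCollector(), n)
    print("%d no-op calls: %.4f s (%.1f ns per call)"
          % (n, elapsed, elapsed / n * 1e9))
```

This only measures the python side of the call. It says nothing about the cost of acquiring the GIL or of the marshalling, which need a measurement through PyLucene itself.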


Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
