When using a HitCollector via PyLucene (i.e.,
overriding the collect() method), has anybody else
noticed a massive slowdown?

Even if I set my collector to return immediately in
the collect(doc_id, score) callback, so it never
touches any of the ids or scores, on a collection of
540,000 documents I get an average search time of
0.11 seconds.

If I go through the standard IndexSearcher.search,
which still uses a hit collector on the Java side
(TopDocCollector.java, if interested), I get average
search times of 0.0104 seconds, and that collector is
actually doing something (namely, tracking the
highest-scored docs in a priority queue up to size 100).
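
For anyone unfamiliar with what that built-in collector does, here is a rough pure-Python sketch of the same idea: keep only the N best hits in a bounded min-heap. It does not use PyLucene at all, it is not the actual TopDocCollector code, and the class and method names (TopNCollector, top_docs) are made up for illustration.

```python
import heapq


class TopNCollector:
    """Hypothetical stand-in: keeps the n highest-scoring hits seen so far."""

    def __init__(self, n=100):
        self.n = n
        # Min-heap of (score, doc_id); the weakest kept hit sits at index 0.
        self._heap = []

    def collect(self, doc_id, score):
        if len(self._heap) < self.n:
            # Still room: keep every hit.
            heapq.heappush(self._heap, (score, doc_id))
        elif score > self._heap[0][0]:
            # Better than the weakest kept hit: swap it in.
            heapq.heapreplace(self._heap, (score, doc_id))

    def top_docs(self):
        # Best hits first.
        return sorted(self._heap, reverse=True)


collector = TopNCollector(n=3)
for doc_id in range(10):
    collector.collect(doc_id, doc_id * 0.1)
best_ids = [doc_id for _score, doc_id in collector.top_docs()]
```

The point is only that the Java-side collector does real work per hit (a comparison and sometimes a heap update), while my Python collector does none and is still slower.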

Is this order-of-magnitude slowdown something that I
can expect just because of the Java->Python callback
via the collect() function?
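
To put numbers on "order of magnitude", here is the arithmetic from the two averages above. The variable names are just for illustration; the only inputs are the two timings I measured.

```python
# Average search times reported above, in seconds.
callback_avg = 0.11    # no-op Python collect() callback
builtin_avg = 0.0104   # standard IndexSearcher.search (top 100 docs)

# How many times slower the Python callback path is.
slowdown = callback_avg / builtin_avg

# Extra wall-clock time per search attributable to the callback path.
extra_seconds = callback_avg - builtin_avg

print("slowdown: %.1fx, extra per search: %.4f s" % (slowdown, extra_seconds))
```

Dividing that extra time by the number of collect() calls per search would give a per-call cost, but I have not counted the hits per query, so I am leaving that step out.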

To get this up to speed, is my only option to code my
collector in Java, add in the hooks, then compile a
custom (gulp) PyLucene version? 

Scared (and thanks)
j


       
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
