On Thu, Oct 25, 2012 at 1:35 AM, Dag Lem <[email protected]> wrote:
> Even though ClusterSearcher is implemented in Perl, I don't see that
> the C function top_docs would be calling back into Perl space here,
> and thus I still don't understand the (big) discrepancy.

As a matter of fact, we **are** calling back into Perl-space. :)

LucyX::Remote::ClusterSearcher is a pure-Perl subclass of
Lucy::Search::Searcher, so it inherits all of Searcher's methods
including hits(). The Perl-space function Lucy::Search::Searcher::hits
is an XS wrapper around the Searcher_hits() C function I pointed you at
earlier (<http://s.apache.org/vH>), which contains this line:

    TopDocs *top_docs = Searcher_Top_Docs(self, real_query, wanted, sort_spec);

That `Searcher_Top_Docs()` call is actually a **method** invocation --
and since `self` isa LucyX::Remote::ClusterSearcher, the subroutine
that gets dispatched is a callback to the pure Perl function
LucyX::Remote::ClusterSearcher::top_docs.

How does Lucy know about Perl-space subroutines, and how does the
callback work? Well, Lucy is built on top of Clownfish, an object
toolkit which is designed to facilitate things like this. You can write
a pure-Perl subclass of a parent class which is implemented in C and it
will Just Work -- which sure comes in handy for rapid prototyping!

> In any case, just to rule out any *really* crazy stuff, I did the test
> you suggested above. Here, top_docs() was a tiny bit faster than
> hits(), as should be expected. I have pasted the test program for
> this at the end of this email. I peeked at Searcher.c and Lucy.xs to
> work out the equivalent Perl code for hits(); I hope I got it right.

I gave it a quick look-see, and your code looks like an accurate port --
kudos! What I was suggesting was something slightly different though:

    if ($top_docs) {
        my $top_docs = $searcher->top_docs(
            query      => $real_query,
            num_wanted => $wanted,
        );
    }
    else {
        $hits = $searcher->hits(
            query      => $query,
            offset     => $offset,
            num_wanted => $num_wanted,
        );
    }

I would not expect `hits()` to be faster in this case -- either for an
IndexSearcher or a ClusterSearcher -- because `hits()` calls
`top_docs()` as described above.

Marvin Humphrey
