Piotr Kosiorowski wrote:
Hi, some time ago I started thinking about implementing a special kind of Lucene Query optimized for Nutch (if I remember correctly, I would have to write my own Scorer and probably a few other classes). I assumed that with a specialized query I could avoid accessing some of the Lucene index structures multiple times, since the same term appears many times in the query Nutch generates for multi-token queries. I am not a Lucene expert, but it may be worth checking whether this gives a performance boost. Does anyone have ideas on why it might or might not help?
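One way to read the idea above: if the generated query repeats a term (same field and text) in several clauses, a specialized query could read that term's postings once and reuse them. The sketch below is plain Java, not Lucene's API; `ToyIndex`, `Clause`, `scoreNaive` and `scoreDeduplicated` are hypothetical names. It assumes a purely additive score, so identical terms can be merged by summing their weights. Real Lucene scoring may also apply query-level factors (for example coordination and normalization), which a real implementation would have to preserve.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermDedupSketch {

    // Hypothetical clause: one term in one field, with a query-time weight.
    record Clause(String field, String text, float weight) {
        String key() { return field + ":" + text; }
    }

    // Hypothetical stand-in for an index: term key -> (doc id -> term frequency).
    // It counts postings reads so the two strategies can be compared.
    static final class ToyIndex {
        private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        int reads = 0;

        void add(String field, String text, int doc, int tf) {
            postings.computeIfAbsent(field + ":" + text, k -> new HashMap<>())
                    .merge(doc, tf, Integer::sum);
        }

        Map<Integer, Integer> read(String key) {
            reads++;
            return postings.getOrDefault(key, Map.of());
        }
    }

    // Naive: every clause reads its own postings, even when the term repeats.
    static Map<Integer, Float> scoreNaive(ToyIndex index, List<Clause> clauses) {
        Map<Integer, Float> scores = new HashMap<>();
        for (Clause c : clauses) {
            for (Map.Entry<Integer, Integer> e : index.read(c.key()).entrySet()) {
                scores.merge(e.getKey(), c.weight() * e.getValue(), Float::sum);
            }
        }
        return scores;
    }

    // Deduplicated: sum the weights of identical terms first, then read each postings list once.
    static Map<Integer, Float> scoreDeduplicated(ToyIndex index, List<Clause> clauses) {
        Map<String, Float> weightByTerm = new LinkedHashMap<>();
        for (Clause c : clauses) {
            weightByTerm.merge(c.key(), c.weight(), Float::sum);
        }
        Map<Integer, Float> scores = new HashMap<>();
        for (Map.Entry<String, Float> t : weightByTerm.entrySet()) {
            for (Map.Entry<Integer, Integer> e : index.read(t.getKey()).entrySet()) {
                scores.merge(e.getKey(), t.getValue() * e.getValue(), Float::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("content", "nutch", 1, 3);
        index.add("content", "nutch", 2, 1);
        index.add("title", "nutch", 1, 1);
        // The term content:nutch is repeated in two clauses of the same query.
        List<Clause> query = List.of(
                new Clause("content", "nutch", 1.0f),
                new Clause("title", "nutch", 2.0f),
                new Clause("content", "nutch", 0.5f));
        Map<Integer, Float> naive = scoreNaive(index, query);
        int naiveReads = index.reads;
        index.reads = 0;
        Map<Integer, Float> dedup = scoreDeduplicated(index, query);
        System.out.println(naive.equals(dedup) + " " + naiveReads + " " + index.reads);
    }
}
```

Running `main` reports that both strategies produce the same scores while the deduplicated one performs fewer postings reads (two instead of three in this toy query).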
That's a very good point. Looking at the profiler traces, I can see that a lot of time is spent just juggling the sub-query scorers inside the BooleanScorer and handling the complex query structure; if this part could be optimized by using a special scorer, it could be a big win.
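The cost of juggling many sub-scorers suggests trying a flatter design. A minimal sketch of one alternative, which is neither Lucene's BooleanScorer nor its Scorer API: treat every clause as a plain postings list of precomputed per-document score contributions and merge all of them in a single document-at-a-time loop driven by one priority queue. `Postings`, `Cursor`, `Hit` and `score` are hypothetical names; the sketch assumes contributions simply add and ignores required/prohibited clauses and phrase matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class FlatScorerSketch {

    // Hypothetical postings for one clause: strictly ascending doc ids plus the
    // clause's (already weighted) score contribution for each of those docs.
    record Postings(int[] docs, float[] contributions) {}

    record Hit(int doc, float score) {}

    // Cursor over one postings list.
    static final class Cursor {
        final Postings postings;
        int pos = 0;

        Cursor(Postings postings) { this.postings = postings; }

        int doc() { return postings.docs()[pos]; }

        float contribution() { return postings.contributions()[pos]; }

        // Moves to the next posting; returns false once the list is exhausted.
        boolean advance() { return ++pos < postings.docs().length; }
    }

    // Union of all clauses in one flat loop: a single heap ordered by the
    // cursors' current doc id, with no nested tree of sub-scorers.
    static List<Hit> score(List<Postings> clauses) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x.doc(), y.doc()));
        for (Postings p : clauses) {
            if (p.docs().length > 0) heap.add(new Cursor(p));
        }
        List<Hit> hits = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            float sum = 0f;
            // Pop every cursor positioned on this doc, accumulate its contribution,
            // and re-insert it if it still has postings left.
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor c = heap.poll();
                sum += c.contribution();
                if (c.advance()) heap.add(c);
            }
            hits.add(new Hit(doc, sum));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Postings> clauses = List.of(
                new Postings(new int[] {1, 4, 7}, new float[] {1.0f, 0.5f, 2.0f}),
                new Postings(new int[] {4, 9}, new float[] {1.5f, 1.0f}),
                new Postings(new int[] {1, 9}, new float[] {0.25f, 0.5f}));
        for (Hit h : score(clauses)) {
            System.out.println(h.doc() + " -> " + h.score());
        }
    }
}
```

The design choice here is to keep per-clause state down to an array cursor, so adding clauses grows the heap rather than the depth of a scorer tree; whether that actually beats BooleanScorer would have to be measured with the same profiler traces.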
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
