Re: Performance issues with ConjunctionScorer

2005-11-22 Thread Stefan Groschupf
Andrzej, very interesting!!! Nutch Summarizer also needlessly re-tokenizes the text over and over again - perhaps it would be better to save already tokenized text in parse_text, instead of the raw plain text? After all, the only use for that text is to index it and then build the

Re: Performance issues with ConjunctionScorer

2005-11-22 Thread Piotr Kosiorowski
On 11/22/05, Andrzej Bialecki [EMAIL PROTECTED] wrote: Hi, I've been profiling a Nutch installation, and to my surprise the largest amount of throwaway allocations and the most time spent was not in Nutch-specific code, or IPC, but in the Lucene ConjunctionScorer.doNext() method. This method

Re: Performance issues with ConjunctionScorer

2005-11-22 Thread Andrzej Bialecki
Piotr Kosiorowski wrote: On 11/22/05, Andrzej Bialecki [EMAIL PROTECTED] wrote: Hi, I've been profiling a Nutch installation, and to my surprise the largest amount of throwaway allocations and the most time spent was not in Nutch-specific code, or IPC, but in Lucene

Re: Performance issues with ConjunctionScorer

2005-11-22 Thread Piotr Kosiorowski
You are right - it is still not committed, but the patch is here: http://issues.apache.org/jira/browse/LUCENE-443. During tests of my patch - it was very, very similar to this one - I saw up to a 5% performance increase. But probably it will mainly result in nicer GC behaviour. Piotr On 11/22/05,

Re: Performance issues with ConjunctionScorer

2005-11-22 Thread Doug Cutting
Andrzej Bialecki wrote: Further input into this: after replacing the ConjunctionScorer with the fixed version from JIRA, now the bottleneck seems to be ... in Summarizer, of all things. :-) While making the summarizer faster would of course be good, keep in mind that the cost of summarizing