Andrzej Bialecki wrote:
Further input into this: after replacing the ConjunctionScorer with the
fixed version from JIRA, now the bottleneck seems to be ... in
Summarizer, of all things. :-)
While making the summarizer faster would of course be good, keep in mind
that the cost of summarizing te
Andrzej Bialecki wrote:
Hi,
I've been profiling a Nutch installation, and to my surprise the
largest amount of throwaway allocations and the most time spent was
not in Nutch specific code, or IPC, but in Lucene
ConjunctionScorer.doNext() method. This method operates on a
LinkedList, which s
You are right - it is still not committed but the patch is here:
http://issues.apache.org/jira/browse/LUCENE-443.
During tests of my patch (it was very, very similar to this one) I saw up to
a 5% performance increase. But it will probably mainly result in nicer GC
behaviour.
Piotr
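
For readers following the thread, here is a minimal sketch of the general idea being discussed: a conjunction (AND) over sorted doc-id lists whose sub-iterators are kept in a plain array, so the inner loop allocates nothing per step. This is not the LUCENE-443 patch and not Lucene's actual ConjunctionScorer; the names ArrayConjunction, Cursor and intersect are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayConjunction {

    /** Hypothetical stand-in for a scorer: a cursor over a sorted array of doc ids. */
    static final class Cursor {
        private final int[] docs;
        private int pos = 0;

        Cursor(int[] sortedDocs) { this.docs = sortedDocs; }

        boolean exhausted() { return pos >= docs.length; }

        int doc() { return docs[pos]; }

        /** Advance to the first doc >= target; returns false once exhausted. */
        boolean skipTo(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }

        /** Step to the next doc; returns false once exhausted. */
        boolean next() {
            pos++;
            return pos < docs.length;
        }
    }

    /** Returns the doc ids present in every posting list (each list must be sorted). */
    static List<Integer> intersect(int[]... postings) {
        List<Integer> hits = new ArrayList<>();
        if (postings.length == 0) return hits;

        // The cursors live in a fixed array: no list nodes or iterator
        // objects are created while the cursors are being advanced.
        Cursor[] cursors = new Cursor[postings.length];
        for (int i = 0; i < postings.length; i++) {
            cursors[i] = new Cursor(postings[i]);
            if (cursors[i].exhausted()) return hits;
        }

        int target = cursors[0].doc();
        while (true) {
            boolean allAgree = true;
            for (Cursor c : cursors) {
                if (!c.skipTo(target)) return hits;   // one list ran out: done
                if (c.doc() != target) {              // overshoot: raise the target
                    target = c.doc();
                    allAgree = false;
                    break;
                }
            }
            if (allAgree) {
                hits.add(target);
                if (!cursors[0].next()) return hits;
                target = cursors[0].doc();
            }
        }
    }
}
```

In this sketch the only allocations are the cursors created up front and the result list itself. That is the kind of change that would mostly show up as calmer GC behaviour rather than a large speedup.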
On 11/22/05, Andrz
Piotr Kosiorowski wrote:
On 11/22/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Hi,
I've been profiling a Nutch installation, and to my surprise the largest
amount of throwaway allocations and the most time spent was not in Nutch
specific code, or IPC, but in Lucene ConjunctionScorer.doNe
On 11/22/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I've been profiling a Nutch installation, and to my surprise the largest
> amount of throwaway allocations and the most time spent was not in Nutch
> specific code, or IPC, but in Lucene ConjunctionScorer.doNext() method.
> This m
Andrzej,
very interesting!!!
Nutch Summarizer also needlessly re-tokenizes the text over and
over again - perhaps it would be better to save already tokenized
text in parse_text, instead of the raw plain text? After all, the
only use for that text is to index it and then build the summaries
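
A rough sketch of the "tokenize once, reuse the tokens" idea, assuming a deliberately naive tokenizer: the text is split into tokens a single time when the object is built, and every later excerpt request works on the stored tokens instead of re-tokenizing the raw string. PreTokenizedText, tokenize and excerpt are hypothetical names. This is not Nutch's Summarizer or its parse_text format.

```java
import java.util.ArrayList;
import java.util.List;

public class PreTokenizedText {

    // Tokens are computed once and kept; the raw text is not needed again.
    private final List<String> tokens;

    PreTokenizedText(String rawText) {
        this.tokens = tokenize(rawText);
    }

    /** Naive tokenizer (assumption for illustration): split on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /**
     * Builds a short excerpt around the first occurrence of queryTerm,
     * with up to `context` tokens on each side. Returns "" if the term is absent.
     */
    String excerpt(String queryTerm, int context) {
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equalsIgnoreCase(queryTerm)) {
                int from = Math.max(0, i - context);
                int to = Math.min(tokens.size(), i + context + 1);
                return String.join(" ", tokens.subList(from, to));
            }
        }
        return "";
    }
}
```

For example, `new PreTokenizedText("Nutch builds summaries from the parsed page text").excerpt("summaries", 1)` yields `builds summaries from` without tokenizing the text a second time.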