Hi Erick,
thank you for the reply.
Yes, I'm using the fast vector highlighter (Solr 4.3). Each request should
return only 10 results.
Here is my schema configuration for both fields:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            preserveOriginal="1" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
  </analyzer>
</fieldType>
<fieldType name="textSpell" class="solr.TextField"
           positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.SnowballPorterFilterFactory"
            language="German2" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.ShingleFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.SnowballPorterFilterFactory"
            language="German2" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StandardFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
  </analyzer>
</fieldType>
<field name="spell" type="textSpell" indexed="true" multiValued="true" />
<field name="content" type="text" stored="true" indexed="true"
       multiValued="true" termVectors="true" termPositions="true"
       termOffsets="true" />
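As far as I understand, the fast vector highlighter needs the highlighted field to be stored and to have term vectors with positions and offsets. A minimal Python sketch to sanity-check a field definition against that requirement follows. The helper name `fvh_ready` is made up for illustration and is not part of Solr; it only looks at attributes written explicitly on the `<field>` element and ignores defaults inherited from the field type.

```python
import xml.etree.ElementTree as ET

# Attributes the fast vector highlighter relies on for the highlighted field.
REQUIRED = ("stored", "termVectors", "termPositions", "termOffsets")


def fvh_ready(field_xml):
    """Return the required attributes that are not explicitly set to "true".

    Hypothetical helper: an empty list means the field definition looks
    usable for the fast vector highlighter.
    """
    field = ET.fromstring(field_xml)
    return [name for name in REQUIRED if field.get(name) != "true"]


# The content field definition from the schema above.
content_field = (
    '<field name="content" type="text" stored="true" indexed="true" '
    'multiValued="true" termVectors="true" termPositions="true" '
    'termOffsets="true" />'
)

missing = fvh_ready(content_field)
print("missing attributes:", missing)
```

For the content field this reports nothing missing, so the field definition itself does not look like the cause.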
The content field contains on average around 5000-6000 words (a rough
estimate only).
Best regards
Erwin
-----Original Message-----
From: Erick Erickson [mailto:[email protected]]
Sent: Tuesday, February 25, 2014 3:27 PM
To: [email protected]
Subject: Re: Performance problem on Solr query on stemmed values
Right, highlighting may have to re-analyze the input in order to return the
highlighted data. This will be significantly slower than the search,
especially if you have a large number of rows you're returning.
You can get better performance in highlighting by using
FastVectorHighlighter. See:
https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter
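As a rough sketch, a request that turns on the FastVectorHighlighter for the content field could be built like this in Python. The base URL, core name (`collection1`), query term, and the helper name `highlight_params` are placeholders for illustration, not taken from this thread; `hl`, `hl.fl`, and `hl.useFastVectorHighlighter` are the standard highlighting parameters.

```python
from urllib.parse import urlencode


def highlight_params(query, rows=10):
    """Build select-handler parameters with the FastVectorHighlighter enabled.

    Hypothetical helper; it only assembles the parameter dictionary.
    """
    return {
        "q": query,
        "rows": rows,                           # number of documents returned
        "hl": "true",                           # switch highlighting on
        "hl.fl": "content",                     # field to highlight
        "hl.useFastVectorHighlighter": "true",  # use term vectors instead of re-analysis
        "wt": "json",
    }


# Placeholder URL and query term (assumptions, adjust to your setup).
base_url = "http://localhost:8983/solr/collection1/select"
params = highlight_params("content:haus OR spell:haus")
print(base_url + "?" + urlencode(params))
```

Timing the same request with `hl` set to `false` gives a quick comparison of search time against highlighting time.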
1000x is unusual, though, unless your fields are very large or you're
returning a lot of documents.
Best,
Erick
On Tue, Feb 25, 2014 at 5:23 AM, Erwin Gunadi <[email protected]> wrote:
> Hi,
>
>
>
> I would like to know whether anyone has experienced this kind of
> phenomenon.
>
>
>
> We are having a performance problem with queries on stemmed values.
>
> I've documented the symptoms which I'm currently facing:
>
>
>
>
> Search on field content | Search on field spell | Highlighting (on content field) | Processing speed
> ------------------------+-----------------------+---------------------------------+-----------------
> active                  | active                | active                          | Slow
> active                  | not active            | active                          | Fast
> active                  | active                | not active                      | Fast
> not active              | active                | active                          | Slow
> not active              | active                | not active                      | Fast
>
>
>
> *Fast means 1000x faster than "slow".
>
>
>
> The content field is our index field, which holds the original text, and
> spell is the field with the stemmed values.
>
> According to my measurements, search on both fields (stemmed and not
> stemmed) is really fast.
>
> But as soon as I add highlighting to the query, it takes too long to
> process.
>
>
>
> Best Regards
>
> Erwin
>
>