The first thing I'd do is set "maxCollationTries" to zero.  This means Solr 
will run your main query only once and will not re-run it to check the 
collations.  Then see whether your queries have a consistent QTime.  One easy 
explanation is that with "maxCollationTries=10", it may be running your query 
up to 11 times: once for the main query, plus once for each of up to 10 
candidate collations.  If the query takes 50ms by itself, then you've got up 
to 550ms total to not find spelling corrections.  Unfortunately, the worst 
case here is the one that gives the user nothing back.
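
As a minimal sketch of that diagnostic step, assuming the change is made in the 
"defaults" list of the /select handler from the original message (only this one 
line changes; the comment is mine):

```xml
<!-- Diagnostic only: do not re-run the query to test candidate collations -->
<str name="spellcheck.maxCollationTries">0</str>
```

The same parameter can also be passed on the request URL as 
"spellcheck.maxCollationTries=0" to override the default for a single test query.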
 

Another thing to look at: with "maxCollationTries" at zero, set "maxCollations" 
to 10.  This will give you a list of the 10 collations it would have tried.  
You can then see whether the one that gets hits is far enough down the list to 
explain the high total QTime when "maxCollationTries=10".  If this explains it, 
then the obvious solution is to set "maxCollationTries" to something lower than 
10 (you'll need to weigh how long you're willing to make your users wait to 
possibly get spelling suggestions).  Or possibly, use "spellcheck.q" to give it 
an easier query to evaluate than the main query (but one that can still give 
valid collations).  Also, see https://issues.apache.org/jira/browse/SOLR-3240, 
which is an optimization for this feature.
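
A sketch of the two steps above, assuming the edits go in the "defaults" list 
of the /select handler from the original message.  The value 3 is only an 
illustrative placeholder, not a recommendation:

```xml
<!-- Step 1 (diagnostic): list up to 10 candidate collations without testing them -->
<str name="spellcheck.maxCollationTries">0</str>
<str name="spellcheck.maxCollations">10</str>

<!-- Step 2 (possible fix): return a single collation again, but cap the re-queries.
     The value 3 is hypothetical; choose it from how far down the step 1 list
     the collation that gets hits appeared. -->
<!--
<str name="spellcheck.maxCollationTries">3</str>
<str name="spellcheck.maxCollations">1</str>
-->
```

Step 2 is left commented out so the fragment does not define the same parameter 
twice; swap it in for step 1 once the diagnostic run is done.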

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: SandeepM [mailto:skmi...@hotmail.com] 
Sent: Thursday, April 18, 2013 11:33 PM
To: solr-user@lucene.apache.org
Subject: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

Hi!

I am using SOLR 4.2.1.

My solrconfig.xml contains the following:

<searchComponent name="MySpellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>

  <lst name="spellchecker">
    <str name="name">MySpellchecker</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="df">id</str>
    <str name="spellcheck.dictionary">MySpellchecker</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">10</str>
    <str name="spellcheck.maxResultsForSuggest">35</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">false</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">1</str>
    <str name="spellcheck.collateParam.q.op">AND</str>
  </lst>
  <arr name="last-components">
    <str>MySpellcheck</str>
  </arr>
</requestHandler>

schema.xml with the spell field looks like:

<fieldType name="text_spell" class="solr.TextField"
           positionIncrementGap="100" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt" enablePositionIncrements="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>

<field name="spell" type="text_spell" indexed="true" stored="false" multiValued="true" />

<copyField source="title" dest="spell" />
<copyField source="artist" dest="spell" />
 
My query:
http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&fl=&indent=on&wt=xml&rows=10&version=2.2&echoParams=explicit

In this case, the intent is to correct "chocolat factry" to "chocolate
factory", which exists in my spell field index. The QTime for the above
query is somewhere between 350 and 400ms.

I run a similar query, replacing the spellcheck terms with "pursut hapyness"
("pursuit happyness" actually exists in my spell field), and I see a
QTime of 15-17ms.

Both queries produce collations correctly, but there is an order of magnitude
difference in QTime.  There is one edit per term in both cases, or 2 edits in
each query. The lengths of the words in both queries seem identical. I'd
like to understand why there is this vast difference in QTime.  I would
appreciate any help with this, since I am not sure how I can get any
meaningful performance numbers and attribute the slowness to anything in
particular.

I also see a vast difference in QTime in another case.  Replace the search
terms in the above query with "over cuckoo's nest", "over cuccoo's nst",
etc.  "over cuckoo's nest" exists in my indexed spell field, so it
should be found almost immediately.  Yet this query fails to produce any
collation and takes 10 seconds, while the second query, "over cuccoo's nst",
corrects the phrase and returns in 24ms. Something does not sound right
here.

I would appreciate help with these.

Thanks in advance.
Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176.html
Sent from the Solr - User mailing list archive at Nabble.com.

