Re: my index has 500 million docs, how to improve Solr search performance?

Lance Norskog Wed, 17 Nov 2010 22:54:20 -0800

This is pretty standard. I think the problem is basic probabilities: when there are multiple shards, the query waits until the final shard responds, then does another query which may wait for more than one shard. The nature of probabilities is that there will be "stragglers" (late responses) and a long tail of response times caused by them. The response-time distribution for a single Solr is shaped like a raindrop: that is, the chart of response time (X) vs. number of samples with that time (Y). The curve starts at the earliest possible search time, zooms up, then rounds off to a long tail.
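To make the straggler effect concrete, here is a rough simulation sketch. It assumes single-shard response times follow a log-normal distribution (fast rise, long tail); the parameters, the function names, and the number of shards contacted in the second phase are all made up for illustration and are not measurements from any real Solr cluster.

```python
import random
import statistics


def shard_latency(rng):
    # Assumed single-shard response time in ms: log-normal, i.e. a quick
    # rise followed by a long right tail. Parameters are illustrative only.
    return rng.lognormvariate(4.0, 0.5)


def distributed_latency(rng, num_shards, second_phase_shards):
    # Phase 1: the coordinator waits for the slowest of all shards.
    phase1 = max(shard_latency(rng) for _ in range(num_shards))
    # Phase 2: it then asks some shards for the documents and waits again.
    phase2 = max(shard_latency(rng) for _ in range(second_phase_shards))
    return phase1 + phase2


def percentile(samples, pct):
    # Simple nearest-rank style percentile over the sorted samples.
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]


def simulate(num_shards, second_phase_shards, trials=2000, seed=42):
    # Returns (median, 99th percentile) of the simulated query latency.
    rng = random.Random(seed)
    samples = [
        distributed_latency(rng, num_shards, second_phase_shards)
        for _ in range(trials)
    ]
    return statistics.median(samples), percentile(samples, 99)


if __name__ == "__main__":
    for shards in (1, 10, 100):
        median, p99 = simulate(shards, min(shards, 5))
        print(f"{shards:>3} shards: median {median:7.1f} ms, p99 {p99:7.1f} ms")
```

Under these assumptions the median of the distributed query moves out toward the tail of the single-shard curve as the shard count grows, because every query pays for its slowest shard.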

So the time for searching 100 shards on 10 machines is that curve times ten, that is, longer and flatter. Virtual machines in general do not give solid, consistent performance numbers. There is also no 'fairness' in dispatching searches, so some searches get good service and some bad.

Put these together (multiply the probability curves) and you will get really variable response times. I don't know how to guide you.
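As a back-of-the-envelope illustration of multiplying the probabilities: if each shard independently answers within some time limit with probability p, then all n shards answer within it with probability p to the power n. The independence assumption and the 0.99 figure below are illustrative only, and the function name is hypothetical.

```python
def prob_all_fast(p_single, num_shards):
    # Probability that every shard answers within the time limit,
    # assuming the shards respond independently of each other.
    return p_single ** num_shards


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(prob_all_fast(0.99, n), 3))
    # prints:
    # 1 0.99
    # 10 0.904
    # 100 0.366
```

So even if a single shard is "fast" 99% of the time, a query that has to wait for 100 such shards is fast only about 37% of the time under this simple model.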


lu.rongbin wrote:

Thanks, Lance Norskog-2. I've tested EBS, but it's no better. So maybe I
have to optimize my Solr config for the EC2 m2.4xlarge. This kind of machine
is configured as:
   cpu units: 26 ECUs
   cpu cores: 8
   memory: 68G

----------------
solrconfig.xml content:

<?xml version="1.0" encoding="UTF-8" ?>

<config>


<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

   <!-- there must be an index dir under this -->
   <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless
overridden. -->
     <useCompoundFile>false</useCompoundFile>

     <mergeFactor>10</mergeFactor>
     <ramBufferSizeMB>32</ramBufferSizeMB>
     <maxFieldLength>10000</maxFieldLength>
     <writeLockTimeout>1000</writeLockTimeout>
     <commitLockTimeout>10000</commitLockTimeout>


     <!--
       This option specifies which Lucene LockFactory implementation to use.

       single = SingleInstanceLockFactory - suggested for a read-only index
                or when there is no possibility of another process trying
                to modify the index.
       native = NativeFSLockFactory  - uses OS native file locking
       simple = SimpleFSLockFactory  - uses a plain file for locking

       (For backwards compatibility with Solr 1.2, 'simple' is the default
        if not specified.)
     -->
     <lockType>native</lockType>
   </indexDefaults>

   <mainIndex>
     <!-- options specific to the main on-disk lucene index -->
     <useCompoundFile>false</useCompoundFile>
     <ramBufferSizeMB>32</ramBufferSizeMB>
     <mergeFactor>10</mergeFactor>
     <maxFieldLength>10000</maxFieldLength>
     <unlockOnStartup>false</unlockOnStartup>
     <reopenReaders>true</reopenReaders>
     <deletionPolicy class="solr.SolrDeletionPolicy">
       <str name="keepOptimizedOnly">false</str>
       <str name="maxCommitsToKeep">1</str>
     </deletionPolicy>

   </mainIndex>
   <jmx />

   <!-- Use the following format to specify a custom IndexReaderFactory -
allows for alternate
        IndexReader implementations.
   <indexReaderFactory name="IndexReaderFactory" class="package.class">
     Parameters as required by the implementation
   </indexReaderFactory>
   -->


   <query>
     <!-- Maximum number of clauses in a boolean query... can affect
         range or prefix queries that expand to big boolean
         queries.  An exception is thrown if exceeded.  -->
     <maxBooleanClauses>1024</maxBooleanClauses>
      <filterCache
       class="solr.FastLRUCache"
       size="5120"
       initialSize="512"
       autowarmCount="128"
       cleanupThread="true"/>

     <queryResultCache
       class="solr.FastLRUCache"
       size="20000"
       initialSize="10240"
       autowarmCount="320"
       cleanupThread="true"/>

     <documentCache
       class="solr.FastLRUCache"
       size="10240"
       initialSize="10240"
       autowarmCount="320"
       cleanupThread="true"/>

     <enableLazyFieldLoading>true</enableLazyFieldLoading>


     <queryResultWindowSize>20</queryResultWindowSize>

      <queryResultMaxDocsCached>20</queryResultMaxDocsCached>

     <HashDocSet maxSize="3000" loadFactor="0.75"/>

     <listener event="firstSearcher" class="solr.QuerySenderListener">
       <arr name="queries">
         <lst>  <str name="q">solr rocks</str><str name="start">0</str><str
name="rows">10</str></lst>
         <lst><str name="q">static firstSearcher warming query from
solrconfig.xml</str></lst>
       </arr>
     </listener>

     <useColdSearcher>false</useColdSearcher>
     <maxWarmingSearchers>2</maxWarmingSearchers>

   </query>

   <requestDispatcher handleSelect="true">
      <requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="2048" />
     <httpCaching lastModifiedFrom="openTime"
                  etagSeed="Solr">
       </httpCaching>
   </requestDispatcher>


  <requestHandler name="standard" class="solr.SearchHandler" default="true">
     <!-- default values for query parameters -->
      <lst name="defaults">
        <str name="echoParams">explicit</str><!--
        <bool name="hl">true</bool>
        <str name="hl.fl">name</str>
        <int name="hl.snippets">1</int>
        <str name="hl.formatter">html</str>
        <str name="hl.fragsize">500</str>
        <str name="hl.simple.pre"><![CDATA[]]></str>
        <str name="hl.simple.post"><![CDATA[]]></str>
        <str name="fl">*</str>
        <int name="rows">10</int>
        <str name="version">2.1</str>
         -->
      </lst>
   </requestHandler>

-------------------------------------
schema.xml content:

<schema name="example" version="1.1">
   <types>
     <!-- solr.StrField: by default tokenized="false" -->
     <fieldType name="string" class="solr.StrField" sortMissingLast="true"
omitNorms="false" />
     <!-- solr.TextField: by default tokenized="true" -->
     <fieldType name="text" class="solr.TextField" omitNorms="false">
       <analyzer
class="org.apache.lucene.analysis.cn.smart.MySmartChineseAnalyzer" />
     </fieldType>
     <fieldType name="w_text" class="solr.TextField" omitNorms="false">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       </analyzer>
     </fieldType>
     <fieldType name="t_double" class="solr.TrieDoubleField"
precisionStep="8" omitNorms="false" positionIncrementGap="0" />
  </types>

  <fields>
    <field name="id" type="string" indexed="true" stored="true"
required="true" />
    <field name="price"  type="t_double" indexed="true" stored="true"
required="true" />
    <field name="name" type="text" indexed="true" stored="false"
required="true" />
    <field name="comefrom" type="string" indexed="true" stored="false"/>
    <field name="seller" type="string" indexed="true" stored="false" />
    <field name="category" type="string" indexed="true" stored="false" />
    <field name="detailpath" type="w_text" indexed="true" stored="false" />
  </fields>

  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>

I'm looking forward to your opinion.
