I index 2500 documents (up to 50MB each, 3MB on average) into a Solr server.
When running on Solr 4.0, the full index finished in 3 hours.

However, after we upgraded to Solr 4.9, the same indexing run needs 3 days to finish.

I've done some profiling. The numbers I get are:
Size figure of document   Time to add (Solr 4.0)   Time to add (Solr 4.9)
1.18                      6 sec                    123 sec
2.26                      12 sec                   444 sec
3.35                      18 sec                   over 600 sec
9.65                      46 sec                   timeout

From what I can see, indexing time grows roughly linearly, O(n), with document
size on Solr 4.0, but much faster than linearly on Solr 4.9 (close to quadratic
between the first two measurements). I also tried commenting out some copied
fields to narrow down the problem. The size of the document after indexing
seems to be the dominating factor for indexing time (we copy fields, and the
more fields we copy, the bigger the index gets).

Has anyone experienced a similar problem? Does this sound like a bug in Solr,
or are we just using Solr 4.9 wrong?

Here is one example of a field type definition in my schema file:
        <fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <charFilter class="solr.HTMLStripCharFilterFactory"/>
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement="" /> <!-- strip off all apostrophe (') characters -->
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-index-synonyms.txt"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English" />
                <!-- Used to have  language="English" - seems this param is gone in 4.9 -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <charFilter class="solr.HTMLStripCharFilterFactory"/>
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement="" /> <!-- strip off all apostrophe (') characters -->
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-query-colloq-synonyms.txt"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English" />
                <!-- Used to have  language="English" - seems this param is gone in 4.9 -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>
Field:
<field name="majorTextSignalStem" type="text_stem" indexed="true" stored="false" multiValued="true" omitNorms="false"/>
Copy:
 <copyField dest="majorTextSignalStem" source="majorTextSignalRaw" />

Thanks,
Ryan
