I index 2500 documents (up to 50 MB each, 3 MB on average) into a Solr server. When running on Solr 4.0, the full index finished in 3 hours.
However, after we upgraded to Solr 4.9, the index needs 3 days to finish. I've done some profiling; the numbers I get are:

Document size   Time to add (Solr 4.0)   Time to add (Solr 4.9)
1.18            6 sec                    123 sec
2.26            12 sec                   444 sec
3.35            18 sec                   over 600 sec
9.65            46 sec                   timeout

From what I can see, indexing time is roughly O(n) in document size on Solr 4.0, but grows much faster than linearly on Solr 4.9 (the first two rows look closer to O(n^2)).

I also tried commenting out some copy fields to narrow down the problem. The size of the document after indexing seems to be the dominating factor for index time (we copy fields, and the more fields we copy, the bigger the index gets).

Has anyone experienced a similar problem? Does this sound like a bug in Solr, or are we just using Solr 4.9 incorrectly?

Here is one example of a field definition in my schema file:

<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement="" />
    <!-- strip off all apostrophe (') characters -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-index-synonyms.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
    <!-- Used to have language="English" - seems this param is gone in 4.9 -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement="" />
    <!-- strip off all apostrophe (') characters -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-query-colloq-synonyms.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
    <!-- Used to have language="English" - seems this param is gone in 4.9 -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

Field:

<field name="majorTextSignalStem" type="text_stem" indexed="true" stored="false" multiValued="true" omitNorms="false"/>

Copy:

<copyField dest="majorTextSignalStem" source="majorTextSignalRaw" />

Thanks,
Ryan
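
For reference, the growth rate can be estimated from the timings quoted above with a two-point log-log slope. This is only a rough sketch: it uses the first two rows (the only ones with a completed measurement on both versions), treats the size figures as unitless since no unit is given, and `growth_exponent` is a hypothetical helper name, not anything from Solr.

```python
import math

# (document size figure, seconds on Solr 4.0, seconds on Solr 4.9).
# Only rows with a finished measurement on both versions are included;
# the "over 600 sec" and "timeout" rows are left out.
samples = [
    (1.18, 6.0, 123.0),
    (2.26, 12.0, 444.0),
]


def growth_exponent(x1, t1, x2, t2):
    """Return k such that time scales like size**k between two points."""
    return math.log(t2 / t1) / math.log(x2 / x1)


(s1, a1, b1), (s2, a2, b2) = samples
k40 = growth_exponent(s1, a1, s2, a2)  # Solr 4.0
k49 = growth_exponent(s1, b1, s2, b2)  # Solr 4.9

print(f"Solr 4.0 exponent: {k40:.2f}")
print(f"Solr 4.9 exponent: {k49:.2f}")
```

An exponent near 1 means roughly linear scaling and an exponent near 2 means roughly quadratic scaling. Two points are far too few to be conclusive, so this is a sanity check on the trend rather than a measurement.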