Here is the same question on Stack Overflow, where the formatting is better:
http://stackoverflow.com/questions/42370231/solr-dynamic-field-blowing-up-the-index-size
Recently, I upgraded from Solr 5.0 to Solr 6.4.1. My app runs fine, but the index is far too large with Solr 6. In Solr 5 the index size was about 15GB; in Solr 6, for the same data, it is 300GB! I cannot work out what causes such a huge difference in Solr 6.

I have been able to identify the field that is blowing up the index size. It is defined as follows:

    <dynamicField name="*_note" type="text_general" indexed="true" stored="true" multiValued="true" />
    <field name="textproperty" type="text_general" indexed="true" stored="false" multiValued="true" />
    <copyField source="*_note" dest="textproperty"/>

When this field is commented out, the index size drops to less than 10GB.

The field is of type text_general. Here is the definition of that type:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="((?m)[a-z]+)'s" replacement="$1s" />
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.KStemFilterFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt" />
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="((?m)[a-z]+)'s" replacement="$1s" />
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.KStemFilterFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="C:/Users/pratik/Desktop/solr-6.4.1_playground/solr-6.4.1/server/solr/collection1/conf/stopwords.txt" />
      </analyzer>
    </fieldType>

A few things I did to debug this issue:

- I have ensured that the field type definition is the same as the one I was using in Solr 5, and that it is also valid in version 6.
- This field type takes a list of stopwords to be ignored during indexing. I have supplied the same stopword list we were using in Solr 5. I have verified that the path to this file is correct and that the file loads fine in the Solr admin UI.
- When I analyse these fields using the "Analysis" tab of the Solr admin UI, I can see that the stopwords are being filtered out. However, when I query with some of these stopwords, I do get results back, which makes me think the stopwords are probably being indexed after all.

Any idea what could increase the size of the index by so much in Solr 6?
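
One way to check whether stopwords really ended up in the index would be to look at the most frequent terms of the field through Solr's Luke request handler (/admin/luke). The following is only a minimal sketch: the base URL http://localhost:8983 and the core name collection1 are assumptions, and the helper names luke_top_terms_url and indexed_stopwords are hypothetical. The term list passed to the second helper has to be extracted from the Luke response first.

```python
from urllib.parse import urlencode


def luke_top_terms_url(base_url, core, field, num_terms=50):
    """Build a Luke request handler URL that lists the most frequent
    indexed terms of a single field.

    base_url and core are assumptions about the local setup.
    """
    params = urlencode({"fl": field, "numTerms": num_terms, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/admin/luke?{params}"


def indexed_stopwords(top_terms, stopwords):
    """Return the stopwords that appear among the given indexed terms.

    top_terms is a plain list of term strings taken from the Luke
    response; stopwords is the content of stopwords.txt, one per entry.
    """
    indexed = set(top_terms)
    return sorted(w for w in stopwords if w.lower() in indexed)


if __name__ == "__main__":
    # Open this URL in a browser or with curl and inspect the top terms.
    print(luke_top_terms_url("http://localhost:8983", "collection1", "textproperty"))
```

If common stopwords show up among the top terms of textproperty, the stop filter is probably not being applied at index time, regardless of what the Analysis tab shows.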