Hi, please check this issue: https://issues.apache.org/jira/browse/LUCENE-4226
It is enabled because of: https://issues.apache.org/jira/browse/LUCENE-4509

Since it's suddenly the default, you would have to completely wipe the index and reindex the data; at least I had to, because of numerous codec exceptions. It significantly reduced the size of the very large indexes we have.

-----Original message-----
> From: Shawn Heisey <s...@elyograg.org>
> Sent: Tue 27-Nov-2012 22:16
> To: solr-user@lucene.apache.org
> Subject: Extreme index size reduction on 4.1-SNAPSHOT?
>
> With a 4.1 snapshot from a couple of weeks ago, I saw about a 5% drop in
> index size compared to 3.5.0 when using the same schema. When I updated
> my 4.1 schema to ICUTokenizer so I could use CJKBigramFilter, my index
> dropped further -- about 10% less than 3.5, still using the same 4.1
> snapshot.
>
> Yesterday I checked out the newest 4.1 snapshot and built the index
> again. Comparing a recently optimized 3.5.0 index with the same
> recently optimized index under the new 4.1, I am seeing more than a 30
> percent drop in size -- 15.49 GB instead of 22.7 GB. As noted above,
> some of that drop can be explained by the change in schema, but not THAT
> much. I am very impressed.
>
> Looking at the index directories from yesterday compared to what I
> remember about the directories a couple of weeks ago, it appears that
> some of the files that had Lucene40 in the filename now have Lucene41 in
> the filename.
>
> Is there any chance that this is an indication of a problem, or is the
> expected index reduction really that good?
>
> Thanks,
> Shawn