RE: Extreme index size reduction on 4.1-SNAPSHOT?

Markus Jelsma Tue, 27 Nov 2012 13:19:55 -0800

Hi, please check this issue, which adds compression for stored fields:
https://issues.apache.org/jira/browse/LUCENE-4226

It is now enabled by default because of:
https://issues.apache.org/jira/browse/LUCENE-4509

Since it is suddenly the default, you would have to completely wipe the index
and reindex the data. At least I had to, because of numerous codec exceptions.
It significantly reduced the size of some very large indexes we have.
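
As a quick sanity check on the figures quoted below (22.7 GB under 3.5.0
versus 15.49 GB under the new 4.1 snapshot), here is a minimal Python sketch
of the arithmetic. The function name percent_reduction is only illustrative
and is not part of Solr or Lucene.

```python
def percent_reduction(old_size_gb: float, new_size_gb: float) -> float:
    """Return how much smaller new_size_gb is than old_size_gb, in percent."""
    if old_size_gb <= 0:
        raise ValueError("old size must be positive")
    return (old_size_gb - new_size_gb) / old_size_gb * 100.0


# Sizes of the optimized indexes as quoted in the original message.
reduction = percent_reduction(22.7, 15.49)
print(f"{reduction:.1f}% smaller")
```

That works out to a little under 32 percent, which matches the "more than a
30 percent drop" described in the original message.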
 
 
-----Original message-----
> From:Shawn Heisey <s...@elyograg.org>
> Sent: Tue 27-Nov-2012 22:16
> To: solr-user@lucene.apache.org
> Subject: Extreme index size reduction on 4.1-SNAPSHOT?
> 
> With a 4.1 snapshot from a couple of weeks ago, I saw about a 5% drop in 
> index size compared to 3.5.0 when using the same schema. When I updated 
> my 4.1 schema to ICUTokenizer so I could use CJKBigramFilter, my index 
> dropped further -- about 10% less than 3.5, still using the same 4.1 
> snapshot.
> 
> Yesterday I checked out the newest 4.1 snapshot and built the index 
> again.  Comparing a recently optimized 3.5.0 index with the same 
> recently optimized index under the new 4.1, I am seeing more than a 30 
> percent drop in size -- 15.49 GB instead of 22.7 GB.  As noted above, 
> some of that drop can be explained by the change in schema, but not THAT 
> much.  I am very impressed.
> 
> Looking at the index directories from yesterday compared to what I 
> remember about the directories a couple of weeks ago, it appears that 
> some of the files that had Lucene40 in the filename now have Lucene41 in 
> the filename.
> 
> Is there any chance that this is an indication of a problem, or is the 
> expected index reduction really that good?
> 
> Thanks,
> Shawn
> 
>
