Re: Extreme index size reduction on 4.1-SNAPSHOT?

Shawn Heisey Tue, 27 Nov 2012 15:54:48 -0800

On 11/27/2012 2:25 PM, Markus Jelsma wrote:

Hi, please check this issue:
https://issues.apache.org/jira/browse/LUCENE-4226


But it is enabled because of:
https://issues.apache.org/jira/browse/LUCENE-4509

Since it's suddenly the default, you would have to completely wipe the index and 
reindex the data, at least I had to, because of numerous codec exceptions. It 
significantly reduced the size of the very large indexes we have.

I noticed the exceptions when I tried to restart after updating the .war. I stopped Solr, completely wiped out my data directories, and ran a DIH full-import on all shards after starting back up. The almost 32 percent drop in index size caught me off guard.

I had seen the compressed stored field issue come across dev and commits, but I didn't connect the dots in my brain.

I would imagine that if Solr has to actually hit the disk, this will be faster, but if the data is already in the OS disk cache, it would be slower. I'm curious whether the document cache stores the compressed or uncompressed version. If it's the uncompressed version, the document cache would get rid of any penalty.
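
To put rough numbers on that trade-off, here is a minimal sketch. It is not how Lucene stores fields: zlib stands in for whatever compression the codec actually uses, and the document payloads and the make_docs/measure helpers are made up for illustration. It only shows the two quantities in play: how many fewer bytes would have to come off disk, and how much CPU time decompression adds when the bytes are already in memory.

```python
import time
import zlib


def make_docs(n=1000):
    # Hypothetical stored-field payloads; real documents will compress differently.
    filler = "lorem ipsum dolor sit amet " * 20
    return [
        ("id=%d;title=Sample document %d;body=%s" % (i, i, filler)).encode("utf-8")
        for i in range(n)
    ]


def measure(docs):
    # Compress all documents as one chunk, then time a single decompression.
    raw = b"\n".join(docs)
    packed = zlib.compress(raw)

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    elapsed = time.perf_counter() - start

    return {
        "raw_bytes": len(raw),            # what an uncompressed read would fetch
        "packed_bytes": len(packed),      # what a compressed read would fetch
        "ratio": len(packed) / len(raw),  # smaller means less disk I/O
        "decompress_seconds": elapsed,    # CPU cost paid even on a cache hit
        "roundtrip_ok": restored == raw,
    }


if __name__ == "__main__":
    for key, value in measure(make_docs()).items():
        print(key, value)
```

If the reads come from a cold disk, the smaller byte count tends to dominate; if everything is already in the OS disk cache, only the decompression time remains as extra cost.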

Are there any config knobs for turning compression on/off, or changing the compression algorithm? Are those knobs available to Solr? I'm not doing anything on the scale of the Hathi Trust, but would I ever have any reasonable need to change things?


Thanks,
Shawn
