On 11/27/2012 2:25 PM, Markus Jelsma wrote:
Hi, please check this issue:
https://issues.apache.org/jira/browse/LUCENE-4226
But it is enabled because of:
https://issues.apache.org/jira/browse/LUCENE-4509
Since it's suddenly the default, you would have to completely wipe the index
and reindex the data. At least I had to, because of numerous codec exceptions.
It significantly reduced the size of some very large indexes we have.
I noticed the exceptions when I tried to restart after updating the
.war. I stopped Solr, completely wiped out my data directories, and ran
a DIH full-import on all shards after starting back up. The almost 32
percent drop in index size caught me off guard.
I had seen the compressed stored fields issue come across the dev and
commits lists, but I didn't connect the dots in my brain.
I would imagine that if Solr has to actually hit the disk, compression
will make retrieval faster because there is less data to read, but if
the data is already in the OS disk cache, it would be slower because of
the extra decompression work. I'm curious whether the document cache
stores the compressed or uncompressed version. If it's the uncompressed
version, the document cache would get rid of any penalty on repeat
lookups.
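
To make that guess concrete, here is a toy Python sketch. It is not
Lucene or Solr code: zlib stands in for whatever codec the stored fields
format really uses, and StoredFieldReader and its fields are names I
made up for illustration. It only shows that a cache holding the
uncompressed bytes pays the decompression cost once per document, while
an uncached reader pays it on every fetch.

```python
import zlib


class StoredFieldReader:
    """Toy reader (hypothetical): documents are held compressed, with an
    optional cache of the decompressed bytes."""

    def __init__(self, docs, use_cache):
        # zlib is a stand-in codec, chosen only because it ships with Python.
        self.blocks = {doc_id: zlib.compress(raw) for doc_id, raw in docs.items()}
        self.cache = {} if use_cache else None
        self.decompressions = 0  # counts how often we pay the CPU cost

    def get(self, doc_id):
        # A cache hit returns the uncompressed bytes without touching the codec.
        if self.cache is not None and doc_id in self.cache:
            return self.cache[doc_id]
        self.decompressions += 1
        raw = zlib.decompress(self.blocks[doc_id])
        if self.cache is not None:
            self.cache[doc_id] = raw
        return raw


if __name__ == "__main__":
    # Made-up, highly repetitive document; real data will compress differently.
    docs = {1: b"lucene solr index " * 200}
    for use_cache in (True, False):
        reader = StoredFieldReader(docs, use_cache)
        for _ in range(3):
            reader.get(1)
        print(use_cache, len(docs[1]), len(reader.blocks[1]), reader.decompressions)
```

Whether the real document cache behaves like the cached case here is
exactly the open question; the sketch just shows why the answer matters.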
Are there any config knobs for turning compression on/off, or changing
the compression algorithm? Are those knobs available to Solr? I'm not
doing anything on the scale of the Hathi Trust, but would I ever have
any reasonable need to change things?
Thanks,
Shawn