Re: Indexing size increase 20% after switching from lucene 4.4 to 4.5 or 4.8 with BinaryDocValuesField

2014-06-14 Thread Robert Muir
They are still encoded the same way, so you likely aren't comparing apples to apples (e.g. a different number of segments).
On Fri, Jun 13, 2014 at 8:28 PM, Zhao, Gang wrote:
> I used lucene 4.4 to create index for some documents. One of the indexing fields is BinaryDocValuesField.

RE: Indexing size increase 20% after switching from lucene 4.4 to 4.5 or 4.8 with BinaryDocValuesField

2014-06-14 Thread Zhao, Gang
Hi Robert,
Thank you for your reply! I used the same data set for both versions. There are mainly two changes:
1. Before
package com.ea.eadp.data.aem.audience.indexer.data.extension;
import com.ea.eadp.data.aem.audience.shared.IndexField;
import org.apache.lucene.codecs.Codec;
import org.

Hunspell low level interface in Lucene 4.8

2014-06-14 Thread Michal Lopuszynski
Dear all, I am not much into search; however, I have used Lucene for some text postprocessing (esp. stemming), using the low-level tools generously gathered in Lucene. I was very happy to see the memory footprint improvement in the Hunspell stemmer algorithm (https://issues.apache.org/jira/browse/LU

Re: Facets in Lucene 4.7.2

2014-06-14 Thread Shai Erera
Hi, currently there's no way to add e.g. terms to already indexed documents; you have to re-index them. The only updatable field type Lucene currently offers is DocValues fields. If the list of markers/flags is fixed in your case, and you can map them to an integer, I think you could use a Numeri
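
The suggestion of mapping a fixed list of markers/flags to an integer can be sketched as a plain bitmask, shown here in Java without any Lucene dependency. This is only an illustration under assumptions: the `MarkerFlags` class and the marker names are hypothetical and do not come from the thread. The packed value is the kind of number that could then be stored in, and later updated through, a numeric DocValues field.

```java
import java.util.EnumSet;
import java.util.Set;

public class MarkerFlags {
    // Hypothetical fixed list of markers; the names are illustrative only.
    enum Marker { FEATURED, ARCHIVED, REVIEWED }

    // Pack a set of markers into one long, using one bit per enum constant.
    static long encode(Set<Marker> markers) {
        long bits = 0L;
        for (Marker m : markers) {
            bits |= 1L << m.ordinal();
        }
        return bits;
    }

    // Recover the set of markers from the packed value.
    static Set<Marker> decode(long bits) {
        Set<Marker> markers = EnumSet.noneOf(Marker.class);
        for (Marker m : Marker.values()) {
            if ((bits & (1L << m.ordinal())) != 0) {
                markers.add(m);
            }
        }
        return markers;
    }

    public static void main(String[] args) {
        long packed = encode(EnumSet.of(Marker.FEATURED, Marker.REVIEWED));
        System.out.println(packed);          // 5
        System.out.println(decode(packed));  // [FEATURED, REVIEWED]
    }
}
```

Because each marker owns one bit, adding or removing a marker for a document only changes a single number, which fits the constraint that only DocValues fields can be updated without re-indexing. The scheme holds up to 64 markers in a `long` and relies on the enum order staying stable once values have been written.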