I'm working on a system that uses Lucene 4.6.0, and I have a couple of use
cases for documents that modify themselves as they're being indexed.
For example, we have text classifiers that we would like to run on the
contents of certain fields. These classifiers produce field values (i.e.,
the clas
You can't rely on how IndexWriter will iterate/consume those fields;
that's an implementation detail.
Maybe you could use CachingTokenFilter to pre-process the text fields
and append the new fields? And then during indexing, replay the
cached tokens, so you don't have to tokenize twice.
Mike McC
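
For anyone reading this thread later, here is a rough sketch of the
cache-then-replay idea in plain Java. It deliberately does not call any
Lucene classes: CachedTokenReplay, tokenize, classify and buildDocument
are all hypothetical names, and the keyword "classifier" is only a
placeholder. In Lucene itself, CachingTokenFilter would play the role of
the cached list by wrapping the field's TokenStream. The point is that
the derived field is added before the document reaches the indexer, so
nothing depends on the order in which IndexWriter consumes fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java stand-in for the pattern; none of these names are Lucene APIs. */
public class CachedTokenReplay {

    /** Counts tokenizer runs so the "tokenize once" claim can be checked. */
    public static int tokenizerRuns = 0;

    /** Hypothetical tokenizer: lowercases and splits on non-letters. */
    public static List<String> tokenize(String text) {
        tokenizerRuns++;
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Hypothetical classifier: derives a label from the cached tokens. */
    public static String classify(List<String> cachedTokens) {
        return cachedTokens.contains("lucene") ? "search" : "other";
    }

    /** Assembles all fields of one document before any indexing happens. */
    public static Map<String, List<String>> buildDocument(String body) {
        // Tokenize once and keep the result.
        List<String> cached = tokenize(body);

        // Pre-process: read the cache to compute the extra field value.
        String label = classify(cached);

        // The body field reuses the cached tokens instead of re-tokenizing.
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("body", cached);
        doc.put("category", Collections.singletonList(label));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildDocument("Lucene indexes text quickly"));
    }
}
```

With real Lucene 4.x streams, the usual TokenStream contract still
applies (reset before consuming, then rewind before handing the cached
stream to a field), so treat the above as the shape of the approach
rather than working indexing code.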
Thanks, Mike.
Once I was that deep in the guts of the indexer, I knew things were
probably not going to go my way.
I'll check out CachingTokenFilter.
On Tue, Mar 11, 2014 at 3:09 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> You can't rely on how IndexWriter will iterate/consum