Re: Indexing a document that modifies itself as it's being indexed

2014-03-11 Thread Stephen Green
Thanks, Mike. Once I was that deep in the guts of the indexer, I knew things were probably not going to go my way. I'll check out CachingTokenFilter.

On Tue, Mar 11, 2014 at 3:09 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> You can't rely on how IndexWriter will iterate/consum

Re: Indexing a document that modifies itself as it's being indexed

2014-03-11 Thread Michael McCandless
You can't rely on how IndexWriter will iterate/consume those fields; that's an implementation detail. Maybe you could use CachingTokenFilter to pre-process the text fields and append the new fields? And then during indexing, replay the cached tokens, so you don't have to tokenize twice. Mike McC
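The tokenize-once, replay-later idea suggested above can be sketched without any Lucene dependency. The block below is only an illustration of the pattern, not Lucene's API: `CachedTokenReplay`, `CachedTokens`, `tokenize`, and `classify` are hypothetical stand-ins invented for this sketch. In a real Lucene 4.6 setup, CachingTokenFilter would presumably play the role that `CachedTokens` plays here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class CachedTokenReplay {

    /** Counts how often the (hypothetical) tokenizer actually runs. */
    static int tokenizerRuns = 0;

    /** Stand-in tokenizer: lowercases and splits on non-alphanumeric characters. */
    static Iterator<String> tokenize(String text) {
        tokenizerRuns++;
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out.iterator();
    }

    /** Drains the token source on first use, then replays the tokens from memory. */
    static final class CachedTokens implements Iterable<String> {
        private final String text;
        private List<String> cache; // null until the first pass fills it

        CachedTokens(String text) {
            this.text = text;
        }

        @Override
        public Iterator<String> iterator() {
            if (cache == null) {
                cache = new ArrayList<>();
                tokenize(text).forEachRemaining(cache::add);
            }
            return cache.iterator();
        }
    }

    /** Toy classifier (an assumption for illustration): keyword lookup over tokens. */
    static String classify(Iterable<String> tokens) {
        for (String t : tokens) {
            if (t.equals("lucene") || t.equals("index")) {
                return "search";
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        CachedTokens body = new CachedTokens("Lucene builds an inverted index.");

        // First pass: the classifier consumes the tokens and yields a new field value.
        String label = classify(body);

        // Second pass: "indexing" replays the cached tokens without re-tokenizing.
        List<String> indexed = new ArrayList<>();
        for (String t : body) {
            indexed.add(t);
        }

        System.out.println("label=" + label + " tokens=" + indexed
                + " tokenizerRuns=" + tokenizerRuns);
    }
}
```

The point of the design is that the derived field value is computed before the document is handed to the indexer, so nothing depends on the order in which the indexer consumes fields.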

Indexing a document that modifies itself as it's being indexed

2014-03-11 Thread Stephen Green
I'm working on a system that uses Lucene 4.6.0, and I have a couple of use cases for documents that modify themselves as they're being indexed. For example, we have text classifiers that we would like to run on the contents of certain fields. These classifiers produce field values (i.e., the clas