Hi All,

We've been experimenting with indexing the parsed content in Lucene, and our initial attempt was to index the output of ToTextContentHandler.toString() as a single Lucene TextField, roughly as in the snippet below.
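For reference, this is a minimal sketch of that approach. It assumes an AutoDetectParser and an already-open IndexWriter; the class, method, and field names are only illustrative:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.ToTextContentHandler;

public class WholeDocumentIndexer {

    // Parse the file with Tika and index the extracted text as one field value.
    public static void index(IndexWriter writer, Path file) throws Exception {
        ToTextContentHandler handler = new ToTextContentHandler();
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();

        try (InputStream stream = Files.newInputStream(file)) {
            parser.parse(stream, handler, metadata);
        }

        // The entire document text is buffered in memory before indexing,
        // which is the concern for very large files.
        Document doc = new Document();
        doc.add(new TextField("content", handler.toString(), Field.Store.NO));
        writer.addDocument(doc);
    }
}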
This is unlikely to be effective for large files, so I wonder what strategies exist for more effective indexing/tokenization of the possibly large content.

Perhaps a custom ContentHandler could index content fragments in a separate Lucene field every time its characters(...) method is called; that is something I've been planning to experiment with (see the sketch in the P.S. below).

Any feedback will be appreciated.

Cheers,
Sergey
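P.S. Here is roughly what I have in mind for the custom handler. It is just a sketch, assuming that adding each fragment as another value of a multi-valued field is acceptable; the handler and field names are made up:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: every text fragment reported by the parser is added
// as another value of a multi-valued "content" field, so the full document
// text never has to be buffered in a single String.
public class FragmentIndexingContentHandler extends DefaultHandler {

    private final Document doc;

    public FragmentIndexingContentHandler(Document doc) {
        this.doc = doc;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String fragment = new String(ch, start, length).trim();
        if (!fragment.isEmpty()) {
            doc.add(new TextField("content", fragment, Field.Store.NO));
        }
    }
}

Once parsing completes, the Document would be handed to IndexWriter.addDocument(...). A variant could use a distinct field name per fragment instead of a multi-valued one, which is closer to what I described above.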
