You need to use Tika up front to extract plain text from whatever your document format is (for XML/HTML, that includes stripping out the tags).
After that you send the clean text to Lucene for indexing...

Mike McCandless
http://blog.mikemccandless.com

On Sun, Feb 24, 2013 at 12:58 AM, <[email protected]> wrote:
> yeah, but when I googled or in stackoverflow.com, many many links confused
> much. I didnt' read the books "lucene in action", "tika in action" yet.
>
> and how to index text with tags ?
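
To make the two-step flow concrete (extract plain text first, then index only the clean text), here is a minimal Python sketch. It uses neither Tika nor Lucene: the standard-library html.parser stands in for the extraction step, and a toy in-memory inverted index stands in for Lucene. The names TextExtractor, extract_text, build_index and the sample documents are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data and drops the tags (stand-in for Tika)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called only for text between tags, never for the tags themselves.
        self.chunks.append(data)


def extract_text(markup):
    """Return the plain text of an HTML/XML-like string, tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Collapse the whitespace runs left behind where tags were removed.
    return " ".join(" ".join(parser.chunks).split())


def build_index(docs):
    """Map each lowercased term to the ids of the documents containing it
    (a toy inverted index, stand-in for Lucene)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical sample input: one HTML document and one XML document.
raw_docs = {
    "doc1": "<html><body><h1>Lucene</h1><p>indexes <b>plain</b> text</p></body></html>",
    "doc2": "<note><to>Tika</to><body>extracts text</body></note>",
}

# Step 1: extract clean text up front. Step 2: index only the clean text.
clean_docs = {doc_id: extract_text(markup) for doc_id, markup in raw_docs.items()}
index = build_index(clean_docs)
```

Because the index is built from the extracted text only, tag names such as "body" or "b" never become searchable terms, while the words between the tags do. With the real libraries the shape would be the same: Tika produces the text, and that text is what goes into the Lucene document fields.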
