You need to run Tika up front to extract plain text from whatever your
document format is (this includes tagged formats such as XML/HTML).

After that, you send the clean text to Lucene for indexing.
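
To make the two steps concrete, here is a toy sketch in plain Java. It
does not use Tika or Lucene at all: extractText is a crude regex
stand-in for the extraction step, and buildIndex is a tiny in-memory
inverted index standing in for the indexing step. All names in it are
made up for illustration. In a real setup you would let Tika do the
extraction (for example via its Tika facade class) and hand the
resulting string to a Lucene IndexWriter instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ExtractThenIndex {

    // Step 1 (stand-in for Tika): drop anything that looks like a tag and
    // collapse whitespace. This is NOT a real HTML/XML parser.
    static String extractText(String markup) {
        return markup.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Step 2 (stand-in for Lucene): map each lowercased term to the ids of
    // the documents whose extracted text contains it.
    static Map<String, Set<String>> buildIndex(Map<String, String> rawDocs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : rawDocs.entrySet()) {
            String text = extractText(doc.getValue());
            for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, k -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.html", "<html><body><h1>Lucene</h1><p>Full-text search</p></body></html>");
        docs.put("b.xml", "<note><to>Tika</to><body>extracts text</body></note>");

        Map<String, Set<String>> index = buildIndex(docs);
        System.out.println(index.get("text"));         // prints [a.html, b.xml]
        System.out.println(index.containsKey("body")); // prints false
    }
}
```

Because the tags are removed before indexing, a tag name such as "body"
is only searchable when it also occurs in the text content itself.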

Mike McCandless

http://blog.mikemccandless.com

On Sun, Feb 24, 2013 at 12:58 AM,  <[email protected]> wrote:
> Yeah, but when I searched Google and stackoverflow.com, the many links I
> found confused me a lot. I haven't read the books "Lucene in Action" or
> "Tika in Action" yet.
>
> And how do I index text with tags?
