Hi, we have been using TIKA for some time and it has been very helpful, thanks a lot!

So far, when TIKA extracts text it throws away HTML links and keeps only the
words. This is fine for search indexing, but a new application needs to keep
the full HTML links when extracting text from a binary file such as an MS Word
DOC. I could not find a simple way to do that; could you point me to a suitable
API or documentation?
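
For context, here is what we are considering in the meantime. Tika parsers report extracted content as XHTML SAX events to the ContentHandler passed to `parse(...)`, and a plain-text handler keeps only the character data. A custom handler could keep each link's href instead. This is a minimal sketch using only the JDK. The class name `LinkKeepingHandler` and its "text <href>" output format are our own hypothetical choices. We are also assuming, without having verified it, that the DOC parser reports Word hyperlinks as `a` elements with an `href` attribute.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects text like a plain-text handler would,
// but writes each link target as " <href>" after the link text instead of dropping it.
class LinkKeepingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    // One entry per open <a> element; "" marks an anchor without href (e.g. a bookmark).
    private final Deque<String> openLinks = new ArrayDeque<>();

    // Namespace-aware parsers fill localName; others leave it empty and use qName.
    private static String name(String localName, String qName) {
        return localName.isEmpty() ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (name(localName, qName).equals("a")) {
            String href = atts.getValue("href");
            openLinks.push(href == null ? "" : href);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String n = name(localName, qName);
        if (n.equals("a") && !openLinks.isEmpty()) {
            String href = openLinks.pop();
            if (!href.isEmpty()) {
                text.append(" <").append(href).append('>');
            }
        } else if (n.equals("p")) {
            text.append('\n'); // keep paragraph breaks
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    String getText() {
        return text.toString();
    }

    // Convenience for trying the handler without Tika: feed it a small XHTML string.
    static String extract(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            LinkKeepingHandler handler = new LinkKeepingHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.getText();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XHTML", e);
        }
    }
}
```

With Tika on the classpath, an instance could be passed as the handler to `AutoDetectParser.parse(stream, handler, new Metadata(), new ParseContext())`. We have also noticed `ToXMLContentHandler` (keeps the full XHTML markup) and `LinkContentHandler` (collects links) in `org.apache.tika.sax`. Is either of these the recommended way?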

Thanks again for your help, Lisheng