How to keep all HTML links when doing file content extraction?

Zhang, Lisheng Tue, 14 Feb 2017 16:43:11 -0800

Hi,

We have been using Tika for some time and it has been very helpful, thanks a
lot!

So far, when Tika extracts text it throws away HTML links and keeps only the
words. This is good for search indexing, but in a new application we need to
keep the whole HTML link when extracting text from a binary file such as an
MS DOC. I could not find a simple way to do that. Could you provide a pointer
to a suitable API or doc?

Thanks again for your help, Lisheng
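
One possible direction, offered as a sketch and not as a confirmed answer from
the list: Tika parsers report their output as XHTML SAX events to whatever
ContentHandler is passed to Parser.parse(InputStream, ContentHandler, Metadata,
ParseContext), so a handler that does not discard markup can keep the links.
Tika itself ships org.apache.tika.sax.ToXMLContentHandler, which serializes the
full XHTML. The class below is a hypothetical JDK-only handler
(LinkKeepingHandler is an invented name, not a Tika class) that keeps plain
text but re-emits <a href="..."> tags. It assumes the parser for the given
format emits hyperlinks as <a> elements with an href attribute, and it is
exercised here with the JDK SAX parser on a small XHTML string so it runs
without Tika on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects text from XHTML SAX events, keeping <a href> tags. */
class LinkKeepingHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    // Remembers, per open <a>, whether an opening tag was written for it.
    private final Deque<Boolean> openAnchors = new ArrayDeque<>();

    private static boolean isAnchor(String localName, String qName) {
        // localName is empty when the SAX source is not namespace-aware.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        return "a".equalsIgnoreCase(name);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!isAnchor(localName, qName)) {
            return; // every other element is dropped, only its text is kept
        }
        String href = atts.getValue("href");
        openAnchors.push(href != null);
        if (href != null) {
            String escaped = href.replace("&", "&amp;").replace("\"", "&quot;");
            out.append("<a href=\"").append(escaped).append("\">");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isAnchor(localName, qName) && !openAnchors.isEmpty() && openAnchors.pop()) {
            out.append("</a>");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length); // text is copied through unescaped
    }

    @Override
    public String toString() {
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the XHTML events a Tika parser would produce.
        String xhtml = "<html><body><p>See <a href=\"http://example.com/doc\">"
                + "the doc</a> here.</p></body></html>";
        LinkKeepingHandler handler = new LinkKeepingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        System.out.println(handler);
    }
}
```

With Tika on the classpath, the same handler could be passed in place of the
usual text handler, for example
new AutoDetectParser().parse(stream, handler, new Metadata(), new ParseContext()),
where stream is the InputStream of the DOC file. Whether every link in a given
document survives depends on what the format's parser emits, so it is worth
checking the output against a sample file.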
