Hello!

I was wondering if there is a way to instruct Tika Server to extract content 
only from within a particular div tag.

I am extracting a SharePoint site and do not want to see text from the header, 
footer, etc. The important text is always inside a particular content div, and 
I only want the text from inside that div.

I have already switched to using the /tika/main endpoint. While this has 
definitely given us some improvement, there are still many cases where text 
from the header is extracted as well.
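
To illustrate the kind of filtering I am after, here is a minimal sketch of a 
client-side fallback that pulls the text out of one div before anything is 
handed to Tika. It uses only Python's standard html.parser module and is not a 
Tika feature. The div id "content" and the names DivTextExtractor and 
extract_div_text are made up for the example, and it assumes the raw HTML is 
available on the client and that the div elements are properly closed.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collects text that appears inside the div with a given id."""

    def __init__(self, div_id):
        super().__init__(convert_charrefs=True)
        self.div_id = div_id  # hypothetical id of the content div
        self.depth = 0        # div nesting depth; 0 means outside the target
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A nested div inside the target div.
            self.depth += 1
        elif dict(attrs).get("id") == self.div_id:
            # Entering the target div.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Note: script/style content inside the div is not filtered out here.
        if self.depth > 0:
            self.chunks.append(data)


def extract_div_text(html, div_id):
    """Return the whitespace-normalized text inside the div with id div_id."""
    parser = DivTextExtractor(div_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    page = (
        '<html><body><div id="header">Header</div>'
        '<div id="content"><p>Hello</p><div>Nested</div> world</div>'
        '<div id="footer">Footer</div></body></html>'
    )
    print(extract_div_text(page, "content"))
```

I would prefer to have Tika Server do this itself if that is possible, rather 
than pre-filtering every page like this.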

Thanks!
Harinder
