On Thu, 5 Jan 2017, Kamesh Joshi wrote:
I am trying to parse the attached PDF, but it does not give me the
places where the underline is present; it just returns plain text.
Please help: how can I also get the underlines present in the PDF, or some way
to split the text based on them?

I am using curl -T Downloads/kameshjoshi.pdf  http://localhost:9998/tika
--header "Accept: text/plain" in my command line.

You need to ask Tika to give you the HTML version to be able to spot markup like underlines. Change that Accept header to text/html and you should then be able to see them.
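
For reference, here is a minimal sketch of the same request with the Accept header changed. The tika_html function name and the TIKA_URL variable are hypothetical conveniences, not part of Tika; the host, port and /tika endpoint are taken from the command quoted above.

```shell
# Hypothetical helper: upload a file to a running Tika server and request
# the HTML rendering instead of plain text.
# Assumes the server is listening on localhost:9998, as in the original command;
# set TIKA_URL to point somewhere else.
tika_html() {
  # -T uploads the file with PUT; -sS hides the progress meter but keeps errors.
  curl -sS -T "$1" "${TIKA_URL:-http://localhost:9998}/tika" --header "Accept: text/html"
}
```

It could be called as tika_html Downloads/kameshjoshi.pdf > kameshjoshi.html, after which the saved output can be searched for underline markup. Whether any shows up may depend on how the PDF itself represents the underline.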

Nick
