On Thu, 5 Jan 2017, Kamesh Joshi wrote:
I am trying to parse the attached PDF, but the output does not show
where text is underlined; it just returns plain text.
Please help: how can I also get the underlines present in the PDF, or
some way to split the text based on them?
I am using curl -T Downloads/kameshjoshi.pdf http://localhost:9998/tika
--header "Accept: text/plain" in my command line.
You need to ask Tika to give you the HTML version to be able to spot
markup like underlines. Swap that Accept header to text/html and you
should then be able to see them.
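
For reference, a minimal sketch of the changed request, assuming the same
server URL and file path as in your command. The helper name tika_html and
the TIKA_URL variable are only illustrative, not part of Tika, and whether
a given underline shows up as markup depends on what the PDF parser can
extract from that file.

```shell
# Assumed endpoint, copied from the original command; override if yours differs.
TIKA_URL="${TIKA_URL:-http://localhost:9998/tika}"

# Hypothetical helper: upload a file to the Tika server and request
# HTML output instead of plain text by changing the Accept header.
tika_html() {
  curl -s -T "$1" "$TIKA_URL" --header "Accept: text/html"
}

# Usage (not run here; needs a Tika server listening on TIKA_URL):
#   tika_html Downloads/kameshjoshi.pdf > kameshjoshi.html
```

You can then look through the saved HTML for any markup around the
underlined text and split on that.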
Nick