If you run the PDFBox app's ExtractText on the files, do you get the same 
output? If so, it might make sense to ask the PDFBox project for help.

e.g.: http://apache.cs.utah.edu/pdfbox/2.0.2/pdfbox-app-2.0.2.jar

java -jar pdfbox-app-2.0.2.jar ExtractText thispdf.pdf
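
To answer the "same output?" question without eyeballing two long files, a small helper along these lines might be useful. It is only a sketch: the function name same_text and the file names in the usage note are made up for illustration, and it treats the two outputs as equal if they differ only in whitespace.

```shell
# Hypothetical helper (not part of Tika or PDFBox): compare two extracted-text
# files while ignoring differences in whitespace.
# Prints "same" (exit status 0) or "different" (exit status 1).
same_text() {
    # Turn every run of whitespace, newlines included, into one space,
    # then trim a single leading/trailing space so line endings don't matter.
    a=$(tr -s '[:space:]' ' ' < "$1" | sed 's/^ //; s/ $//')
    b=$(tr -s '[:space:]' ' ' < "$2" | sed 's/^ //; s/ $//')
    if [ "$a" = "$b" ]; then
        echo "same"
    else
        echo "different"
        return 1
    fi
}
```

For example, if pdfbox.txt holds the ExtractText output and tika.txt holds what tika-server returned for the same PDF (both names are placeholders), then `same_text pdfbox.txt tika.txt` printing "same" would suggest the gibberish already comes from PDFBox rather than from Tika.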

From: Allison A. [mailto:[email protected]]
Sent: Thursday, June 30, 2016 12:37 AM
To: [email protected]
Subject: Re: PDFPaser generates gibberish

I am running Tika-server-1.13 to extract text from a PDF file. Sometimes I 
get gibberish characters between words; they seem to be added to the spacing 
between words or at the end of the file.

For two-column PDF files this is quite serious: far too much gibberish is added.

How can I get rid of this? Any suggestions are welcome.

Allison
