I am running Tika-server-1.13 to extract text from PDF files. Sometimes I
get gibberish characters between words; they seem to appear in the spaces
between words or at the end of the file.

For two-column PDF files the problem is much worse, because far more gibberish is added.

How can I get rid of this? Any suggestions are welcome.

Allison
