I am running Tika-server-1.13 to extract text from a PDF file. Sometimes I get gibberish characters between words; they seem to be inserted into the spacing between words or at the end of the file.
For two-column PDF files the problem is much worse, and a lot of gibberish is added. How can I get rid of it? Any suggestions are welcome.
Allison
