Hi,

Please try to submit a test case. My guess is that this is related to bad /ToUnicode streams.

Tilman

On 05.03.2020 at 03:09, Joel Hirsh wrote:
I just started testing with version 2.0.19. I am using PDFTextStripper, and some files that gave fine results in 2.0.18 are completely useless with 2.0.19. As an example, I have one file that yields about 600 phrases in 2.0.18. In 2.0.19 it yields over 16,000 phrases, the majority of which are zero-length strings, and most of the rest are the single characters that make up a phrase rather than the phrase itself.

The file is confidential, so I cannot just post it. Am I telling you something that you already know about, or should I try to submit a test case? Or is there some new option I am unaware of?

Thanks
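When the file itself cannot be shared, the symptom described above (far more phrases, most of them empty or a single character) can at least be reported as numbers for each PDFBox version. The following is only a minimal sketch, assuming the extracted phrases are already available as a list of strings; the class `PhraseStats` and its `summarize` method are hypothetical helpers and not part of PDFBox, and the sample input is made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: counts how many extracted
// phrases are empty or consist of a single character.
final class PhraseStats {
    final int total;
    final int empty;
    final int singleChar;

    private PhraseStats(int total, int empty, int singleChar) {
        this.total = total;
        this.empty = empty;
        this.singleChar = singleChar;
    }

    static PhraseStats summarize(List<String> phrases) {
        int empty = 0;
        int singleChar = 0;
        for (String p : phrases) {
            if (p.isEmpty()) {
                empty++;
            } else if (p.codePointCount(0, p.length()) == 1) {
                // Count code points so a surrogate pair is one character.
                singleChar++;
            }
        }
        return new PhraseStats(phrases.size(), empty, singleChar);
    }

    @Override
    public String toString() {
        return "total=" + total + ", empty=" + empty + ", singleChar=" + singleChar;
    }

    public static void main(String[] args) {
        // Made-up sample input, for illustration only.
        List<String> sample = Arrays.asList("Invoice total", "", "I", "n", "", "v");
        System.out.println(summarize(sample));
    }
}
```

Running the same summary over the output of 2.0.18 and 2.0.19 for one file would give a compact comparison to include in a bug report alongside a non-confidential test case.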