Hi all,

I am using the Tika Java library to read the content of some PDFs, and it seems to insert some weird (hyphen-like) spacing inside words. For example:

The es tab lish ment of an in te grated Part ner Re la tion ship Man age ment (PRM) sys tem can po ten tially ad dress sev eral as pets
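
In case it helps with diagnosis, here is a minimal sketch in plain Java (no Tika calls) of how the extracted string could be inspected to see which characters actually sit between the word fragments. The class `SpacingProbe`, its two methods, and the sample string are hypothetical and only for illustration. The idea that the separators might be soft hyphens (U+00AD) is an unconfirmed guess, not something established for this PDF.

```java
import java.util.ArrayList;
import java.util.List;

public class SpacingProbe {

    // Lists every code point that is not a letter or digit, as "U+XXXX (NAME)".
    static List<String> describeSeparators(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isLetterOrDigit(cp))
            .forEach(cp -> found.add(
                String.format("U+%04X (%s)", cp, Character.getName(cp))));
        return found;
    }

    // Assumption: the separators are soft hyphens (U+00AD). Removes only those.
    static String stripSoftHyphens(String text) {
        return text.replace("\u00AD", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample; in practice this would be the text returned by Tika.
        String sample = "es\u00ADtab\u00ADlish\u00ADment of";
        System.out.println(describeSeparators(sample));
        System.out.println(stripSoftHyphens(sample));
    }
}
```

If the separators turn out to be ordinary spaces (U+0020) instead, the stripping step above would not apply, and the problem would more likely lie in how the text is being reassembled from the PDF.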
When I extract text from the same PDF with the pdftotext command-line utility, the text comes out correctly:

The establishment of an integrated Partner Relationship Management (PRM) system can potentially address several aspects

Does anybody have any idea why Tika behaves this way, and any tips for fixing it?

Best regards,
Augusto
