Re: PDFbox & soft hyphens

Tilman Hausherr Sat, 12 Aug 2023 07:12:51 -0700

On 12.08.2023 16:03, [email protected] wrote:

Hi all,
[PDFBOX-371] was about the treatment of soft hyphens by PDFbox in the context of extracting text from PDF. It looks like there is _no_ treatment of soft hyphens by PDFbox; at least, I did not find any information about it. Please prove me wrong, or give me a hint on how to get soft hyphens out of a PDF as soft hyphens (that is, as an "eccentric" Unicode character or an "eccentric" string).
Thanks
Walter Claassen



There were some issues over the years, see

https://issues.apache.org/jira/browse/TIKA-3314 (which I just resolved, although it was fixed long ago)


and

https://issues.apache.org/jira/browse/PDFBOX-5115

Please test with the file there or with your own; if you're unsatisfied, upload it to a file sharing service and post the URL.
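When testing, note that the soft hyphen is the Unicode character U+00AD, which is distinct from the ordinary hyphen-minus U+002D and is usually invisible when printed to a console. The sketch below is plain Java and assumes the text has already been extracted, for example as the String returned by PDFBox's `PDFTextStripper.getText(document)`. The class and method names (`SoftHyphenCheck`, `countSoftHyphens`, `makeVisible`) are hypothetical helpers for illustration only and are not part of PDFBox.

```java
// Hypothetical helper for checking whether soft hyphens (U+00AD)
// survived text extraction. Not part of the PDFBox API.
public class SoftHyphenCheck {

    // U+00AD SOFT HYPHEN, distinct from the ordinary hyphen-minus U+002D.
    static final char SOFT_HYPHEN = '\u00AD';

    // Counts the soft hyphens present in the extracted text.
    static int countSoftHyphens(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == SOFT_HYPHEN) {
                count++;
            }
        }
        return count;
    }

    // Replaces each soft hyphen with a visible "[SHY]" marker so that
    // it can be spotted in console output.
    static String makeVisible(String text) {
        return text.replace(String.valueOf(SOFT_HYPHEN), "[SHY]");
    }

    public static void main(String[] args) {
        // Hypothetical sample; in practice this would be the extracted text.
        String sample = "hyphen" + SOFT_HYPHEN + "ation and plain-hyphen";
        System.out.println(countSoftHyphens(sample)); // prints 1
        System.out.println(makeVisible(sample));      // prints hyphen[SHY]ation and plain-hyphen
    }
}
```

If the count is zero for a PDF that visibly contains discretionary hyphens, the extracted text did not carry U+00AD for them, which is worth mentioning when reporting the file.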


Tilman
