Hi,

Does poppler support extraction or removal of soft hyphens (Unicode U+00AD, decimal 173) from PDF documents?

I am working on converting PDF documents to ebook formats, and we need to extract the text and formatting information so that we can reflow the document and create a basic layout.

I find that pdftohtml, for example, inserts normal hyphens into the text wherever a soft hyphen occurs. The soft hyphen merely indicates that the word was broken at a suitable place, so it should not appear in the text or HTML version of the document. Currently the only program I can find that extracts the text correctly, without these hyphens, is Adobe Acrobat Pro.

Thanks for any assistance,
Mike Tonks
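
P.S. To make the desired behaviour concrete, here is a minimal Python sketch of the clean-up being asked about. It is not poppler code and uses no poppler API; the function name `strip_soft_hyphens` is hypothetical. It also assumes the extracted text still contains the actual U+00AD character, which is not the case when the extractor has already replaced it with a normal hyphen, as described above for pdftohtml.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode code point 173 (U+00AD)


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens, rejoining words split across a line break.

    Assumption: the extractor preserved U+00AD instead of emitting '-'.
    """
    # A soft hyphen at the end of a line marks a break inside a word:
    # drop it together with the line break and any surrounding spaces.
    text = re.sub(SOFT_HYPHEN + r"[ \t]*\r?\n[ \t]*", "", text)
    # Any remaining soft hyphens are invisible break hints; drop them too.
    return text.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    sample = "The docu\u00ad\nment was con\u00adverted."
    print(strip_soft_hyphens(sample))
```

Ordinary hyphens (U+002D) are left untouched, so words such as "well-known" survive. That is exactly why the substitution has to happen before the soft hyphen is turned into a normal one.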
