[poppler] Extracting Soft Hyphens from PDF

Mike Tonks Tue, 19 Jan 2010 02:04:05 -0800

Hi,

Does poppler support extraction / removal of soft hyphens (Unicode
U+00AD, decimal 173) from PDF documents?


I am working on converting PDF documents to ebook formats, and we need
to extract the text and formatting information so that we can reflow
the document and create a basic layout.

I find that pdftohtml, for example, inserts normal hyphens into the
text. A soft hyphen merely indicates that the word was broken at a
suitable place, so it should not appear in the text / HTML version of
the document.
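
To illustrate the behaviour described above, here is a minimal Python
sketch of a post-processing step. It assumes the extractor hands back
the raw U+00AD character instead of converting it to "-", which, going
by the output described above, pdftohtml does not appear to do. The
function name strip_soft_hyphens is hypothetical and is not part of
poppler.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode code point 173 (assumed to survive extraction)


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens, rejoining words that were split across lines."""
    # A soft hyphen at the end of a line marks a word broken for layout:
    # drop the hyphen, the line break, and any surrounding indentation.
    joined = re.sub(SOFT_HYPHEN + r"[ \t]*\r?\n[ \t]*", "", text)
    # Any remaining soft hyphens are unused break opportunities inside a line.
    return joined.replace(SOFT_HYPHEN, "")
```

Ordinary hyphens (U+002D), as in "well-known", are left alone, since
only the soft hyphen signals a layout-only break.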

Currently the only program I can find that extracts the text correctly
without hyphens is Adobe Acrobat Pro.


Thanks for any assistance,

Mike Tonks
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
