On 01.11.2019 at 15:53, John Lussmyer wrote:
On Tue Oct 29 21:59:57 PDT 2019 thaush...@t-online.de said:
IIRC tesseract can do this. Not as annotation, but as invisible font.
As far as I can tell, it does it the same way that other programs do.
It's added to the content stream, mixed with all the commands for positioning, font size, etc. Words are often broken up.
I'm looking for something that just embeds the plain text, with NO markup.


Then the best option would be to use Tesseract for an ordinary text-only OCR, and then use that text to create a text annotation. See the AddAnnotations.java example in the source code download.
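A minimal sketch of that approach, assuming PDFBox 2.0.x and a plain-text file produced by something like `tesseract page1.png out` (which writes `out.txt`). The class name `OcrTextAnnotation`, the helper `addOcrAnnotation`, and the icon size/position are hypothetical choices for illustration, not taken from AddAnnotations.java. The OCR text ends up only in the annotation's /Contents entry, with no positioning or font operators in the content stream:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationText;

/**
 * Sketch: store plain OCR text on a page as a text ("sticky note") annotation
 * instead of as invisible text in the content stream. Assumes PDFBox 2.0.x.
 */
public class OcrTextAnnotation {

    /** Icon size in PDF units; an arbitrary (hypothetical) choice. */
    static final float ICON_SIZE = 20f;

    /** Adds ocrText as the /Contents of a new text annotation on the page. */
    static PDAnnotationText addOcrAnnotation(PDPage page, String ocrText) throws IOException {
        PDAnnotationText note = new PDAnnotationText();
        note.setContents(ocrText); // plain text only, no markup
        PDRectangle box = page.getMediaBox();
        // Put the note icon in the top-left corner of the page.
        note.setRectangle(new PDRectangle(box.getLowerLeftX(),
                box.getUpperRightY() - ICON_SIZE, ICON_SIZE, ICON_SIZE));
        // The returned list is backed by the page's /Annots array.
        page.getAnnotations().add(note);
        return note;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: plain-text OCR output (e.g. out.txt), args[1]: PDF to write
        String ocrText = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.UTF_8);
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            doc.addPage(page);
            addOcrAnnotation(page, ocrText);
            doc.save(new File(args[1]));
        }
    }
}
```

For an existing scanned PDF you would instead open it (`PDDocument.load(File)` in 2.0.x, `Loader.loadPDF(File)` in 3.0), OCR each page image separately, and call `addOcrAnnotation` on the matching page before saving.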

Tilman

