I don't think Tesseract supports this directly. You could instead generate a 
text-only searchable PDF (the textonly_pdf config variable does this) and 
superimpose it on the original PDF file.
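
A rough sketch of that two-step approach, not something I have tested on 
your files. It assumes a Tesseract build that has the textonly_pdf variable 
and a qpdf build that has --overlay; textlayer and output.pdf are names I 
made up, and the run helper is only there so the commands can be previewed 
with DRY_RUN=1 before anything is executed.

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ocr_overlay SCAN_PDF OCR_IMAGE OUT_PDF
# SCAN_PDF:  the original scan that should stay visible (e.g. input.pdf)
# OCR_IMAGE: the preprocessed image that Tesseract reads (e.g. preprocessed.png)
# OUT_PDF:   the combined result (hypothetical name, pick your own)
ocr_overlay() {
    scan_pdf=$1
    ocr_image=$2
    out_pdf=$3

    # Step 1: OCR the preprocessed image into a PDF that contains only the
    # invisible text layer and no page image. This writes textlayer.pdf.
    run tesseract "$ocr_image" textlayer -c textonly_pdf=1 pdf || return 1

    # Step 2: stamp the text layer on top of the pages of the original scan.
    run qpdf "$scan_pdf" --overlay textlayer.pdf -- "$out_pdf"
}
```

Calling ocr_overlay input.pdf preprocessed.png output.pdf would then run 
both steps. The text only lines up with the scan if the preprocessed image 
covers the same page area as the original, so cropping or deskewing during 
preprocessing may shift the text layer. For a multi-page input.pdf each 
page would need its own text layer, which this sketch does not handle.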

On Wednesday, November 11, 2020 at 10:25:07 AM UTC-6 jonas.pau...@gmail.com 
wrote:

> Hello.
>
> I've got some input document input.pdf. This comes straight from a scanner 
> and thus I do some preprocessing to improve accuracy (i.e., unpaper, 
> black/white, increased contrast), which yields preprocessed.png.
>
> When using the command
>
> tesseract preprocessed.png output pdf
>
> I receive a document that has the OCR'ed text embedded. Great! However: 
> can I tell Tesseract to use the original document input.pdf (i.e., the 
> one without preprocessing) as the background of the generated PDF while 
> still performing OCR on the preprocessed input?
>
> Thanks,
> Jonas
>

