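I don't think Tesseract supports this directly. As a workaround, you could generate a text-only searchable PDF (Tesseract's textonly_pdf option produces just the invisible text layer, without the image) and superimpose it on the original PDF file.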
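
A rough sketch of that workaround, in case it helps. It assumes a Tesseract build that has the `textonly_pdf` config variable, and it assumes qpdf is installed and used for the superimposing step (qpdf is my pick here, not something Tesseract requires; other PDF tools can overlay pages too). The function names and the `_textlayer` suffix are made up for illustration.

```python
import subprocess
from pathlib import Path


def tesseract_text_only_cmd(image, out_base):
    """Build the Tesseract call that writes <out_base>.pdf with only the text layer."""
    # textonly_pdf=1 tells the PDF renderer to leave out the page image.
    return ["tesseract", image, out_base, "-c", "textonly_pdf=1", "pdf"]


def qpdf_overlay_cmd(background_pdf, text_pdf, merged_pdf):
    """Build the qpdf call that lays text_pdf on top of background_pdf."""
    # Assumption: qpdf does the merge; "--" ends the overlay options.
    return ["qpdf", background_pdf, "--overlay", text_pdf, "--", merged_pdf]


def make_searchable(image, background_pdf, merged_pdf):
    """OCR the preprocessed image, then overlay the text on the original scan."""
    # Hypothetical naming scheme for the intermediate text-only PDF.
    out_base = str(Path(merged_pdf).with_suffix("")) + "_textlayer"
    subprocess.run(tesseract_text_only_cmd(image, out_base), check=True)
    subprocess.run(
        qpdf_overlay_cmd(background_pdf, out_base + ".pdf", merged_pdf),
        check=True,
    )


# Example call (needs tesseract and qpdf on PATH):
# make_searchable("preprocessed.png", "input.pdf", "searchable.pdf")
```

One caveat: the text layer only lines up with the original scan if both pages have the same size and the preprocessing did not shift the content. The page size of the text-only PDF depends on the image's pixel dimensions and DPI, and steps such as unpaper's deskewing or cropping can move things around, so it is worth checking the result by selecting text in a viewer.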

On Wednesday, November 11, 2020 at 10:25:07 AM UTC-6 jonas.pau...@gmail.com wrote:
> Hello.
>
> I've got some input document input.pdf. This comes straight from a scanner
> and thus I do some preprocessing to improve accuracy (i.e., unpaper,
> black/white, increased contrast), which yields preprocessed.png.
>
> When using the command
>
> tesseract preprocessed.png output pdf
>
> I receive a document, which has the ocr'ed text embedded. Great! However:
> Can I tell tesseract to use the original document input.pdf as the
> background (i.e., the one without preprocessing) of the generated PDF while
> still performing ocr on the preprocessed input?
>
> Thanks,
> Jonas