OCR on PDFs

Peter Kronenberg Thu, 31 Dec 2020 06:58:48 -0800

I've got Tika working with Tesseract on PDF files, but it seems that if I give 
it a PDF file that has both searchable text and images, the text is OCRed 
twice.  Is there a way to avoid this, even if it has to make two passes: one 
for the straight text and then another for just the images?
