I've got Tika working with Tesseract on PDF files, but it seems that if I give it a PDF that has both searchable text and images, the text is OCRed twice. Is there a way to avoid this, even if it has to make two passes: one for the straight text and then another for just the images?
- OCR on PDFs Peter Kronenberg
- Re: OCR on PDFs Nick Burch
- Re: OCR on PDFs Tim Allison
- RE: OCR on PDFs Peter Kronenberg
