I installed the 64-bit version of Tesseract from UB Mannheim on my Win10 system, but it will not read a PDF as the input "image".
Error messages:

    Tesseract Open Source OCR Engine v5.0.0-alpha.20191030 with Leptonica
    Error in pixReadStream: Pdf reading is not supported
    Error in pixRead: pix not read
    Error during processing.

I have tried using the Xpdf command-line tool pdftotext for this task, but even the latest version (4.02) fails on what appear to be invalid character maps in some of the PDFs I need converted to text. This happens with both the Latin1 and UTF-8 output encodings. The PDFs are generated by a third party, and I have no way to get them to correct their PDF mistakes.

I was hoping Tesseract might do a better job for my PDF-to-text need.

TIA for any info or suggestions you can provide.

Peter
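For reference, the Leptonica message above says that this Tesseract build cannot read PDF input directly, so one possible workaround is to convert each page to an image first and then run OCR on the images. The following is only a rough sketch under assumptions not stated in the message: it assumes Poppler's pdftoppm and the tesseract executable are both on the PATH, and the function names are hypothetical helpers made up for illustration.

```python
import subprocess
from pathlib import Path


def build_rasterize_cmd(pdf_path, out_prefix, dpi=300):
    """Command line for Poppler's pdftoppm (assumed installed): one PNG per page."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def build_ocr_cmd(image_path, out_base, lang="eng"):
    """Command line for tesseract: writes recognized text to <out_base>.txt."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def pdf_to_text(pdf_path, work_dir, lang="eng"):
    """Rasterize a PDF into work_dir, OCR each page image, return the joined text."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    prefix = work / "page"

    # Step 1: PDF pages -> page-<n>.png (pdftoppm zero-pads the page number).
    subprocess.run(build_rasterize_cmd(pdf_path, prefix), check=True)

    # Step 2: OCR each page image in page order and collect the text files.
    texts = []
    for image in sorted(work.glob("page-*.png")):
        out_base = image.with_suffix("")
        subprocess.run(build_ocr_cmd(image, out_base, lang), check=True)
        texts.append(Path(f"{out_base}.txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling pdf_to_text("input.pdf", "ocr_work") would then return the recognized text for all pages. Since OCR works from the rendered page image rather than the PDF's embedded character maps, it may sidestep the encoding problem described above, at the cost of possible recognition errors.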

