[tesseract-ocr] Re: Scan pdf file instead png

Essam Zaky Sat, 28 Mar 2020 12:43:00 -0700

What do you mean by "scan a pdf"?
If you mean running recognition on a PDF file: Tesseract cannot read PDF 
input directly, because Leptonica does not support PDF as an input format.
See the following documentation:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc


 
The workaround is to use a tool that extracts the PDF pages as images, then 
write the paths of the extracted images into a text file, one path per line.
For example, for test.pdf the list file would be
test.txt:
     ../image/path/1.png
     ../image/path/2.png
     ../image/path/3.png

Then call tesseract as follows:
tesseract test.txt path/to/output -l eng 


The output file (path/to/output.txt in this example) will contain the 
recognition results for all the images listed in test.txt.
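
As a rough sketch of the workaround above, the Python below renders the pages, 
writes the list file, and runs the same tesseract command. It assumes Poppler's 
pdftoppm is installed as the PDF-to-image tool (any other extractor would do), 
and the function names, the "page" file prefix, and pages.txt are only 
illustrative choices of mine, not part of Tesseract.

```python
import re
import subprocess
from pathlib import Path


def page_number(path):
    """Return the trailing number in a name like 'page-12.png' (0 if none)."""
    match = re.search(r"(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else 0


def write_image_list(image_paths, list_path):
    """Write one image path per line, ordered by page number."""
    ordered = sorted((str(p) for p in image_paths), key=page_number)
    Path(list_path).write_text("\n".join(ordered) + "\n", encoding="utf-8")
    return ordered


def tesseract_command(list_path, output_base, lang="eng"):
    """Build the command from the message: tesseract <list> <output> -l <lang>."""
    return ["tesseract", str(list_path), str(output_base), "-l", lang]


def ocr_pdf(pdf_path, work_dir, output_base, lang="eng"):
    """Assumed pipeline: pdftoppm renders the pages, tesseract reads the list."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # pdftoppm writes one PNG per page, named <prefix>-<page>.png.
    subprocess.run(
        ["pdftoppm", "-png", "-r", "300", str(pdf_path), str(work_dir / "page")],
        check=True,
    )
    list_path = work_dir / "pages.txt"
    write_image_list(work_dir.glob("page-*.png"), list_path)
    subprocess.run(tesseract_command(list_path, output_base, lang), check=True)
    # tesseract appends .txt to the output base name.
    return Path(str(output_base) + ".txt")
```

Calling ocr_pdf("test.pdf", "pages", "path/to/output") would then leave the 
combined text in path/to/output.txt, provided both tools are on the PATH.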


Best Regards
Essam
On Saturday, 28 March 2020 at 8:48:20 PM UTC+2, Teo wrote:
>
> Is there an option to directly scan a pdf document containing multiple 
> pages instead of the single png image?
>

