What do you mean by "scan a PDF"?
If you mean recognizing a PDF file, Tesseract cannot do that directly,
because PDF is not an input format supported by Leptonica.
See the following documentation:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
The workaround is to find a tool that can extract the PDF pages as images, then
write the paths of the extracted images into a single text file, one per line.
For example, for test.pdf you would create
test.txt containing:
../image/path/1.png
../image/path/2.png
../image/path/3.png
Then call tesseract as follows:
tesseract test.txt path/to/output -l eng
The resulting output.txt will contain the recognition results for all the files
listed in test.txt.
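
The steps above could be scripted roughly as follows. This is only a sketch under some assumptions: it uses pdftoppm (from poppler-utils) as the extraction tool, which is my choice and not something Tesseract requires, and the function names and the "page" file prefix are made up for illustration.

```python
import subprocess
from pathlib import Path


def extract_pages(pdf_path, image_dir, dpi=300):
    """Render each PDF page to a PNG file.

    Assumes pdftoppm (poppler-utils) is installed; any other
    PDF-to-image tool would work just as well.
    """
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    prefix = image_dir / "page"  # hypothetical prefix: page-1.png, page-2.png, ...
    subprocess.run(
        ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(prefix)],
        check=True,
    )


def page_number(path):
    """Return the page number from a name like page-3.png or page-03.png."""
    return int(path.stem.rsplit("-", 1)[-1])


def write_list_file(image_dir, list_path):
    """Write one image path per line, in page order, and return the paths."""
    # Sort numerically so that page-10 comes after page-2.
    images = sorted(Path(image_dir).glob("page-*.png"), key=page_number)
    Path(list_path).write_text("".join(f"{p}\n" for p in images))
    return images


def run_tesseract(list_path, output_base, lang="eng"):
    """Run tesseract on the list file; the text goes to <output_base>.txt."""
    subprocess.run(
        ["tesseract", str(list_path), str(output_base), "-l", lang],
        check=True,
    )


# Example for test.pdf (not executed here):
#   extract_pages("test.pdf", "images")
#   write_list_file("images", "test.txt")
#   run_tesseract("test.txt", "path/to/output")
```

The paths written to the list file are whatever you pass in as image_dir, so run tesseract from the same working directory if they are relative.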
Best Regards
Essam
On Saturday, March 28, 2020 at 8:48:20 PM UTC+2, Teo wrote:
>
> Is there an option to directly scan a pdf document containing multiple
> pages instead of the single png image?
>