Tesseract performs OCR on images, not on documents (PDF, DOCX, ODT, etc.). If you need multipage support, scan to the TIFF image format instead of PDF.
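For reference, the list-file workaround described in the quoted message below could be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names `write_list_file` and `build_tesseract_command` are hypothetical (they are not part of Tesseract), the page images are assumed to have been produced already by some separate PDF-to-image tool, and the command is only built, not executed.

```python
import os
import shlex
import tempfile
from pathlib import Path


def write_list_file(image_paths, list_path):
    """Write one image path per line (hypothetical helper for this sketch)."""
    list_path = Path(list_path)
    list_path.write_text("".join(f"{p}\n" for p in image_paths), encoding="utf-8")
    return list_path


def build_tesseract_command(input_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <input> <output_base> -l <lang>."""
    return ["tesseract", str(input_path), str(output_base), "-l", lang]


if __name__ == "__main__":
    # Example page images taken from the quoted message; they are assumed
    # to exist already (extracted from the PDF by another tool).
    pages = ["../image/path/1.png", "../image/path/2.png", "../image/path/3.png"]

    with tempfile.TemporaryDirectory() as tmp:
        list_file = write_list_file(pages, os.path.join(tmp, "test.txt"))
        cmd = build_tesseract_command(list_file, "path/to/output")
        # Show the command line that would be run.
        print(shlex.join(cmd))
        # To actually run it (needs tesseract on PATH and real image files):
        # import subprocess; subprocess.run(cmd, check=True)
```

The same `build_tesseract_command` call should also work with a multipage TIFF as the input path in place of the list file.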
Zdenko

On Sat, 28 Mar 2020 at 20:42, Essam Zaky <[email protected]> wrote:

> What do you mean by "scan a pdf"?
> If you mean recognizing a PDF file, you cannot recognize a PDF file directly,
> because it is a format that Leptonica does not support.
> See the following readme:
> https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
>
> The workaround is to find a tool that can extract the PDF pages to images, then
> write the paths of the extracted images into one text file.
> For example, for test.pdf the file test.txt would be:
>
> ../image/path/1.png
> ../image/path/2.png
> ../image/path/3.png
>
> Then call tesseract as follows:
>
> tesseract test.txt path/to/output -l eng
>
> The output.txt file will contain the recognition results for all the files
> listed in test.txt.
>
> Best Regards
> Essam
>
> On Saturday, March 28, 2020 at 8:48:20 PM UTC+2, Teo wrote:
>>
>> Is there an option to directly scan a pdf document containing multiple
>> pages instead of the single png image?
>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8xwcue9YezABmkrHX6AoB%3DdfsMvapKMiNT0tVQUBo-t_g%40mail.gmail.com.

