1,000,000 pages in one PDF? Seriously? Also, please post your code. pytesseract is not an efficient tool when you have many images: every run/page goes through disk I/O.
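Without seeing your code I can only guess, but here is a minimal sketch of one way to reduce the wall-clock time, assuming the pages are already split into one image file per page. It OCRs several pages concurrently and skips pages that were finished in an earlier run. The names `ocr_page_to_pdf` and `ocr_pages`, the output file naming, and the worker count are my own illustration, not something from your setup. `pytesseract.image_to_pdf_or_hocr` still starts tesseract once per page, so this only hides the per-page overhead behind parallelism; it does not remove it.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ocr_page_to_pdf(image_path):
    """OCR one page image and return the bytes of a one-page searchable PDF."""
    # Imported lazily so the batching helper below can be used without pytesseract.
    import pytesseract

    return pytesseract.image_to_pdf_or_hocr(str(image_path), extension="pdf")


def ocr_pages(image_paths, out_dir, ocr=ocr_page_to_pdf, workers=4):
    """OCR pages concurrently, one PDF per page; return output paths in input order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    image_paths = [Path(p) for p in image_paths]

    def work(item):
        index, path = item
        target = out_dir / f"{index:07d}_{path.stem}.pdf"
        if not target.exists():  # page already done in an earlier run
            tmp = target.with_suffix(".part")
            tmp.write_bytes(ocr(path))
            tmp.replace(target)  # rename only after the page is fully written
        return target

    # Threads are enough here: each call mostly waits on an external tesseract process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, enumerate(image_paths)))  # map keeps input order
```

The per-page PDFs still have to be merged into one document afterwards with a PDF tool of your choice. Set `workers` to roughly the number of CPU cores.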
Zdenko

On Thu, 25 Mar 2021 at 8:49, Vidya Chitragar <[email protected]> wrote:

> Hi everyone,
>
> I am using pytesseract with tesseract-ocr version 3.05.02 to convert a
> scanned PDF document of 1000k pages into a searchable PDF document, but my
> code takes more than 5 to 6 hours to produce the searchable PDF.
> Any suggestions would be very helpful.
>
> Thanks,
> Vidya
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/8f2fe788-c28f-40f7-9804-99978cb44353n%40googlegroups.com

