OK, I think the problem is in the PDF generation module, because the text output is almost the same, with the exception of some occurrences of "the" which Tesseract reads as "thè".
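For anyone who wants to repeat the comparison without reading the two text files by eye, here is a minimal sketch using Python's standard `difflib`. The function name `compare_ocr_text` and the sample strings are my own, for illustration only; in practice you would pass in the contents of the two exported .txt files. It compares word by word, so differences in line wrapping between the two engines are ignored.

```python
import difflib


def compare_ocr_text(reference: str, candidate: str):
    """Compare two OCR outputs word by word.

    Returns (similarity, differences): similarity is a ratio between 0 and 1,
    and differences is a list of (reference_words, candidate_words) pairs for
    every span where the two texts disagree.
    """
    ref_words = reference.split()
    cand_words = candidate.split()
    # autojunk=False so that frequent words such as "the" are not ignored
    # on long documents.
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2]))
            )
    return matcher.ratio(), differences


# Illustrative strings only; replace with the text exported by each engine.
abbyy_text = "the quick brown fox jumps over the lazy dog"
tesseract_text = "thè quick brown fox jumps over the lazy dog"

similarity, differences = compare_ocr_text(abbyy_text, tesseract_text)
print(f"similarity: {similarity:.3f}")
for ref, cand in differences:
    print(f"reference: {ref!r}  candidate: {cand!r}")
```

A high similarity with only a handful of isolated word substitutions would point at the PDF generation step rather than at recognition, which is what I am seeing here.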
On Wednesday, 25 March 2020 at 07:25:11 UTC+1, Essam Zaky wrote:
> You need to find out which one to improve: the Tesseract engine or the PDF generation.
>
> So compare the text files from ABBYY and Tesseract.
> If the results are very different, you need to improve the image quality or improve the LSTM model.
>
> If the Tesseract result is good, then you need to improve the PDF generation module.
>
> On Wednesday, 25 March 2020 at 7:04:14 AM UTC+2, Teo wrote:
>> The quality is already very good, but it is lower than ABBYY FineReader. Attached is a comparison between ABBYY and gImageReader OCR, where you can see the difference. How can we improve it?

