Read this document: https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage
The following command returns the coordinates:

tesseract testing/eurotext.png testing/eurotext-eng -l eng hocr

The hOCR output contains each word as text together with its coordinates. You can open the image in any image editor, such as MS Paint, and check whether the returned coordinates match the position of the word in the image.

Best regards

On Thursday, March 26, 2020 1:10:22 PM UTC+2, Teo wrote:
>
> Thanks for your help. How can I get the coordinates, and how do I check if
> they are correct?
>
> On Wednesday, March 25, 2020 10:41:07 UTC+1, Essam Zaky wrote:
>>
>> Now you need to check the coordinates returned by Tesseract. Use the hOCR
>> output and check whether the word coordinates are returned correctly. If
>> they are, it is a bug in the PDF generation.
>>
>> If the coordinates are wrong, it is a bug in Tesseract.
>>
>> I have previously used a library called iTextSharp to generate searchable
>> PDFs. The library is a port of the iText Java library, and it gives good
>> PDF output.
>>
>>
>> On Wednesday, March 25, 2020 11:25:46 AM UTC+2, Teo wrote:
>>>
>>> OK, I think it's the PDF generation module, because the text is almost
>>> the same, except for some instances of "the", which Tesseract reads as "thè".
>>>
>>> On Wednesday, March 25, 2020 07:25:11 UTC+1, Essam Zaky wrote:
>>>>
>>>> You need to know which one to improve: the Tesseract engine or the PDF
>>>> generation.
>>>>
>>>> So compare the text files from ABBYY and Tesseract. If the results are
>>>> very different, you need to improve the image quality or improve the LSTM
>>>> model.
>>>>
>>>> If the Tesseract result is good, then you need to enhance the PDF
>>>> generation module.
>>>>
>>>> On Wednesday, March 25, 2020 7:04:14 AM UTC+2, Teo wrote:
>>>>>
>>>>> The quality is already very good, but it is lower than ABBYY FineReader.
>>>>> Attached is a comparison between ABBYY and gImageReader OCR, where
>>>>> you can see the difference. How can we improve it?
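To check the coordinates without reading the hOCR by hand, a small script can list each recognized word with its bounding box. This is a minimal sketch using only the Python standard library. It assumes the command above wrote `testing/eurotext-eng.hocr` (recent Tesseract versions typically use the `.hocr` extension) and that, as the hOCR format specifies, each word is a `span` with class `ocrx_word` whose `title` contains `bbox x0 y0 x1 y1`. The helper names `parse_bbox`, `HocrWordParser` and `words_with_boxes` are hypothetical, not part of Tesseract.

```python
import sys
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title such as 'bbox 36 92 96 116; x_wconf 95'."""
    for prop in title.split(";"):
        fields = prop.split()
        if len(fields) == 5 and fields[0] == "bbox":
            return tuple(int(v) for v in fields[1:])
    return None  # no bbox property in this title


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) for every <span class='ocrx_word'> in an hOCR document."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False  # currently inside an ocrx_word span
        self._depth = 0        # spans nested inside the current word span
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._in_word:
            if tag == "span":
                self._depth += 1
            return
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._in_word = True
            self._depth = 0
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_endtag(self, tag):
        if not self._in_word or tag != "span":
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Closing tag of the word span itself: record the word and its box.
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._in_word = False

    def handle_data(self, data):
        # Text inside the word span (including nested tags such as <strong>).
        if self._in_word:
            self._text.append(data)


def words_with_boxes(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python hocr_words.py testing/eurotext-eng.hocr
    with open(sys.argv[1], encoding="utf-8") as f:
        for text, bbox in words_with_boxes(f.read()):
            print(text, bbox)
```

The bbox values are pixel positions measured from the top-left corner of the image: (x0, y0) is the top-left corner of the word and (x1, y1) its bottom-right corner. MS Paint shows the cursor position in pixels in its status bar, so hovering over those two points is a quick way to confirm that a box really encloses the word.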
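For the earlier step in the thread, deciding whether the recognition engine or the PDF generation needs work, the plain-text outputs of ABBYY and Tesseract can be compared word by word. This is a minimal sketch, assuming both results are exported as plain text files; it uses the standard `difflib.SequenceMatcher`, and `compare_ocr_text` is a hypothetical helper name. In the thread's terms, a ratio close to 1.0 with only a few isolated substitutions (such as "the" read as "thè") suggests the recognition itself is fine.

```python
import sys
import difflib


def compare_ocr_text(reference, candidate):
    """Word-level comparison: returns (similarity ratio, list of (reference, candidate) differences)."""
    ref_words = reference.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # Replaced, deleted or inserted words; an empty string marks the missing side.
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2])))
    return matcher.ratio(), diffs


if __name__ == "__main__" and len(sys.argv) > 2:
    # Example: python compare_ocr.py abbyy.txt tesseract.txt
    with open(sys.argv[1], encoding="utf-8") as f_ref, open(sys.argv[2], encoding="utf-8") as f_cand:
        ratio, diffs = compare_ocr_text(f_ref.read(), f_cand.read())
    print(f"word-level similarity: {ratio:.3f}")
    for ref, cand in diffs:
        print(f"{ref!r} -> {cand!r}")
```

Comparing words rather than characters keeps the report readable for a full page; a character-level ratio would weight a single accent error like "thè" much less, which may or may not be what you want.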

