Thanks for your help. How can I get the coordinates, and how do I check whether they are correct?
On Wednesday, 25 March 2020 at 10:41:07 UTC+1, Essam Zaky wrote:
>
> Now you need to check the coordinates returned by Tesseract. Use the hOCR
> output and check whether the word coordinates are returned correctly. If
> they are, the bug is in the PDF generation.
>
> If the coordinates are wrong, it's a bug in Tesseract.
>
> For my part, I previously used a library called iTextSharp to generate
> searchable PDFs. It is a port of the Java iText library and gives good
> PDF output.
>
> On Wednesday, 25 March 2020 at 11:25:46 AM UTC+2, Teo wrote:
>>
>> OK, I think it's the PDF generation module, because the text is almost
>> the same, except for some instances of "the" that Tesseract reads as "thè".
>>
>> On Wednesday, 25 March 2020 at 07:25:11 UTC+1, Essam Zaky wrote:
>>>
>>> You need to find out which to improve: the Tesseract engine or the PDF
>>> generation.
>>>
>>> So compare the text files from ABBYY and Tesseract.
>>> If the results are very different, you need to improve the image quality
>>> or improve the LSTM model.
>>>
>>> If the Tesseract result is good, you need to improve the PDF
>>> generation module.
>>>
>>> On Wednesday, 25 March 2020 at 7:04:14 AM UTC+2, Teo wrote:
>>>>
>>>> The quality is already very good, but it is lower than ABBYY FineReader.
>>>> The attachment contains a comparison between ABBYY and gImageReader OCR,
>>>> where you can see the difference. How can we improve it?
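
For reference, the hOCR check suggested in the quoted reply could look roughly like the sketch below. It is not anyone's actual workflow from this thread. It assumes the hOCR file marks each word with a `span` of class `ocrx_word` whose `title` attribute contains `bbox x0 y0 x1 y1` in image pixels, which is the usual hOCR layout. The names `HocrWordParser`, `parse_words`, `suspicious`, and the `SAMPLE` data are made up for illustration.

```python
import re
from html.parser import HTMLParser

# hOCR stores the word box in the title attribute, e.g. "bbox 36 92 96 116; x_wconf 95"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # box of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_words(hocr_text):
    """Return the list of (text, bbox) pairs found in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    return parser.words


def suspicious(words, page_width, page_height):
    """Return words whose box is empty, inverted, or outside the page."""
    return [
        (text, box)
        for text, box in words
        if not (0 <= box[0] < box[2] <= page_width
                and 0 <= box[1] < box[3] <= page_height)
    ]


# Hypothetical two-word fragment; the second box runs past a 640 px wide page.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 640 480'>
<span class='ocr_line' title='bbox 36 92 200 116'>
<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>the</span>
<span class='ocrx_word' title='bbox 110 92 700 116; x_wconf 60'>quality</span>
</span></div>"""

if __name__ == "__main__":
    words = parse_words(SAMPLE)
    for text, box in words:
        print(text, box)
    print("suspicious:", suspicious(words, 640, 480))
```

To try it on a real page, generate the file with something like `tesseract page.png out hocr` (which should write `out.hocr`), read it as text, and pass it to `parse_words`. The range check only catches obviously broken boxes. To confirm that a box really sits on its word, compare the printed pixel coordinates with the same region of the source image in an image viewer.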

