Please attach the original image so I can check it on my machine.

On Saturday, March 28, 2020 at 7:24:07 PM UTC+2, Teo wrote:
>
> Thanks for the reply.
> I just opened an issue on github/Tesseract. Then I tried to create a PDF
> with Tesseract alone, without gImageReader:
> tesseract pho.png pho-eng -l eng pdf
> but this is the result...
>
> On Friday, March 27, 2020 at 03:13:40 UTC+1, Essam Zaky wrote:
>>
>> So I guess the error is in the PDF generation module.
>> You have the following options:
>> - try to fix the bug yourself
>> - raise an issue in the Tesseract issue tracker, but first check that
>>   it is not already in the list of issues
>> - use another external library to create a searchable PDF from the hOCR
>>   output
>>
>> Before Tesseract added PDF generation, I used a library called
>> itextsharp to generate the PDF, and the result was very good for me.
>>
>> On Thursday, March 26, 2020 at 10:54:50 PM UTC+2, Teo wrote:
>>>
>>> OK, the coordinates seem correct.
>>>
>>> On Thursday, March 26, 2020 at 19:13:52 UTC+1, Essam Zaky wrote:
>>>>
>>>> Read this document:
>>>> https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage
>>>>
>>>> The following command returns the coordinates:
>>>>
>>>> tesseract testing/eurotext.png testing/eurotext-eng -l eng hocr
>>>>
>>>> The hOCR output contains each word as text together with its
>>>> coordinates. You can open the image in any image editor, such as
>>>> MS Paint, and check that the returned coordinates match the words in
>>>> the image.
>>>>
>>>> Best Regards
>>>>
>>>> On Thursday, March 26, 2020 at 1:10:22 PM UTC+2, Teo wrote:
>>>>>
>>>>> Thanks for your help. How can I get the coordinates, and how do I
>>>>> check if they are correct?
>>>>>
>>>>> On Wednesday, March 25, 2020 at 10:41:07 UTC+1, Essam Zaky wrote:
>>>>>>
>>>>>> Now you need to check the coordinates returned by Tesseract. Use
>>>>>> the hOCR output and check whether the word coordinates are
>>>>>> returned correctly. If they are, it is a bug in the PDF generation.
>>>>>>
>>>>>> If the coordinates are wrong, it is a bug in Tesseract.
>>>>>>
>>>>>> In the past I used a library called itextsharp to generate
>>>>>> searchable PDFs. It is a port of the iText Java library, and it
>>>>>> gives good PDF output.
>>>>>>
>>>>>> On Wednesday, March 25, 2020 at 11:25:46 AM UTC+2, Teo wrote:
>>>>>>>
>>>>>>> OK, I think it's the PDF generation module, because the text is
>>>>>>> almost the same, except for some occurrences of "the" that
>>>>>>> Tesseract reads as "thè".
>>>>>>>
>>>>>>> On Wednesday, March 25, 2020 at 07:25:11 UTC+1, Essam Zaky wrote:
>>>>>>>>
>>>>>>>> You need to find out which one to improve: the Tesseract engine
>>>>>>>> or the PDF generation.
>>>>>>>>
>>>>>>>> So compare the text files from ABBYY and Tesseract. If the
>>>>>>>> results differ greatly, you need to improve the image quality
>>>>>>>> or improve the LSTM.
>>>>>>>>
>>>>>>>> If the Tesseract result is good, you need to improve the PDF
>>>>>>>> generation module.
>>>>>>>>
>>>>>>>> On Wednesday, March 25, 2020 at 7:04:14 AM UTC+2, Teo wrote:
>>>>>>>>>
>>>>>>>>> The quality is already very good, but it is lower than ABBYY
>>>>>>>>> FineReader. Attached is a comparison between ABBYY and
>>>>>>>>> gImageReader OCR, where you can see the difference. How can
>>>>>>>>> we improve it?
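
For the coordinate check discussed in this thread, a small script can list each word with its bounding box instead of reading the hOCR file by eye. The following is only a rough sketch, not something posted in the thread: the names `HocrWordParser` and `extract_words` are made up for illustration, and it assumes the hOCR file marks words as `span` elements with class `ocrx_word` and a `title` attribute containing `bbox x0 y0 x1 y1`, with no nested spans inside a word.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" part of an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: word spans do not contain nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr_text):
    """Return a list of (word, (x0, y0, x1, y1)) from hOCR markup."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words
```

With the `hocr` command from the thread, the output file would be `testing/eurotext-eng.hocr`; reading it with `open(path, encoding="utf-8").read()` and passing the text to `extract_words` gives pairs that can be compared against pixel positions in an image editor.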

