Ok, thanks a lot.

On Saturday, 28 March 2020 at 19:04:25 UTC+1, Essam Zaky wrote:
>
> Yes, with the same command. The result is attached.
>
> On Saturday, 28 March 2020 at 7:55:05 PM UTC+2, Teo wrote:
>>
>> With the same command?
>> tesseract pho.png pho-eng -l eng pdf
>>
>> On Saturday, 28 March 2020 at 18:48:17 UTC+1, Essam Zaky wrote:
>>>
>>> It works fine on my machine.
>>> It seems to be a problem in your PDF viewer.
>>> I used Adobe PDF Reader V9.0.
>>>
>>> Some PDF readers fail to read searchable PDFs, so try another
>>> reader.
>>>
>>> Best Regards
>>> Essam
>>>
>>> On Saturday, 28 March 2020 at 7:34:59 PM UTC+2, Teo wrote:
>>>>
>>>> Ok
>>>>
>>>> On Saturday, 28 March 2020 at 18:32:26 UTC+1, Essam Zaky wrote:
>>>>>
>>>>> Please attach the original image so I can check it on my machine.
>>>>>
>>>>> On Saturday, 28 March 2020 at 7:24:07 PM UTC+2, Teo wrote:
>>>>>>
>>>>>> Thanks for the reply.
>>>>>> I just opened an issue on github/Tesseract. Then I tried to create a
>>>>>> PDF with tesseract alone, without gImageReader, using:
>>>>>> tesseract pho.png pho-eng -l eng pdf
>>>>>> but this is the result...
>>>>>>
>>>>>> On Friday, 27 March 2020 at 03:13:40 UTC+1, Essam Zaky wrote:
>>>>>>>
>>>>>>> So I guess the error is in the PDF generation module.
>>>>>>> You have the following options:
>>>>>>> - try to fix the bug yourself
>>>>>>> - raise an issue in the Tesseract issue tracker, but first check
>>>>>>>   that the issue is not already in the list of issues
>>>>>>> - use another external library to create a searchable PDF from the
>>>>>>>   hOCR output
>>>>>>>
>>>>>>> Before tesseract added the PDF generation feature, I used a library
>>>>>>> called iTextSharp to generate the PDF, and the result was very good
>>>>>>> for me.
>>>>>>>
>>>>>>> On Thursday, 26 March 2020 at 10:54:50 PM UTC+2, Teo wrote:
>>>>>>>>
>>>>>>>> Ok, the coordinates seem correct.
>>>>>>>>
>>>>>>>> On Thursday, 26 March 2020 at 19:13:52 UTC+1, Essam Zaky wrote:
>>>>>>>>>
>>>>>>>>> Read this document:
>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage
>>>>>>>>>
>>>>>>>>> The following command returns the coordinates:
>>>>>>>>>
>>>>>>>>> tesseract testing/eurotext.png testing/eurotext-eng -l eng hocr
>>>>>>>>>
>>>>>>>>> The hOCR output contains each word as text together with its
>>>>>>>>> coordinates. You can open the image in any image editor, such as
>>>>>>>>> MS Paint, and check that the returned coordinates match the
>>>>>>>>> position of the word in the image.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>>
>>>>>>>>> On Thursday, 26 March 2020 at 1:10:22 PM UTC+2, Teo wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for your help. How can I get the coordinates, and how do I
>>>>>>>>>> check whether they are correct?
>>>>>>>>>>
>>>>>>>>>> On Wednesday, 25 March 2020 at 10:41:07 UTC+1, Essam Zaky wrote:
>>>>>>>>>>>
>>>>>>>>>>> You now need to check the coordinates returned by tesseract.
>>>>>>>>>>> Use the hOCR output and check whether the word coordinates are
>>>>>>>>>>> returned correctly. If they are, it is a bug in PDF generation.
>>>>>>>>>>>
>>>>>>>>>>> If the coordinates are wrong, it is a bug in tesseract.
>>>>>>>>>>>
>>>>>>>>>>> In the past I used a library called iTextSharp to generate
>>>>>>>>>>> searchable PDFs. The library is a port of the iText Java
>>>>>>>>>>> library, and it gives good PDF output.
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, 25 March 2020 at 11:25:46 AM UTC+2, Teo wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I think the problem is in the PDF generation module, because
>>>>>>>>>>>> the txt output is almost the same, except for some occurrences
>>>>>>>>>>>> of "the" that tesseract reads as "thè".
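The coordinate check discussed in the messages above can be scripted instead of done by eye. The following is only a minimal sketch: it assumes the hOCR file marks each word as a span of class `ocrx_word` whose `title` attribute starts with `bbox x0 y0 x1 y1`, which is the usual shape of Tesseract's hOCR output. The sample fragment, its words and its coordinates are made up for illustration, and `WordBoxParser` and `parse_word_boxes` are hypothetical helper names, not part of Tesseract.

```python
import re
from html.parser import HTMLParser

# Made-up hOCR fragment for illustration only; a real file would come from
# a command such as: tesseract pho.png pho-eng -l eng hocr
SAMPLE_HOCR = """
<span class='ocr_line' title='bbox 36 92 300 120'>
 <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>The</span>
 <span class='ocrx_word' title='bbox 109 92 214 116; x_wconf 91'>quick</span>
</span>
"""

# Matches the four integers that follow "bbox" in a title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordBoxParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "span" and "ocrx_word" in (attr.get("class") or "").split():
            match = BBOX_RE.search(attr.get("title") or "")
            self._bbox = tuple(map(int, match.groups())) if match else None

    def handle_data(self, data):
        # Only text that sits inside an open word span is recorded.
        if self._bbox is not None and data.strip():
            self.words.append((data.strip(), self._bbox))

    def handle_endtag(self, tag):
        if tag == "span":
            self._bbox = None


def parse_word_boxes(hocr_text):
    """Return a list of (word, bbox) pairs found in an hOCR string."""
    parser = WordBoxParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


for text, (x0, y0, x1, y1) in parse_word_boxes(SAMPLE_HOCR):
    print(f"{text!r}: left={x0} top={y0} right={x1} bottom={y1}")
```

The printed values are pixel coordinates measured from the top-left corner of the image, so they can be compared with the cursor position shown by an image editor when hovering over the corners of each word.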
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, 25 March 2020 at 07:25:11 UTC+1, Essam Zaky wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> You need to find out which part to improve: the tesseract
>>>>>>>>>>>>> engine or the PDF generation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So compare the text files produced by ABBYY and tesseract.
>>>>>>>>>>>>> If the results are very different, you need to improve the
>>>>>>>>>>>>> image quality or improve the LSTM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the tesseract result is good, you need to improve the PDF
>>>>>>>>>>>>> generation module.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wednesday, 25 March 2020 at 7:04:14 AM UTC+2, Teo wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The quality is already very good, but it is lower than ABBYY
>>>>>>>>>>>>>> FineReader. Attached is a comparison between ABBYY and
>>>>>>>>>>>>>> gImageReader OCR, where you can see the difference. How can
>>>>>>>>>>>>>> we improve it?
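The text comparison suggested in the thread (ABBYY output against tesseract output) can be roughly quantified with Python's standard `difflib` module. This is a sketch under stated assumptions: the two strings below are invented stand-ins for the contents of the two exported text files, and `word_similarity` and `differing_words` are hypothetical helper names. Only the "the" read as "thè" substitution is taken from the thread.

```python
import difflib

# Invented stand-ins for the two OCR text exports; in practice these would be
# read from the text files written by each tool.
abbyy_text = "the quick brown fox jumps over the lazy dog"
tesseract_text = "thè quick brown fox jumps over thè lazy dog"


def word_similarity(reference, candidate):
    """Similarity between 0.0 and 1.0, computed over whitespace-split words."""
    matcher = difflib.SequenceMatcher(
        None, reference.split(), candidate.split(), autojunk=False
    )
    return matcher.ratio()


def differing_words(reference, candidate):
    """Return (reference_words, candidate_words) pairs where the texts differ."""
    ref_words, cand_words = reference.split(), candidate.split()
    matcher = difflib.SequenceMatcher(None, ref_words, cand_words, autojunk=False)
    return [
        (" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(f"word similarity: {word_similarity(abbyy_text, tesseract_text):.2f}")
for expected, got in differing_words(abbyy_text, tesseract_text):
    print(f"  {expected!r} -> {got!r}")
```

A high similarity with only a few isolated substitutions points away from the recognition engine and toward the PDF generation step, which matches the reasoning in the thread. A low similarity would instead suggest working on image quality or the recognition model first.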

