Ok thanks, I'll keep this.

On Saturday, 28 March 2020 at 19:24:12 UTC+1, Lorenzo Blz wrote:
>
> If you'd like to improve the OCR accuracy too, a simple contrast
> enhancement (with a simple S-shaped curve) and a little sharpening help
> with the left border. See the attached file.
>
> Lorenzo
>
> On Sat, 28 Mar 2020 at 19:04, Essam Zaky wrote:
>
>> Yes, with the same command. The result is attached.
>>
>> On Saturday, 28 March 2020 at 7:55:05 PM UTC+2, Teo wrote:
>>>
>>> With the same command?
>>> tesseract pho.png pho-eng -l eng pdf
>>>
>>> On Saturday, 28 March 2020 at 18:48:17 UTC+1, Essam Zaky wrote:
>>>>
>>>> It works fine on my machine.
>>>> It seems to be a problem in your PDF viewer.
>>>> I used Adobe PDF Reader V9.0.
>>>>
>>>> Some PDF readers fail to read searchable PDFs, so try checking
>>>> with another reader.
>>>>
>>>> Best Regards
>>>> Essam
>>>>
>>>> On Saturday, 28 March 2020 at 7:34:59 PM UTC+2, Teo wrote:
>>>>>
>>>>> Ok
>>>>>
>>>>> On Saturday, 28 March 2020 at 18:32:26 UTC+1, Essam Zaky wrote:
>>>>>>
>>>>>> Please attach the original image so I can check it on my machine.
>>>>>>
>>>>>> On Saturday, 28 March 2020 at 7:24:07 PM UTC+2, Teo wrote:
>>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>> I just opened an issue on github/Tesseract. Then I tried to create
>>>>>>> a PDF with tesseract alone, without gImageReader, using:
>>>>>>> tesseract pho.png pho-eng -l eng pdf
>>>>>>> but this is the result...
>>>>>>>
>>>>>>> On Friday, 27 March 2020 at 03:13:40 UTC+1, Essam Zaky wrote:
>>>>>>>>
>>>>>>>> So I guess the error is in the PDF generation module.
>>>>>>>> You have the following options:
>>>>>>>> -try to fix the bug yourself
>>>>>>>> -raise an issue in the Tesseract issue tracker, but first check
>>>>>>>> that it is not already in the list of issues
>>>>>>>> -use another external library to create a searchable PDF from
>>>>>>>> the hocr output
>>>>>>>>
>>>>>>>> Before tesseract added the PDF generation feature, I used a library
>>>>>>>> called itextsharp to generate the PDF, and the result was very good
>>>>>>>> for me.
>>>>>>>>
>>>>>>>> On Thursday, 26 March 2020 at 10:54:50 PM UTC+2, Teo wrote:
>>>>>>>>>
>>>>>>>>> Ok, the coordinates seem correct.
>>>>>>>>>
>>>>>>>>> On Thursday, 26 March 2020 at 19:13:52 UTC+1, Essam Zaky wrote:
>>>>>>>>>>
>>>>>>>>>> Read this document:
>>>>>>>>>> https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage
>>>>>>>>>>
>>>>>>>>>> The following command returns the coordinates:
>>>>>>>>>>
>>>>>>>>>> tesseract testing/eurotext.png testing/eurotext-eng -l eng hocr
>>>>>>>>>>
>>>>>>>>>> The hocr output contains each word as text together with its
>>>>>>>>>> coordinates. You can open the image in any image editor, such as
>>>>>>>>>> MS Paint, and check that the returned coordinates match the
>>>>>>>>>> position of the word in the image.
>>>>>>>>>>
>>>>>>>>>> Best Regards
>>>>>>>>>>
>>>>>>>>>> On Thursday, 26 March 2020 at 1:10:22 PM UTC+2, Teo wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help. How can I get the coordinates, and how do
>>>>>>>>>>> I check if they are correct?
>>>>>>>>>>>
>>>>>>>>>>> On Wednesday, 25 March 2020 at 10:41:07 UTC+1, Essam Zaky wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Now you need to check the coordinates returned by tesseract.
>>>>>>>>>>>> Use the hocr output and check whether the word coordinates are
>>>>>>>>>>>> returned correctly. If they are, it is a bug in PDF generation.
>>>>>>>>>>>>
>>>>>>>>>>>> If the coordinates are wrong, it's a bug in tesseract.
>>>>>>>>>>>>
>>>>>>>>>>>> In the past I used a library called itextsharp to generate
>>>>>>>>>>>> searchable PDFs. It is a port of the iText Java library, and it
>>>>>>>>>>>> gives good PDF output.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, 25 March 2020 at 11:25:46 AM UTC+2, Teo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, I think the problem is in the PDF generation module, because
>>>>>>>>>>>>> the txt is almost the same, with the exception of some "the"
>>>>>>>>>>>>> which tesseract sees as "thè".
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wednesday, 25 March 2020 at 07:25:11 UTC+1, Essam Zaky wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You need to find out which one to improve: the tesseract
>>>>>>>>>>>>>> engine or PDF generation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So compare the text files from ABBYY and tesseract.
>>>>>>>>>>>>>> If the results are very different, you need to improve the
>>>>>>>>>>>>>> image quality or improve the LSTM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If the tesseract result is good, you need to improve the
>>>>>>>>>>>>>> PDF generation module.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wednesday, 25 March 2020 at 7:04:14 AM UTC+2, Teo wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The quality is already very good, but it is lower than ABBYY
>>>>>>>>>>>>>>> FineReader. Attached is a comparison between ABBYY and
>>>>>>>>>>>>>>> gImageReader OCR, and you can see the difference. How can
>>>>>>>>>>>>>>> we improve it?
>>>>>>>>>>>>>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/7b8148d7-6075-4bed-9edb-99480001204b%40googlegroups.com
>>
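
For anyone who wants to try the S-shaped contrast curve Lorenzo mentions, here is a minimal sketch in Python. Lorenzo did not say which tool or which curve he used, so the sigmoid formula, the function name `s_curve_lut` and the `gain` and `midpoint` parameters are all assumptions made for illustration. The function only builds a 256-entry lookup table for 8-bit gray levels; it does not touch any image by itself.

```python
import math


def s_curve_lut(gain=8.0, midpoint=0.5):
    """Build a 256-entry lookup table for a sigmoid (S-shaped) tone curve.

    gain and midpoint are hypothetical tuning knobs: a larger gain gives a
    steeper curve (more contrast), midpoint is the gray level (0..1) around
    which the curve is centred.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-gain * (x - midpoint)))

    # Values of the raw curve at black and white, used to rescale the
    # output so that 0 stays 0 and 255 stays 255.
    lo, hi = sigmoid(0.0), sigmoid(1.0)

    lut = []
    for v in range(256):
        y = (sigmoid(v / 255.0) - lo) / (hi - lo)
        lut.append(int(round(y * 255)))
    return lut


lut = s_curve_lut()
# Dark tones are pushed darker and bright tones brighter.
print(lut[64], lut[128], lut[192])
```

An image library can then apply the table. With Pillow, for example, a grayscale image could be passed through `Image.point(lut)` and then sharpened with `ImageFilter.SHARPEN` before running tesseract on the result, but the right amount of contrast and sharpening depends on the scan.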
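
The coordinate check discussed in the thread (run tesseract with `hocr`, then compare each word's box with the image) can also be scripted instead of done by eye. The sketch below uses only the Python standard library and assumes the hocr output marks words as `span` elements with class `ocrx_word` and a `title` attribute containing `bbox x0 y0 x1 y1`. The names `HocrWordParser` and `parse_hocr_words` are made up for this example, and `SAMPLE` is a hand-written fragment, not real tesseract output.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" part of an hocr title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every 'ocrx_word' span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hand-written sample in the assumed format (not real OCR output).
SAMPLE = (
    "<span class='ocr_line' title='bbox 105 66 823 113'>"
    "<span class='ocrx_word' title='bbox 105 66 178 97; x_wconf 96'>The</span> "
    "<span class='ocrx_word' title='bbox 192 66 322 97; x_wconf 95'>quick</span>"
    "</span>"
)

for text, bbox in parse_hocr_words(SAMPLE):
    print(text, bbox)
```

To use it on a real page, read the file written by the `hocr` command into a string and pass it to `parse_hocr_words`. The boxes can then be drawn over the image, or compared with where the text layer ends up in the PDF.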

