Read this document:
https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage

The following command returns the word coordinates:

tesseract testing/eurotext.png testing/eurotext-eng -l eng hocr


The hOCR output contains each word as text together with its coordinates 
(bounding box). You can open the image in any image editor, such as MS Paint, 
and check that the returned coordinates match the position of the word in 
the image.
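
If you would rather not read the hOCR file by eye, here is a minimal Python 
sketch that lists each word with its bounding box. It assumes the usual 
Tesseract hOCR layout, where every word is a span with class "ocrx_word" 
and a title attribute starting with "bbox x0 y0 x1 y1" (pixels, measured 
from the top-left corner of the image). The names HocrWords and read_words 
are my own, not part of Tesseract, and the sample span below is made up for 
illustration.

```python
import re
from html.parser import HTMLParser

# "bbox x0 y0 x1 y1" as found in the title attribute of hOCR elements.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span.

    Assumption: word spans do not contain nested <span> elements,
    which holds for default Tesseract hOCR output.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bounding box of the word span currently open
        self._text = []    # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def read_words(hocr_text):
    """Return a list of (word, (x0, y0, x1, y1)) from hOCR markup."""
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Made-up sample span in the shape Tesseract writes.
sample = ("<span class='ocrx_word' id='word_1_1' "
          "title='bbox 105 66 178 96; x_wconf 96'>The</span>")
for word, (x0, y0, x1, y1) in read_words(sample):
    print(word, x0, y0, x1, y1)
```

For the command above, you would read testing/eurotext-eng.hocr (opened 
with encoding="utf-8") and pass its contents to read_words, then compare 
the printed boxes with the pixel positions shown in the image editor.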

Best Regards

On Thursday, 26 March 2020 at 1:10:22 PM UTC+2, Teo wrote:
>
> Thanks for your help. How can I get the coordinates, and how do I check if 
> they are correct?
>
> On Wednesday, 25 March 2020 at 10:41:07 UTC+1, Essam Zaky wrote:
>>
>> You now need to check the coordinates returned by Tesseract. Use the hOCR 
>> output and check whether the word coordinates are returned correctly. If 
>> they are, it is a bug in the PDF generation.
>>
>> If the coordinates are wrong, it is a bug in Tesseract.
>>
>> In the past I used a library called iTextSharp to generate searchable PDFs. 
>> It is a port of the Java iText library, and it gives good PDF output.
>>
>>
>> On Wednesday, 25 March 2020 at 11:25:46 AM UTC+2, Teo wrote:
>>>
>>> OK, I think it's the PDF generation module, because the text output is 
>>> almost the same, except for some occurrences of "the" that Tesseract 
>>> reads as "thè".
>>>
>>> On Wednesday, 25 March 2020 at 07:25:11 UTC+1, Essam Zaky wrote:
>>>>
>>>> You need to find out which one to improve: the Tesseract engine or the 
>>>> PDF generation.
>>>>
>>>> Compare the text files from ABBYY and Tesseract. If the results differ 
>>>> greatly, you need to improve the image quality or the LSTM model.
>>>>
>>>> If the Tesseract result is good, you need to improve the PDF generation 
>>>> module.
>>>>
>>>> On Wednesday, 25 March 2020 at 7:04:14 AM UTC+2, Teo wrote:
>>>>>
>>>>> The quality is already very good, but it is lower than ABBYY FineReader. 
>>>>> Attached is a comparison between ABBYY and gImageReader OCR, where you 
>>>>> can see the difference. How can we improve it?
>>>>>
>>>>>
>>>>>
>>>>>
