[tesseract-ocr] Re: How to improve ocr reader?

Teo Thu, 26 Mar 2020 04:10:35 -0700

Thanks for your help. How can I get the coordinates, and how do I check whether
they are correct?
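One way to get and sanity-check the coordinates is through hOCR. Running Tesseract with the `hocr` config (e.g. `tesseract scan.png scan hocr`) writes `scan.hocr`. In that file each recognized word is a `span` with class `ocrx_word`, and its `title` attribute holds `bbox x0 y0 x1 y1` in image pixels, with the origin at the top left. The sketch below is a minimal example that assumes a single hOCR file on disk. It lists every word with its box and flags boxes that are missing, empty, or outside their page. Those checks are an illustrative choice, not a Tesseract feature, and the script name `check_hocr.py` is hypothetical. They only catch obviously broken coordinates. The more reliable test is to draw the boxes over the original image and look at them.

```python
import re
import sys
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. "bbox 36 92 96 116; x_wconf 96".
BBOX_RE = re.compile(r"bbox (-?\d+) (-?\d+) (-?\d+) (-?\d+)")


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    match = BBOX_RE.search(title or "")
    return tuple(int(v) for v in match.groups()) if match else None


class HocrWordParser(HTMLParser):
    """Collect (text, word_bbox, page_bbox) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._page_bbox = None
        self._in_word = False
        self._word_bbox = None
        self._parts = []
        self._depth = 0  # tags nested inside the current word span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocr_page" in classes:
            self._page_bbox = parse_bbox(attrs.get("title"))
        if self._in_word:
            self._depth += 1  # e.g. <em>/<strong> inside a word, if present
        elif "ocrx_word" in classes:
            self._in_word = True
            self._word_bbox = parse_bbox(attrs.get("title"))
            self._parts = []
            self._depth = 0

    def handle_endtag(self, tag):
        if not self._in_word:
            return
        if self._depth > 0:
            self._depth -= 1  # closing a tag nested inside the word
        else:
            text = "".join(self._parts).strip()
            self.words.append((text, self._word_bbox, self._page_bbox))
            self._in_word = False

    def handle_data(self, data):
        if self._in_word:
            self._parts.append(data)


def suspicious_words(words):
    """Return (text, bbox) for words whose box is missing, empty, or off the page."""
    bad = []
    for text, bbox, page in words:
        if bbox is None:
            bad.append((text, bbox))
            continue
        x0, y0, x1, y1 = bbox
        ok = x0 < x1 and y0 < y1
        if page is not None:
            px0, py0, px1, py1 = page
            ok = ok and px0 <= x0 and py0 <= y0 and x1 <= px1 and y1 <= py1
        if not ok:
            bad.append((text, bbox))
    return bad


def main(path):
    parser = HocrWordParser()
    with open(path, encoding="utf-8") as f:
        parser.feed(f.read())
    parser.close()
    for text, bbox, _page in parser.words:
        print(bbox, text)
    for text, bbox in suspicious_words(parser.words):
        print("SUSPICIOUS:", bbox, text)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_hocr.py FILE.hocr", file=sys.stderr)
    else:
        main(sys.argv[1])
```

If the hOCR boxes look right but the text layer in the generated PDF is misplaced, that points at the PDF generation step rather than at recognition.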


On Wednesday, 25 March 2020 at 10:41:07 UTC+1, Essam Zaky wrote:
>
> Now you need to check the coordinates returned by Tesseract. Use the hOCR
> output and check whether the word coordinates are correct. If they are,
> the bug is in the PDF generation.
>
> If the coordinates are wrong, it's a bug in Tesseract.
>
> I have previously used a library called iTextSharp to generate searchable
> PDFs. It is a port of the iText Java library, and it gives good PDF output.
>
>
> On Wednesday, 25 March 2020 at 11:25:46 AM UTC+2, Teo wrote:
>>
>> OK, I think the problem is in the PDF generation module, because the text
>> output is almost the same, except for some instances of "the" that
>> Tesseract reads as "thè".
>>
>> On Wednesday, 25 March 2020 at 07:25:11 UTC+1, Essam Zaky wrote:
>>>
>>> You need to find out which part to improve: the Tesseract engine or the
>>> PDF generation.
>>>
>>> So compare the text files from ABBYY and Tesseract. If the results are
>>> very different, you need to improve the image quality or the LSTM model.
>>>
>>> If the Tesseract result is good, you need to improve the PDF
>>> generation module.
>>>
>>> On Wednesday, 25 March 2020 at 7:04:14 AM UTC+2, Teo wrote:
>>>>
>>>> The quality is already very good, but it is lower than ABBYY FineReader.
>>>> Attached is a comparison between ABBYY and gImageReader OCR, where you
>>>> can see the difference. How can we improve it?
>>>>
>>>>
>>>>
>>>>
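The first diagnostic step suggested in this thread is to compare the ABBYY and Tesseract text output. That comparison can be sketched as a word-level diff with Python's standard `difflib`. This is a minimal sketch that assumes both results were exported as plain UTF-8 text files. The script name `compare_ocr.py` is hypothetical, and treating ABBYY as the reference is only a convention here, not ground truth.

```python
import difflib
import sys


def word_diff(reference, candidate):
    """Compare two OCR outputs word by word.

    Returns the similarity ratio (0.0 to 1.0) and a list of
    (reference_words, candidate_words) pairs where the outputs disagree.
    """
    ref_words = reference.split()
    cand_words = candidate.split()
    # autojunk=False so frequent words like "the" are not ignored as junk.
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    differences = [
        (" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return matcher.ratio(), differences


def main(abbyy_path, tesseract_path):
    with open(abbyy_path, encoding="utf-8") as f:
        reference = f.read()
    with open(tesseract_path, encoding="utf-8") as f:
        candidate = f.read()
    ratio, differences = word_diff(reference, candidate)
    print(f"word-level similarity: {ratio:.3f}")
    for ref, cand in differences:
        print(f"ABBYY: {ref!r:30} Tesseract: {cand!r}")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python compare_ocr.py ABBYY.txt TESSERACT.txt", file=sys.stderr)
    else:
        main(sys.argv[1], sys.argv[2])
```

By the reasoning in this thread, a high similarity with only isolated substitutions, such as "the" read as "thè", suggests that recognition is fine and the differences come from the PDF generation step.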

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/b3293cc3-4766-4020-85b5-de6ad282aa6c%40googlegroups.com.
