[tesseract-ocr] Re: How to improve ocr reader?

Teo Thu, 26 Mar 2020 13:55:21 -0700

OK, the coordinates seem correct.
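
For anyone who would rather check the boxes in code than in an image editor, here is a minimal sketch using only the Python standard library. It assumes the usual hOCR layout, where each word is a `span` with class `ocrx_word` whose `title` attribute contains `bbox x0 y0 x1 y1`. The sample markup and its numbers are made up for illustration, and `parse_hocr` is a hypothetical helper name, not something Tesseract provides.

```python
import re
from html.parser import HTMLParser

# hOCR stores the box in the title attribute, e.g. "bbox 105 66 178 97; x_wconf 96"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # box of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans do not contain nested spans.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr(hocr_text):
    """Return a list of (word, bbox) pairs found in an hOCR document."""
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Illustrative fragment only; real files contain a full XHTML page.
SAMPLE = """
<span class='ocr_line' title='bbox 105 66 823 113'>
 <span class='ocrx_word' id='word_1_1' title='bbox 105 66 178 97; x_wconf 96'>The</span>
 <span class='ocrx_word' id='word_1_2' title='bbox 205 67 326 106; x_wconf 95'>(quick)</span>
</span>
"""

for word, (x0, y0, x1, y1) in parse_hocr(SAMPLE):
    print(f"{word!r}: left={x0} top={y0} right={x1} bottom={y1}")
```

With the command quoted below, the output should be written to `testing/eurotext-eng.hocr`; reading that file and passing its contents to `parse_hocr` gives the pixel boxes to compare against the image.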

On Thursday, 26 March 2020 at 19:13:52 UTC+1, Essam Zaky wrote:
>
> Read this document:
> https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage
>
> The following command returns the coordinates:
>
> tesseract testing/eurotext.png testing/eurotext-eng -l eng hocr
>
>
> The hOCR output contains each word as text together with its coordinates.
> You can open the image in any image editor, such as MS Paint, and check that
> the returned coordinates match the position of the word in the image.
>
> Best Regards
>
> On Thursday, 26 March 2020 at 1:10:22 PM UTC+2, Teo wrote:
>>
>> Thanks for your help. How can I get the coordinates, and how do I check
>> whether they are correct?
>>
>> On Wednesday, 25 March 2020 at 10:41:07 UTC+1, Essam Zaky wrote:
>>>
>>> You now need to check the coordinates returned by Tesseract. Use the hOCR
>>> output and check whether the word coordinates are returned correctly. If
>>> they are, the bug is in the PDF generation.
>>>
>>> If the coordinates are wrong, it is a bug in Tesseract.
>>>
>>> In the past I used a library called iTextSharp to generate searchable
>>> PDFs. It is a port of the iText Java library and gives good PDF output.
>>>
>>>
>>> On Wednesday, 25 March 2020 at 11:25:46 AM UTC+2, Teo wrote:
>>>>
>>>> OK, I think the problem is in the PDF generation module, because the text
>>>> is almost the same, except for some occurrences of "the" which Tesseract
>>>> reads as "thè".
>>>>
>>>> On Wednesday, 25 March 2020 at 07:25:11 UTC+1, Essam Zaky wrote:
>>>>>
>>>>> You need to find out which part to improve: the Tesseract engine or the
>>>>> PDF generation.
>>>>>
>>>>> Compare the text files produced by ABBYY and by Tesseract.
>>>>> If the results differ greatly, you need to improve the image quality or
>>>>> improve the LSTM model.
>>>>>
>>>>> If the Tesseract result is good, you need to improve the PDF
>>>>> generation module.
>>>>>
>>>>> On Wednesday, 25 March 2020 at 7:04:14 AM UTC+2, Teo wrote:
>>>>>>
>>>>>> The quality is already very good, but it is lower than that of ABBYY
>>>>>> FineReader. The attachment contains a comparison between ABBYY and
>>>>>> gImageReader OCR output, where you can see the difference. How can we
>>>>>> improve it?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
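
The comparison suggested in this thread (the ABBYY text output against the Tesseract text output) can be sketched with Python's standard `difflib` module. This is only an illustration under assumptions: both engines are assumed to have exported plain text, `compare_ocr` is a hypothetical helper name, and the two sample strings are invented to mimic the "the" versus "thè" confusion mentioned above.

```python
import difflib


def compare_ocr(reference, candidate):
    """Compare two OCR outputs word by word.

    Returns (ratio, diffs): ratio is difflib's similarity in [0, 1] and
    diffs lists (reference_words, candidate_words) for each differing run.
    """
    ref_words = reference.split()
    cand_words = candidate.split()
    matcher = difflib.SequenceMatcher(None, ref_words, cand_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2])))
    return matcher.ratio(), diffs


# Invented sample strings; in practice read the two exported .txt files.
abbyy_text = "the quality of the scan is good"
tesseract_text = "thè quality of the scan is good"

ratio, diffs = compare_ocr(abbyy_text, tesseract_text)
print(f"similarity: {ratio:.3f}")
for ref, cand in diffs:
    print(f"reference {ref!r} -> tesseract {cand!r}")
```

A ratio close to 1 with only a few isolated word differences points towards the PDF generation step rather than recognition; a low ratio points towards image quality or the recognition model.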


-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/6e127b74-c57f-4b79-94bd-e766d254f2cd%40googlegroups.com.


The main topics of theoretical computer science are taught in most computer
science and engineering curricula, but are not presented as a foundation for
computer studies. Most courses, and their reference textbooks, are highly
biased in their choice of topics. Very often they overemphasize traditional
areas, such as formal languages and automata, and pay little or no attention to
other important topics, such as formal semantics or computational complexity.
The organization of this book results from our strongly held belief that
theoretical computer science should be viewed as the cornerstone of computer
science and engineering curricula. Computer specialists, in their everyday life,
must be able to translate actual problems into abstractions based on the use of
formal models, to manipulate such formal descriptions, and to reason about their
properties in a rigorous way. This very special attitude differentiates the
computer specialist from most other technical professionals.
For these reasons, we suggest that an exposure to theoretical computer
science topics should be given in the early stage of computer science education,
particularly at the undergraduate level. Theoretical topics should not be
viewed as options that can be added late in the curricula. Rather, they must be
viewed as
