Re: [tesseract-ocr] tesseract unable to detect characters in simple two-word image

Shree Devi Kumar Sat, 04 Jan 2020 22:53:26 -0800

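Try --psm 6 (treat the image as a single uniform block of text). The
default page segmentation mode finds nothing in this image, with or
without an explicit --dpi: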

ubuntu@tesseract-ocr:~/TEST$ tesseract lao.jpg -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 197
Empty page!!
Estimating resolution as 197
Empty page!!
ubuntu@tesseract-ocr:~/TEST$ tesseract lao.jpg - --dpi 300
Empty page!!
Empty page!!
ubuntu@tesseract-ocr:~/TEST$ tesseract lao.jpg - --dpi 300 --psm 6
LAO 7° f CAUD 8°
ubuntu@tesseract-ocr:~/TEST$ tesseract lao.jpg - --psm 6
Warning: Invalid resolution 0 dpi. Using 70 instead.
LAO 7° f CAUD 8°
ubuntu@tesseract-ocr:~/TEST$
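
To compare several page segmentation modes without retyping the command, the session above could be scripted roughly as follows. This is only a sketch: the helper names build_tesseract_cmd and try_psm_modes are made up for illustration, and the choice of modes 3, 6 and 7 is an assumption. Only the --psm and --dpi options and the "-" output target are taken from the commands shown above.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, psm=None, dpi=None):
    """Build the argument list for a tesseract run that prints to stdout.

    Hypothetical helper, not part of tesseract itself.
    """
    # "-" as the output base sends recognized text to stdout,
    # exactly as in the shell session above.
    cmd = ["tesseract", image_path, "-"]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def try_psm_modes(image_path, modes=(3, 6, 7), dpi=None):
    """Run tesseract once per page segmentation mode and collect the text.

    The default modes are an assumption: 3 is the automatic default,
    6 assumes a single uniform block of text, 7 assumes a single line.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    results = {}
    for psm in modes:
        proc = subprocess.run(
            build_tesseract_cmd(image_path, psm=psm, dpi=dpi),
            capture_output=True,
            text=True,
        )
        # Warnings such as "Empty page!!" go to stderr, so an empty
        # string here means no text was recognized in that mode.
        results[psm] = proc.stdout.strip()
    return results
```

Calling try_psm_modes("lao.jpg", dpi=300) returns a dict keyed by mode number, so it is easy to see which modes produce any text at all for a given image.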


On Sun, Jan 5, 2020 at 11:59 AM Rory MacQueen <[email protected]>
wrote:

> I'm having trouble getting tesseract to recognize any characters in the
> following image:
>
>
> [image: tessinput]
> <https://user-images.githubusercontent.com/1205705/71773281-8e405700-2f28-11ea-86f6-2acfc6c09b46.jpg>
>
>
> When I run tesseract from the command line on this image, I get "Empty
> page!!" (that is, no results). Based on my reading of the Improving
> Quality <https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality>
> section of the wiki, I thought that the issue might be that the words in
> this image are not dictionary words. With that in mind, I have tried both
> disabling the tesseract dictionaries altogether (using the load_system_dawg
> and load_freq_dawg config flags) and augmenting the existing dictionary
> with these additional words (LAO and CAUD). Neither approach worked. I have
> tried tesseract versions 3 and 4, and have built version 5 from source on
> a Mac. All gave the same result.
>
>
> Curiously, if I type the exact words from that image into a word processor
> and take a screenshot, it works: the resulting image is readable by
> tesseract. It correctly parses each character. Here is that image:
>
> [image: Screen Shot 2020-01-04 at 7 01 11 PM]
> <https://user-images.githubusercontent.com/1205705/71773337-7d441580-2f29-11ea-96ab-5d4d58c77ce2.png>
>
> The only difference between the two images is that the first one is of a
> slightly lower resolution/quality. Am I then to believe that tesseract is
> unable to recognize characters in a slightly inferior quality image like
> that? Is there anything I can do to improve that image quality? Is there
> something else I'm missing?
>
>
> Thanks in advance.
>
>
> -Rory
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/980a7d52-9343-46a5-a417-f6b01cb711da%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/980a7d52-9343-46a5-a417-f6b01cb711da%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

