Re: [tesseract-ocr] The pictures captured by the camera did not identify well after preprocessing

vis li Thu, 16 Sep 2021 01:31:12 -0700

Thanks for your answer.
The text in the picture is only a test case. I use it because the actual text
is produced by an STM32 microcontroller, which outputs strings like
"E2PROM ADDR6", so the text itself may not look like any normal language.
'zth' is the model I trained with the Microsoft YaHei standard font. I have
also tried the eng model, the official traineddata file downloaded for the
matching version of Tesseract, but it was not as accurate as the one I
trained myself.
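
Following Lorenzo's suggestion to restrict the character set, and since the STM32 output seems limited to strings like "E2PROM ADDR6", one option is to pass a whitelist to Tesseract. The sketch below calls the tesseract command-line tool rather than the API behind Page.getText(). It assumes the binary is on PATH, that zth.traineddata is installed in its tessdata directory, and that the output only contains uppercase letters and digits; `DEFAULT_WHITELIST`, `build_tesseract_args` and `run_tesseract` are hypothetical names. `tessedit_char_blacklist` is the inverse variable if it is easier to list the characters to exclude. Early 4.0 releases reportedly ignored the whitelist with the LSTM engine, so check that your 4.1.1 build respects it on a few test images.

```python
import string
import subprocess

# Assumption: the STM32 output only uses these characters; adjust to the real set.
DEFAULT_WHITELIST = string.ascii_uppercase + string.digits


def build_tesseract_args(image_path, lang="zth", whitelist=DEFAULT_WHITELIST,
                         dpi=300, psm=6):
    """Build a tesseract CLI argument list that restricts the output characters."""
    return [
        "tesseract",
        image_path,
        "stdout",            # print the recognized text instead of writing a file
        "-l", lang,          # traineddata name, e.g. the custom "zth" model
        "--oem", "1",        # LSTM engine only
        "--psm", str(psm),   # 6 = assume a single uniform block of text
        "--dpi", str(dpi),   # state the resolution instead of letting Tesseract guess
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_tesseract(image_path, **kwargs):
    """Run tesseract and return its text output (needs the binary on PATH)."""
    args = build_tesseract_args(image_path, **kwargs)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `print(run_tesseract("testb.png"))` would print the recognized text; `psm=7` (single text line) may suit images that contain one line only.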



On Thursday, September 16, 2021 at 4:04:36 PM UTC+8, Lorenzo Blz wrote:

> Hi Vli,
> I think you should test this on something similar to your actual text, not
> on the alphabet or random strings. With real text you are not going to see
> () or <>, which may be mistaken for an O.
>
> The sequence of characters may influence the output; in other words, try it
> on real text. You can also blacklist the characters you do not need.
>
> To be honest, the result does not seem bad to me. Special characters are 
> the most difficult ones.
>
> Also, this font is not easy to read; look at the letter M, for example. If
> you can, change the font or try to capture the image at a higher resolution
> before cleaning it.
>
> What language is zth? This looks like Latin text; did you try eng?
>
>
> Lorenzo
>
> On Thu, 16 Sep 2021 at 07:59, vis li <[email protected]> wrote:
>
>> Tesseract version: 4.1.1
>> Platform: Windows 10
>>
>> [image: testa.png] <https://user-images.githubusercontent.com/51877381/133545017-12e2b715-be45-4198-8035-9838c5375ea9.png>
>>
>> [image: testb.png] <https://user-images.githubusercontent.com/51877381/133545026-66cdd822-6885-4561-aa8c-d13496573a62.png>
>> Page.getText():
>>
>> ACBEDFHGIKJLNHOP
>> RQSUTV¥WYaZbdcef
>>
>> 1ppp000012121010
>> &*(O+-,.:; O=%/
>>
>> As shown above, the result has some errors.
>> I know that my images have some defects, but how can I improve this
>> situation?
>> I have already binarized the picture and tried increasing the DPI to 300.
>> Because the pictures are captured by a camera, I am worried whether they
>> can meet the standard for web pictures.
>>
>> I have used LSTM mode, and the traineddata file I use for recognition was
>> trained with LSTM on the Microsoft YaHei standard font.
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/96ce0479-bc22-477d-9d5b-a6408509121fn%40googlegroups.com
>>
>
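
On the binarization step: a single global threshold chosen by Otsu's method is one common choice, sketched below in plain Python on a flat list of 8-bit grayscale values so it runs without extra libraries. This is not necessarily the method used in the thread; camera pictures with uneven lighting often do better with a local (adaptive) threshold, and following Lorenzo's advice, any upscaling should happen before this step. `otsu_threshold` and `binarize` are hypothetical helper names.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for 8-bit grayscale values (class 1: p <= t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg = 0      # sum of intensities in the dark class so far
    weight_bg = 0   # number of pixels in the dark class so far
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; Otsu picks the t that maximizes it.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels to black (0) or white (255), keeping dark text on a light background."""
    return [0 if p <= threshold else 255 for p in pixels]
```

With Pillow installed, the pixel list could come from `list(Image.open(path).convert("L").getdata())`, and the result could be written back with `putdata()` on a new "L" image of the same size.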
