Thanks for your answer. The text in the picture is only a test case. I use it because the actual text is produced by an STM32 microcontroller, which outputs strings like "E2PROM ADDR6", so the text itself may not be ordinary natural language. 'zth' is the traineddata file I trained myself with the Microsoft YaHei Standard font. I have also tried eng, the official traineddata file downloaded for the corresponding version of Tesseract, but it was not as accurate as the one I trained myself.
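To make the eng versus zth comparison repeatable on realistic strings such as "E2PROM ADDR6", the Tesseract command line can be assembled in a small script. This is only a sketch under assumptions: the image name `real_text.png`, the page segmentation mode 6 and the blacklist `()<>` are illustrative choices, not values from this thread, and it assumes `tessedit_char_blacklist` is honoured by the LSTM engine in the installed version.

```python
import subprocess


def build_tesseract_cmd(image_path, lang, blacklist="", dpi=300, psm=6):
    """Return the argv list for one Tesseract CLI run that prints text to stdout."""
    cmd = [
        "tesseract", image_path, "stdout",
        "-l", lang,            # traineddata to use, e.g. "eng" or the custom "zth"
        "--psm", str(psm),     # assumption: 6 = treat the image as one uniform text block
        "--dpi", str(dpi),     # tell Tesseract the resolution of the input image
    ]
    if blacklist:
        # Characters that never occur in the real text can be excluded.
        cmd += ["-c", "tessedit_char_blacklist=" + blacklist]
    return cmd


def run_ocr(image_path, lang, **kwargs):
    """Run Tesseract (must be on PATH) and return the recognized text."""
    cmd = build_tesseract_cmd(image_path, lang, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name; only the commands are printed here, nothing is executed.
    for lang in ("eng", "zth"):
        print(" ".join(build_tesseract_cmd("real_text.png", lang, blacklist="()<>")))
```

Calling `run_ocr("real_text.png", "eng", blacklist="()<>")` and the same with `"zth"` on an image of the real microcontroller output would show which model copes better with the actual text rather than with the alphabet test image.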
On Thursday, September 16, 2021 at 4:04:36 PM UTC+8, Lorenzo Blz wrote:

> Hi Vli,
>
> I think you should test this on something similar to your actual text, not
> on the alphabet or random strings. With real text you are not going to see
> () or <> that may be mistaken for an O.
>
> The sequence of characters may influence the output; in other words, try it
> on real text. You can also blacklist the characters you do not need.
>
> To be honest, the result does not seem bad to me. Special characters are
> the most difficult ones.
>
> Also, this font is not easy to read; look at the letter M, for example. If
> you can, change the font or try to capture the image at a higher resolution
> before cleaning it.
>
> What language is zth? This looks like Latin text; did you try eng?
>
> Lorenzo
>
> On Thu, 16 Sep 2021 at 07:59, vis li <[email protected]> wrote:
>
>> Tesseract Version: 4.1.1
>> Platform: Windows 10
>>
>> <https://user-images.githubusercontent.com/51877381/133545017-12e2b715-be45-4198-8035-9838c5375ea9.png> [image: testa.png]
>>
>> <https://user-images.githubusercontent.com/51877381/133545026-66cdd822-6885-4561-aa8c-d13496573a62.png> [image: testb.png]
>>
>> Page.getText():
>>
>> ACBEDFHGIKJLNHOP
>> RQSUTV¥WYaZbdcef
>>
>> 1ppp000012121010
>> &*(O+-,.:; O=%/
>>
>> As shown, the result has some errors.
>> I know that my image has some defects, but how can I improve this
>> situation?
>> I have binarized the picture and tried to raise the DPI to 300.
>> Because the pictures are captured by a camera, I am worried about whether
>> they can meet the standard of web pictures.
>>
>> I have used LSTM mode, and the traineddata file I use for recognition was
>> trained with LSTM and the Microsoft YaHei Standard font.

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/765451c2-7440-4a5c-acf5-41ce4e42daa8n%40googlegroups.com.

