Seems like you should put this question to the author of the language data "ARYuanB5-MD"...
Zdenko

On Sun, 15 Oct 2023 at 15:44, 'Danny Wilson' via tesseract-ocr <[email protected]> wrote:

> Running tesseract on a single Chinese character "對" outputs the character,
> but also the text "xlz".
>
> Command line:
> tesseract sub0089w.png debugOut -l ARYuanB5-MD --dpi 72 --psm 6 -c preserve_interword_spaces=1
>
> The output is two lines:
> xlz
> 對
>
> It used to output "sMz", but after retraining several times with the
> specific font in use, it now outputs "xlz".
>
> Why?
>
> I've attached the image file in question...
>
> [image: sub0089w.png]
>
> (Searching the source code, the file universalambigs.h has a line " xlZ le 1",
> which is similar to, but not exactly the same as, the errant text I'm finding.)
>
> Thank you.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/76ed2f78-e10f-4b9f-8d61-30f4b0f333dbn%40googlegroups.com.

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8y1_y%3Diw8uCEw5Z3km%3DApZ5%2BFFudjqMKV_HO9QJ41FNyw%40mail.gmail.com.
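
One way to narrow this down before contacting the data author might be to compare the page segmentation mode used in the report (`--psm 6`, a uniform block of text) with `--psm 10`, which treats the image as a single character. The sketch below is only an illustration: the helper names `build_cmd`, `recognize`, and `compare_modes` are made up for this example, it assumes `tesseract` is on the PATH and that the `ARYuanB5-MD` traineddata is installed, and it does not claim that changing the mode removes the extra "xlz" line.

```python
import subprocess


def build_cmd(image, lang, psm, dpi=72):
    """Build the tesseract argument list (hypothetical helper).

    Using "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file.
    """
    return ["tesseract", image, "stdout", "-l", lang,
            "--dpi", str(dpi), "--psm", str(psm)]


def recognize(image, lang, psm):
    """Run tesseract once and return the non-empty output lines."""
    result = subprocess.run(build_cmd(image, lang, psm),
                            capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]


def compare_modes(image, lang, modes=(6, 10)):
    """Return {psm: output lines} so the two modes can be compared."""
    return {psm: recognize(image, lang, psm) for psm in modes}
```

Calling `compare_modes("sub0089w.png", "ARYuanB5-MD")` would return the lines produced under each mode. If the spurious line shows up only under `--psm 6`, layout analysis is a likely suspect; if it shows up under both, the trained model itself is the more likely place to look.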

