Seems like you should put this question to the author of the language data
"ARYuanB5-MD"...
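
If it helps to narrow things down before asking, Tesseract's TSV output shows which text line each recognized word was assigned to, and with what confidence. Below is a minimal sketch, assuming the original command is rerun with the standard `tsv` config name appended so that it writes debugOut.tsv. The helper name `words_from_tsv` is made up for illustration and is not part of Tesseract.

```python
import csv
import io
import sys


def words_from_tsv(tsv_text):
    """Return (line_num, conf, text) for each word row in Tesseract TSV output."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are words; page/block/paragraph/line rows have conf -1
        # and an empty text field, so they are skipped here.
        if row["level"] == "5" and text:
            words.append((int(row["line_num"]), float(row["conf"]), text))
    return words


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python show_words.py debugOut.tsv
    # where debugOut.tsv was produced by something like:
    #   tesseract sub0089w.png debugOut -l ARYuanB5-MD --dpi 72 --psm 6 \
    #       -c preserve_interword_spaces=1 tsv
    with open(sys.argv[1], encoding="utf-8") as f:
        for line_num, conf, text in words_from_tsv(f.read()):
            print(line_num, conf, text)
```

If the stray text turns up as its own line with a low confidence value, that would be worth including in the report to the author of the language data.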

Zdenko


On Sun, 15 Oct 2023 at 15:44, 'Danny Wilson' via tesseract-ocr <
[email protected]> wrote:

> Running tesseract on a single Chinese character "對" outputs the character,
> but also the text "xlz".
>
> Command line:
> tesseract sub0089w.png debugOut -l ARYuanB5-MD --dpi 72 --psm 6 -c
> preserve_interword_spaces=1
>
> The output is two lines:
> xlz
> 對
>
> It used to output "sMz", but after retraining several times with the
> specific font in use, it now outputs "xlz".
>
> Why?
>
> I've attached the image file in question...
>
> [image: sub0089w.png]
>
> (Searching the source code, the file universalambigs.h has a line " xlZ le
> 1" which is similar, but not identical, to the errant text I'm seeing.)
>
> Thank you.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/76ed2f78-e10f-4b9f-8d61-30f4b0f333dbn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/76ed2f78-e10f-4b9f-8d61-30f4b0f333dbn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

