Hi,

On 25/03/2021 19:04, Charles Cho wrote:
> Hi.
>
> Thank you very much for your kind help, shree.
> I tried to detect script by your help and it worked. Great.
>
> I have some questions.
> 1. If the image contains texts of different languages in a page, is there
> any way to detect all of the languages? Now it detects only one language.
> 2. It detects English, German, French as 'Latin'. So how can I distinguish
> the languages exactly?

The OSD module does not detect language - it detects script, as you also
noted earlier:

>>> So in my analysis, it used OSD of tesseract engine to detect layout and
>>> script.
>>> After detect script, it detects languages on the script.

What's missing is performing OCR using just the script model - and then
analysing the recognised text to detect the language. You could use
something like this: https://github.com/saffsd/langid.c

Regards,
Merlijn
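
P.S. A rough sketch of the two steps described above (OCR with the script model, then language identification on the recognised text), in Python. Everything here is an assumption for illustration: the function names are hypothetical, the stopword lists are tiny and hand-picked, and `ocr_latin_script` assumes pytesseract, Pillow and the `script/Latin` traineddata are installed. Classifying each paragraph separately is one possible way to report more than one language per page.

```python
import re
from collections import Counter

# Tiny hand-picked stopword lists; illustrative only, not a real language model.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "with", "for"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit", "zu"},
    "fr": {"le", "la", "les", "et", "est", "des", "une", "un", "dans", "pour"},
}


def guess_language(text):
    """Return the language code whose stopwords occur most often, or None."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for lang, stops in STOPWORDS.items():
        counts[lang] = sum(1 for w in words if w in stops)
    lang, hits = counts.most_common(1)[0]
    return lang if hits > 0 else None


def languages_per_paragraph(ocr_text):
    """Classify each blank-line-separated paragraph on its own."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", ocr_text) if p.strip()]
    return [(guess_language(p), p) for p in paragraphs]


def ocr_latin_script(image_path):
    """OCR with the script-level model instead of a language model.

    Assumes pytesseract, Pillow and script/Latin.traineddata are available.
    """
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), lang="script/Latin")
```

Typical use would be `languages_per_paragraph(ocr_latin_script("page.png"))`, where `page.png` is a placeholder file name. For real documents the stopword counter is far too crude; a trained identifier such as langid.c (linked above) or its Python counterpart langid.py would take the place of `guess_language`.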