Hi,

> The OSD module does not detect language; it detects script, as you also
> noted earlier:

It detects the language by using OSD in Tesseract, and Tesseract also provides a DetectOrientationScript function:
    int orient_deg;
    float orient_conf;
    const char* script_name;
    float script_conf;
    api.Init("/Users/renard/devel/textfairy/tessdata", "osd", tesseract::OcrEngineMode::OEM_DEFAULT);
    api.SetPageSegMode(tesseract::PageSegMode::PSM_OSD_ONLY);
    api.SetImage(pix);
    api.DetectOrientationScript(&orient_deg, &orient_conf, &script_name, &script_conf);

After this call, script_name holds the name of the detected script and script_conf its confidence value. Across the languages I tested, script_name gets the following values:

English -> 'Latin'
French -> 'Latin'
German -> 'Latin'
Chinese_Sim -> 'Han'
Chinese_Tra -> 'Han'
Korean -> 'Korean'
Japanese -> 'Japanese'
Russian -> 'Cyrillic'

So the problem is that I want to distinguish the Latin-script languages precisely, and I also want to detect several languages at once in a single image.

Thanks.

Best,
Charles

On Friday, March 26, 2021 at 2:33:26 AM UTC+8 Merlijn Wajer wrote:
> Hi,
>
> On 25/03/2021 19:04, Charles Cho wrote:
> > Hi.
> >
> > Thank you very much for your kind help, shree.
> > I tried to detect script with your help and it worked. Great.
> >
> > I have some questions.
> > 1. If the image contains text in different languages on one page, is there
> > any way to detect all of the languages? Right now it detects only one language.
> > 2. It detects English, German, and French as 'Latin'. So how can I distinguish
> > the languages exactly?
>
> The OSD module does not detect language; it detects script, as you also
> noted earlier:
>
> >>> So in my analysis, it used OSD of the tesseract engine to detect layout and
> >>> script.
> >>> After detecting the script, it detects the languages for that script.
>
> What's missing is performing OCR using just the script, and then
> analysing the resulting text to detect the language.
>
> You could use something like this: https://github.com/saffsd/langid.c
>
> Regards,
> Merlijn

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7deebf13-4422-458d-a81f-a081e740d549n%40googlegroups.com.