Hi, 

>>>The OSD module does not detect language - it detects script, as you also
>>>noted earlier:
It detects the language by using OSD in Tesseract, and Tesseract also
provides the DetectOrientationScript function:

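// pix is a Leptonica Pix* holding the page image, loaded elsewhere.
tesseract::TessBaseAPI api;
int orient_deg = 0;
float orient_conf = 0, script_conf = 0;
const char* script_name = nullptr;
api.Init("/Users/renard/devel/textfairy/tessdata", "osd",
         tesseract::OcrEngineMode::OEM_DEFAULT);
api.SetPageSegMode(tesseract::PageSegMode::PSM_OSD_ONLY);
api.SetImage(pix);
api.DetectOrientationScript(&orient_deg, &orient_conf, &script_name,
                            &script_conf);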

After this call, script_name holds the detected name and script_conf holds
its confidence value.
When I tested several languages, script_name took the following values:
English -> 'Latin'
French -> 'Latin'
German -> 'Latin'
Chinese_Sim -> 'Han'
Chinese_Tra -> 'Han'
Korean -> 'Korean'
Japanese -> 'Japanese'
Russian -> 'Cyrillic'
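
Since the values above are script names rather than languages, one possible
next step is to choose a script-level model for the real OCR pass. The
following is only a minimal sketch: ModelForScript is a hypothetical helper
(not part of the Tesseract API), and it assumes the tessdata directory
contains the script/ models from the tessdata repositories. OSD reports
'Han' for both Chinese variants, so loading both Han models here is an
arbitrary choice.

```cpp
#include <map>
#include <string>

// Hypothetical helper: maps an OSD script name to a language string that
// could be passed to TessBaseAPI::Init(). Assumes the script/*.traineddata
// files are installed; returns an empty string for scripts it does not know.
std::string ModelForScript(const std::string& script_name) {
  static const std::map<std::string, std::string> kModels = {
      {"Latin", "script/Latin"},
      {"Cyrillic", "script/Cyrillic"},
      // OSD does not separate Simplified from Traditional, so load both.
      {"Han", "script/HanS+script/HanT"},
      {"Korean", "script/Hangul"},
      {"Japanese", "script/Japanese"},
  };
  const auto it = kModels.find(script_name);
  return it != kModels.end() ? it->second : std::string();
}
```

The returned string would replace "osd" in a second api.Init() call, after
which the page can be recognized with the normal OCR page segmentation mode.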

So the problem is that I want to distinguish between the individual
Latin-script languages, and I want to detect several languages at once in a
single image.
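
To illustrate the approach suggested in the quoted reply below (run OCR with
the script model, then analyse the recognized text), here is a toy sketch
that guesses the language of Latin-script text by counting common words.
GuessLatinLanguage and its word lists are illustrative assumptions only, not
a trained model; a real identifier such as langid.c would take its place.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Toy language guesser: counts hits against small stopword lists.
// Returns "eng", "fra", "deu", or "unknown" when no stopword matches.
std::string GuessLatinLanguage(const std::string& text) {
  static const std::map<std::string, std::set<std::string>> kStopwords = {
      {"eng", {"the", "and", "of", "is", "to", "with"}},
      {"fra", {"le", "la", "les", "et", "est", "des", "une"}},
      {"deu", {"der", "die", "das", "und", "ist", "nicht", "ein"}},
  };
  std::map<std::string, int> hits;
  std::istringstream words(text);
  std::string word;
  while (words >> word) {
    // Keep ASCII letters only, lowercased; punctuation and the bytes of
    // accented characters are dropped.
    std::string clean;
    for (unsigned char c : word) {
      if (std::isalpha(c)) clean += static_cast<char>(std::tolower(c));
    }
    for (const auto& entry : kStopwords) {
      if (entry.second.count(clean) > 0) ++hits[entry.first];
    }
  }
  // Pick the language with the most stopword hits.
  std::string best = "unknown";
  int best_hits = 0;
  for (const auto& entry : hits) {
    if (entry.second > best_hits) {
      best = entry.first;
      best_hits = entry.second;
    }
  }
  return best;
}
```

Calling this once per recognized text line or block, rather than once per
page, would give a separate guess for each region, which is one way to
handle a page that mixes several languages.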

Thanks.
Best,
Charles.
On Friday, March 26, 2021 at 2:33:26 AM UTC+8 Merlijn Wajer wrote:

> Hi, 
>
> On 25/03/2021 19:04, Charles Cho wrote: 
> > Hi. 
> > 
> > Thank you very much for your kind help, shree. 
> > I tried to detect script by your help and it worked. Great. 
> > 
> > I have some questions. 
> > 1. If the image contains texts of different languages in a page, is there
> > any way to detect all of the languages? Now it detects only one language.
> > 2. It detects English, German, French as 'Latin'. So how can I distinguish
> > the languages exactly?
>
> The OSD module does not detect language - it detects script, as you also
> noted earlier:
>
> >>> So in my analysis, it used OSD of tesseract engine to detect layout and
> >>> script.
> >>> After detect script, it detects languages on the script.
>
> What's missing is performing OCR using just the script - and then 
> analysing the corpus to detect the language. 
>
> You could use something like this: https://github.com/saffsd/langid.c 
>
> Regards, 
> Merlijn 
>
