Have you solved the problem?
On Friday, March 26, 2021 at 09:55:53 UTC+8, <[email protected]> wrote:
> Hi,
>
> >>>The OSD module does not detect language - it detects the script, as you
> >>>also noted earlier:
> It detects the language by using OSD in Tesseract, and Tesseract also
> provides the DetectOrientationScript function.
>
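> // Assumes `tesseract::TessBaseAPI api;` and a loaded `Pix* pix`.
> int orient_deg = 0;
> float orient_conf = 0, script_conf = 0;
> const char* script_name = nullptr;
> // Load only the OSD model; Init returns 0 on success.
> api.Init("/Users/renard/devel/textfairy/tessdata", "osd",
>          tesseract::OcrEngineMode::OEM_DEFAULT);
> api.SetPageSegMode(tesseract::PageSegMode::PSM_OSD_ONLY);
> api.SetImage(pix);
> // Fills in the orientation (degrees) and script name with confidences.
> api.DetectOrientationScript(&orient_deg, &orient_conf, &script_name,
>                             &script_conf);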
>
> After this, script_name holds the script name and script_conf holds the
> confidence value.
> I tested several languages, and script_name took the following values:
> English -> 'Latin'
> French -> 'Latin'
> German -> 'Latin'
> Chinese_Sim -> 'Han'
> Chinese_Tra -> 'Han'
> Korean -> 'Korean'
> Japanese -> 'Japanese'
> Russian -> 'Cyrillic'
>
> So the problem is that I want to tell the Latin-script languages apart, and
> I want to detect several languages at once in a single image.
>
> Thanks.
> Best,
> Charles.
> On Friday, March 26, 2021 at 2:33:26 AM UTC+8 Merlijn Wajer wrote:
>
>> Hi,
>>
>> On 25/03/2021 19:04, Charles Cho wrote:
>> > Hi.
>> >
>> > Thank you very much for your kind help, shree.
>> > With your help I tried detecting the script, and it worked. Great.
>> >
>> > I have some questions.
>> > 1. If the image contains text in different languages on one page, is there
>> > any way to detect all of the languages? Now it detects only one language.
>> > 2. It detects English, German, and French as 'Latin'. So how can I
>> > distinguish the languages exactly?
>>
>> The OSD module does not detect language - it detects the script, as you
>> also noted earlier:
>>
>> >>> So in my analysis, it used the OSD of the Tesseract engine to detect
>> >>> layout and script.
>> >>> After detecting the script, it detects languages within that script.
>>
>> What's missing is performing OCR using just the script - and then
>> analysing the corpus to detect the language.
>>
>> You could use something like this: https://github.com/saffsd/langid.c
>>
>> Regards,
>> Merlijn
>>
>
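For anyone reading this thread later: the second step Merlijn describes (run OCR with a script-level model, then analyse the recognized text to find the language) could look roughly like the sketch below. This is not langid.c and not part of Tesseract; it is a toy stopword-counting heuristic swapped in only to show the shape of the idea. The names GuessLatinLanguage and StopWords, and the tiny word lists, are made up for illustration. The input string is assumed to be UTF-8 text, for example what TessBaseAPI::GetUTF8Text() returns after initializing with a Latin-script model instead of "osd".

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical, hand-picked stopword lists. Illustrative only: a trained
// identifier such as langid.c would replace this table in practice.
static const std::map<std::string, std::set<std::string>>& StopWords() {
  static const std::map<std::string, std::set<std::string>> kWords = {
      {"eng", {"the", "and", "is", "of", "to", "in", "that", "it"}},
      {"fra", {"le", "la", "les", "et", "est", "de", "un", "une", "que"}},
      {"deu", {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"}},
  };
  return kWords;
}

// Hypothetical helper: guess the language of Latin-script OCR output by
// counting stopword hits per language. Returns "" if nothing matches.
std::string GuessLatinLanguage(const std::string& text) {
  // Lowercase ASCII letters, keep non-ASCII (UTF-8) bytes unchanged, and
  // turn every other ASCII character into a space so words split cleanly.
  std::string norm;
  norm.reserve(text.size());
  for (unsigned char c : text) {
    if (c >= 0x80) {
      norm += static_cast<char>(c);
    } else if (std::isalpha(c)) {
      norm += static_cast<char>(std::tolower(c));
    } else {
      norm += ' ';
    }
  }

  // Count how many words of the text appear in each language's list.
  std::map<std::string, int> hits;
  std::istringstream words(norm);
  std::string word;
  while (words >> word) {
    for (const auto& entry : StopWords()) {
      if (entry.second.count(word)) ++hits[entry.first];
    }
  }

  // Pick the language with the most hits.
  std::string best;
  int best_hits = 0;
  for (const auto& entry : hits) {
    if (entry.second > best_hits) {
      best = entry.first;
      best_hits = entry.second;
    }
  }
  return best;
}
```

You would call GuessLatinLanguage on the text of the whole page, or on each recognized block separately if the page mixes languages. With only a handful of words per block a heuristic like this is unreliable, which is one reason to use a proper language identifier instead.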