See https://github.com/tesseract-ocr/tessdoc/blob/master/examples/OSD_example.cc
// Get OSD (orientation and script detection) - new code
int orient_deg;
float orient_conf;
const char* script_name;
float script_conf;
// DetectOrientationScript() returns false when detection fails,
// so check it before using the output values.
if (api->DetectOrientationScript(&orient_deg, &orient_conf, &script_name,
                                 &script_conf)) {
  printf("************\n Orientation in degrees: %d\n"
         " Orientation confidence: %.2f\n"
         " Script: %s\n Script confidence: %.2f\n",
         orient_deg, orient_conf, script_name, script_conf);
}
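A minimal sketch of a possible next step, assuming you want to turn script_name into the language string for a second OCR pass. The helper LanguagesForScript is a hypothetical name, and both the mapping table and the min_conf cutoff are illustrative assumptions, not Tesseract behaviour. Check which script names your osd.traineddata actually reports by printing script_name first.

```cpp
#include <map>
#include <string>

// Hypothetical helper: pick candidate Tesseract languages for the script
// reported by DetectOrientationScript(). The table is illustrative only;
// extend it with the traineddata files your app actually ships.
std::string LanguagesForScript(const std::string& script_name,
                               float script_conf, float min_conf,
                               const std::string& fallback = "eng") {
  // Low-confidence detections fall back to a default language set.
  if (script_conf < min_conf) return fallback;

  // Assumed script-name keys; OSD may report "Korean"/"Japanese" rather than
  // the raw Unicode script names, so both spellings are listed for Korean.
  static const std::map<std::string, std::string> kScriptToLangs = {
      {"Latin", "eng+fra+deu"},
      {"Cyrillic", "rus+ukr"},
      {"Han", "chi_sim+chi_tra"},
      {"Hangul", "kor"},
      {"Korean", "kor"},
      {"Japanese", "jpn"},
      {"Arabic", "ara"},
  };
  auto it = kScriptToLangs.find(script_name);
  return it != kScriptToLangs.end() ? it->second : fallback;
}
```

The returned string could then be used to re-initialise the engine with api->Init(datapath, langs.c_str()), after which the image has to be set again before recognition.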
On Thursday, March 25, 2021 at 2:11:42 PM UTC+5:30 [email protected] wrote:
> Hi,
>
> I have been investigating how to detect the language automatically.
> I referred to these links. Thank you, Merlijn.
> https://archive.org/services/docs/api/ocr.html#autonomous-mode
> https://git.archive.org/www/tesseract/-/blob/master/main.py#L757
>
> From my analysis, it uses Tesseract's OSD (orientation and script
> detection) to detect the page orientation and script.
> After detecting the script, it detects the languages written in that script.
>
> So I tried to use the OSD engine mode in Text Fairy, an Android OCR app
> based on Tesseract 4.1.1.
> But it doesn't work, and I am not sure how to use the OSD engine mode on
> Android.
> I set 'osd' as the language string, used osd.traineddata, and set
> 'OEM_OSD_ONLY' as the engine mode.
> But it still doesn't work.
>
> I hope someone can help me use the OSD engine mode on Android.
>
> Thank you.
> Best,
> Charles.
>
> On Monday, March 22, 2021 at 10:28:38 AM UTC+8 Charles Cho wrote:
>
>> Hi, Merlijn.
>>
>> Thanks for your kind response.
>>
>> Regarding autonomous mode, I'm trying to find a similar module for
>> Android, but I haven't found one yet. I will keep looking.
>>
>> >I am not sure what you're finding on google play store, but I have found
>> >there to be no limitation to the number of languages that can be used
>> >during OCR. Keep in mind that using more languages will slow down the
>> >OCR process.
>> It's Text Fairy, an open-source app.
>> https://play.google.com/store/apps/details?id=com.renard.ocr
>>
>> Your response is really helpful.
>>
>> Best,
>> Charles.
>> On Sunday, March 21, 2021 at 8:29:13 AM UTC+8 Merlijn Wajer wrote:
>>
>>> Hi,
>>>
>>> On 19/03/2021 10:11, Charles Cho wrote:
>>> > Hello,
>>> > I'm working on an OCR Android app based on Tesseract.
>>> > I want to add a feature that detects the language automatically and
>>> > recognizes at least 2 languages at once.
>>> > I have been investigating this for a while, so I know that I have to
>>> > specify the language for Tesseract.
>>> > How, then, can I implement automatic language detection?
>>>
>>> Not exactly a mobile use case, but you can read how the Internet Archive
>>> does this (I coined it "autonomous mode", where the software just
>>> figures out the scripts and languages):
>>>
>>> https://archive.org/services/docs/api/ocr.html#autonomous-mode
>>>
>>> And the code is available here (I plan to split out the archive.org-specific
>>> code from the Python code that invokes Tesseract and performs
>>> heuristics like script detection):
>>>
>>> https://git.archive.org/www/tesseract/-/blob/master/main.py#L757
>>>
>>> the tl;dr is to first perform script detection, and use the detected
>>> script to OCR the page - then use language detection libraries to guess
>>> the languages on the page.
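The last step of that tl;dr relies on a language-detection library, which the sketch below does not reproduce. As a deliberately naive stand-in, the hypothetical helper GuessLanguage scores OCR output against a few stopwords per language and returns the best-matching Tesseract language code. The stopword lists are assumptions for illustration only; a real detection library will be far more accurate.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Toy stand-in for a language-detection library (hypothetical helper).
// Counts OCR words that appear in a tiny per-language stopword list and
// returns the Tesseract code with the most hits, or "" if nothing matched.
std::string GuessLanguage(const std::string& ocr_text) {
  // Illustrative stopword lists only; a real detector uses richer models.
  static const std::map<std::string, std::set<std::string>> kStopwords = {
      {"eng", {"the", "and", "of", "is", "to"}},
      {"fra", {"le", "la", "et", "les", "est"}},
      {"deu", {"der", "die", "und", "das", "ist"}},
  };
  std::map<std::string, int> scores;
  std::istringstream words(ocr_text);
  std::string word;
  while (words >> word) {
    // Keep letters only (ASCII in the default C locale), lower-cased,
    // so "The," matches "the".
    std::string clean;
    for (unsigned char c : word) {
      if (std::isalpha(c)) clean += static_cast<char>(std::tolower(c));
    }
    for (const auto& entry : kStopwords) {
      if (entry.second.count(clean) > 0) ++scores[entry.first];
    }
  }
  // Pick the highest score; ties go to the alphabetically first code.
  std::string best;
  int best_score = 0;
  for (const auto& entry : scores) {
    if (entry.second > best_score) {
      best = entry.first;
      best_score = entry.second;
    }
  }
  return best;
}
```

In the flow described above, the text passed in would come from the script-level OCR pass, and the detected code would then select the languages for a final OCR run.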
>>>
>>> > And a Tesseract-based app on the Google Play Store can recognize
>>> > 3 languages at once. Is that the maximum?
>>>
>>> I am not sure what you're finding on google play store, but I have found
>>> there to be no limitation to the number of languages that can be used
>>> during OCR. Keep in mind that using more languages will slow down the
>>> OCR process.
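On the multi-language point: Tesseract accepts several languages as one '+'-separated string (for example "eng+kor+jpn" passed to TessBaseAPI::Init() or to the -l command-line option). A minimal sketch of building that string from a list. JoinLanguages is a hypothetical helper name, and every extra language adds recognition time, as noted above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build the "+"-separated language string that
// TessBaseAPI::Init() and the tesseract CLI's -l option accept,
// e.g. {"eng", "kor", "jpn"} -> "eng+kor+jpn".
std::string JoinLanguages(const std::vector<std::string>& langs) {
  std::string joined;
  for (const std::string& lang : langs) {
    if (lang.empty()) continue;  // skip blank entries
    if (!joined.empty()) joined += '+';
    joined += lang;
  }
  return joined;
}
```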
>>>
>>> > Any help and advice would be really appreciated.
>>>
>>> Hope this helps.
>>>
>>> Cheers,
>>> Merlijn
>>>
>>
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/20bdef8f-a543-420d-aba8-a9260fe3a28bn%40googlegroups.com.