Hi,

I have been investigating how to detect the language automatically, using the links Merlijn shared. Thank you, Merlijn.

https://archive.org/services/docs/api/ocr.html#autonomous-mode
https://git.archive.org/www/tesseract/-/blob/master/main.py#L757
From my analysis, that code uses Tesseract's OSD (orientation and script detection) to detect the layout and script. Once the script is known, it detects the languages written in that script.

So I tried to use the OSD engine mode in textfairy, an Android OCR app based on Tesseract 4.1.1. It doesn't work, and I can't figure out how to use OSD on Android. I set 'osd' as the language option string, used osd.traineddata, and set 'OEM_OSD_ONLY' as the engine mode, but it still doesn't work.

I hope someone can help me use the OSD engine mode on Android. Thank you.

Best,
Charles.

On Monday, March 22, 2021 at 10:28:38 AM UTC+8 Charles Cho wrote:

> Hi, Merlijn.
>
> Thanks for your kind response.
>
> Regarding autonomous mode, I'm trying to find such module for Android.
> But I found nothing. I will try more.
>
> >I am not sure what you're finding on google play store, but I have found
> >there to be no limitation to the amount of languages that can be used
> >during OCR. Keep in mind that using more languages will slow down the
> >OCR process.
> It's textfairy, open source app.
> https://play.google.com/store/apps/details?id=com.renard.ocr
>
> Your response is really helpful.
>
> Best,
> Charles.
> On Sunday, March 21, 2021 at 8:29:13 AM UTC+8 Merlijn Wajer wrote:
>
>> Hi,
>>
>> On 19/03/2021 10:11, Charles Cho wrote:
>> > Hello,
>> > I'm working on a ocr android app based on tesseract.
>> > I want to add feature that detects language automatically and recognize
>> > at least 2 languages at once.
>> > I have investigated on that for a while so I know that I have to specify
>> > language for tesseract.
>> > Then how can I implement auto detection of language?
>>
>> Not exactly a mobile use case, but you can read how the Internet Archive
>> does this (I coined it "autonomous mode", where the software just
>> figures out the scripts and languages):
>>
>> https://archive.org/services/docs/api/ocr.html#autonomous-mode
>>
>> And the code is available, here (I plan to split out the archive.org
>> specific code from the python code that invokes Tesseract and performs
>> heuristics like script detection):
>>
>> https://git.archive.org/www/tesseract/-/blob/master/main.py#L757
>>
>> the tl;dr is to first perform script detection, and use the detected
>> script to OCR the page - then use language detection libraries to guess
>> the languages on the page.
>>
>> > And tesseract on google play store can recognize 3 languages at once.
>> > Is it maximum?
>>
>> I am not sure what you're finding on google play store, but I have found
>> there to be no limitation to the amount of languages that can be used
>> during OCR. Keep in mind that using more languages will slow down the
>> OCR process.
>>
>> > Any help and advice would be really appreciated.
>>
>> Hope this helps.
>>
>> Cheers,
>> Merlijn

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f05cb3fa-b7da-491f-930b-127e5784abc5n%40googlegroups.com.
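
For reference, the first two steps of the tl;dr above (detect the script, then choose what to OCR with) might be sketched roughly as follows. This is not the Internet Archive code, only a minimal illustration in Python. It assumes the OSD report has the `Key: value` layout that the Tesseract command line prints in page segmentation mode 0 (`--psm 0`); note that in Tesseract itself OSD is selected as a page segmentation mode, not as an OCR engine mode. The names `parse_osd` and `choose_ocr_language`, the confidence threshold, and the `eng` fallback are hypothetical, and the `script/<Name>` model name only works when a script model with exactly that name is installed (for example `script/Latin`); other scripts may need a mapping. The third step, running a language detection library over the recognized text, is not shown.

```python
from typing import NamedTuple, Optional


class OsdResult(NamedTuple):
    script: str               # e.g. "Latin"
    script_confidence: float  # higher means more certain
    rotate: int               # degrees the page should be rotated


def parse_osd(osd_text: str) -> Optional[OsdResult]:
    """Parse a "Key: value" OSD report; return None if no script is reported."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            fields[key.strip().lower()] = value.strip()
    if "script" not in fields:
        return None
    return OsdResult(
        script=fields["script"],
        script_confidence=float(fields.get("script confidence", "0")),
        rotate=int(fields.get("rotate", "0")),
    )


def choose_ocr_language(osd: Optional[OsdResult],
                        min_confidence: float = 1.0,  # hypothetical threshold
                        fallback: str = "eng") -> str:
    """Pick the language string for the OCR pass from the detected script."""
    if osd is None or osd.script_confidence < min_confidence:
        return fallback
    # Assumption: a script model named after the detected script is installed.
    return f"script/{osd.script}"


if __name__ == "__main__":
    sample = (
        "Page number: 0\n"
        "Orientation in degrees: 0\n"
        "Rotate: 0\n"
        "Orientation confidence: 15.33\n"
        "Script: Latin\n"
        "Script confidence: 2.50\n"
    )
    print(choose_ocr_language(parse_osd(sample)))
```

The fallback matters in practice: when the script confidence is low, recognizing with a default language is usually safer than trusting a weak guess.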