[tesseract-ocr] How does tesseract work with multiple languages text?

Layne Wang Thu, 07 Jun 2018 01:36:02 -0700

Hi, 

I'm working on segmenting text in different languages from an image, so I 
wonder how tesseract chooses the output character when multiple languages 
are given on the command line.


So far, here is what I know: 

   - The LSTM models in the traineddata files are different for each 
   language, so I cannot easily combine traineddata files.
   - The order of the languages matters. For example, -l eng+fra and 
   -l fra+eng give different results. The first language passed is set as 
   the primary language, which affects the output spacing.
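To reproduce the ordering effect, the two runs can be scripted. This is a minimal Python sketch, assuming the tesseract binary (4.x or later) is on the PATH and the eng and fra traineddata files are installed. It only uses the standard CLI form `tesseract <image> stdout -l <lang1>+<lang2>`; the helper names `tesseract_cmd` and `run_tesseract` are hypothetical.

```python
import subprocess
from typing import List, Sequence


def tesseract_cmd(image_path: str, langs: Sequence[str], tsv: bool = False) -> List[str]:
    """Build a tesseract command line for a multi-language run.

    The languages are joined with '+' in the order given, so the first
    entry is the language passed first (the "primary" one).
    """
    if not langs:
        raise ValueError("at least one language code is required")
    # 'stdout' as the output base makes tesseract print the result instead of writing a file
    cmd = ["tesseract", image_path, "stdout", "-l", "+".join(langs)]
    if tsv:
        cmd.append("tsv")  # config name that switches the output to per-word TSV
    return cmd


def run_tesseract(image_path: str, langs: Sequence[str], tsv: bool = False) -> str:
    """Run tesseract and return what it printed (raises if tesseract fails)."""
    result = subprocess.run(
        tesseract_cmd(image_path, langs, tsv),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_tesseract("page.png", ["eng", "fra"]) and run_tesseract("page.png", ["fra", "eng"]) on the same (hypothetical) image gives the two outputs to compare; only the order of the list changes between the runs.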

I would like to know:

   - How does tesseract choose the output character when the candidates 
   come from different languages? Is it based on the confidence score? And 
   what role does the "primary" language play in generating the output?
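One way to probe the confidence question empirically is to request TSV output (the tsv config) for both language orders and compare the per-word conf values of the words that changed. The sketch below assumes the TSV layout of Tesseract 4.x/5.x (columns level through conf and text, with word rows at level 5) and pairs words by position, which is only meaningful when both runs split the page into the same words. It shows what the engine reports, not how Tesseract internally picks between languages; `words_with_conf` and `diff_runs` are hypothetical helpers.

```python
import csv
import io
from typing import List, Tuple

Word = Tuple[str, float]


def words_with_conf(tsv_text: str) -> List[Word]:
    """Return (word, confidence) for every word-level row of tesseract TSV output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = row.get("text")
        # Level 5 rows are words; page/block/paragraph/line rows are skipped.
        if row.get("level") == "5" and text and text.strip():
            words.append((text, float(row["conf"])))
    return words


def diff_runs(tsv_a: str, tsv_b: str) -> List[Tuple[Word, Word]]:
    """Pair words position by position and keep the pairs whose text differs."""
    a, b = words_with_conf(tsv_a), words_with_conf(tsv_b)
    return [(x, y) for x, y in zip(a, b) if x[0] != y[0]]
```

The two inputs can be the printed output of `tesseract page.png stdout -l eng+fra tsv` and of the same command with `-l fra+eng`.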

Thank you!
Layne

ps. I posted the same content earlier today but could not see my post 
in the group. I would appreciate it if someone could tell me why.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/2d5e257e-3ebc-4d47-bbc4-2ba40bd5f35d%40googlegroups.com.
