Can you provide any information on how this works? At what level can languages mingle? For example, could each word be in a different language? Or does it work at the sentence level or the paragraph level? Is there a way to influence this? For example, if I know that a document is in only a single language, but I don't know which one, is there a way to specify that? Does the result contain information on which language matched?
Best regards,
Marcus

On Mar 8, 08:53, zdenko podobny <[email protected]> wrote:
> On Wed, Mar 7, 2012 at 11:51 PM, Falke <[email protected]> wrote:
> > I did search this group but found only old posts regarding multiple
> > languages (regarding 2.0), but, looking forward to the new features in
> > 3.01...
> >
> > I am assuming it's still impossible, even in 3.01, to recognize a
> > mixture of languages (distinct alphabets), per scan. If my assumption
> > is correct, then, the next best thing would/could be to combine
> > multiple traineddata files into one superset...
>
> this feature will be/is available in 3.02 version[1] (already in svn).
>
> [1] http://groups.google.com/group/tesseract-ocr/msg/29413aef63ee5977
>
> > But is that even feasible??
> >
> > Any other solutions for multilingual (multi-alphabetic) documents?
> >
> > (ABBYY does it -- why can't we?? :-))
> >
> > TIA
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
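
For reference, a minimal sketch of how the multi-language feature mentioned in the quoted reply might be invoked. It assumes the 3.02 command line accepts several language codes joined with `+` after `-l`; the helper name `build_tesseract_command` and the file names are hypothetical, and the sketch only builds the command without running it. It says nothing about the level at which languages can mix or whether the matched language is reported.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Build an argv list for a multi-language tesseract run.

    Assumption: language codes are combined with '+' after -l,
    e.g. -l eng+deu. This helper is illustrative, not part of Tesseract.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


# Hypothetical file names; each listed language needs its traineddata installed.
cmd = build_tesseract_command("scan.tif", "scan_out", ["eng", "deu"])
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where tesseract is installed.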

