Quote/Cytat - Nick White <[email protected]> (Fri 27 Jun 2014 06:49:46 PM CEST):

On Fri, Jun 27, 2014 at 01:48:52AM -0700, thinker wrote:
Reading an image with multiple languages (Arabic and English) using the
-l ara+eng option gives garbage output.

There are currently a couple of bugs with combining Arabic and
English together, so it isn't working. I'd recommend you add any
extra information you have to those bugs, to help the issues be
resolved sooner:

https://code.google.com/p/tesseract-ocr/issues/detail?id=899
https://code.google.com/p/tesseract-ocr/issues/detail?id=1220

In the meantime you can try to merge the results of the separate runs for each language. You will find hocr-merge at

https://bitbucket.org/jwilk/marasca-wbl

in misc/xhocr/
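
For the separate runs, something along these lines might do as a starting point. It is only a sketch: the helper names build_commands and run_separately and the "page-ara" / "page-eng" output base names are made up for illustration, and it assumes tesseract is on the PATH with the ara and eng traineddata installed. It deliberately stops before the merge step, because the hocr-merge command line is not described here; check the script itself for its usage.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(image, languages=("ara", "eng")):
    """Return one tesseract command line per language, each requesting hOCR output.

    The output base is "<image stem>-<lang>" (an arbitrary naming choice);
    tesseract appends its own extension, which differs between versions.
    """
    stem = Path(image).stem
    return [
        ["tesseract", str(image), f"{stem}-{lang}", "-l", lang, "hocr"]
        for lang in languages
    ]


def run_separately(image, languages=("ara", "eng")):
    """Run tesseract once per language instead of once with -l ara+eng."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    for command in build_commands(image, languages):
        # check=True raises CalledProcessError if a run fails
        subprocess.run(command, check=True)


# Example (needs tesseract installed):
# run_separately("page.png")
```

The two hOCR files produced this way are what you would then pass to the merging tool.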

Best regards

Janusz
--
Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
[email protected], [email protected], http://fleksem.klf.uw.edu.pl/~jsbien/
