Quote - Nick White <[email protected]> (Fri 27 Jun 2014
06:49:46 PM CEST):
On Fri, Jun 27, 2014 at 01:48:52AM -0700, thinker wrote:
Reading an image containing multiple languages (Arabic and English) with
the -l ara+eng option gives garbage output.
There are currently a couple of bugs with combining Arabic and
English together, so it isn't working. I'd recommend you add any
extra information you have to those bugs, to help the issues be
resolved sooner:
https://code.google.com/p/tesseract-ocr/issues/detail?id=899
https://code.google.com/p/tesseract-ocr/issues/detail?id=1220
In the meantime, you can try merging the results of separate runs, one
for each language. You will find hocr-merge at
https://bitbucket.org/jwilk/marasca-wbl
in misc/xhocr/
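
The separate runs could be scripted roughly as follows. This is only a sketch under some assumptions: the helper names and the "page-<lang>" output naming are made up for illustration, and it relies on the stock "hocr" config file that ships with Tesseract. The hocr-merge step itself is not shown, since its options are not covered in this thread.

```python
import subprocess


def tesseract_hocr_command(image, lang, out_base=None):
    """Build the argument list for one single-language Tesseract run.

    The output base name "page-<lang>" is an illustrative default,
    not something Tesseract or hocr-merge requires.
    """
    if out_base is None:
        out_base = f"page-{lang}"
    # Trailing "hocr" names the Tesseract config that enables hOCR output.
    return ["tesseract", image, out_base, "-l", lang, "hocr"]


def run_per_language(image, langs=("ara", "eng")):
    """Run Tesseract once per language instead of once with ara+eng."""
    for lang in langs:
        # check=True raises CalledProcessError if tesseract exits non-zero.
        subprocess.run(tesseract_hocr_command(image, lang), check=True)
```

Calling run_per_language("page.png") should leave one hOCR file per language (the extension is .html or .hocr depending on the Tesseract version), and those files are what you would then pass to hocr-merge.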
Best regards
Janusz
--
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
[email protected], [email protected], http://fleksem.klf.uw.edu.pl/~jsbien/