Re: [tesseract-ocr] read multi-language ( arabic and english) image

Janusz S. Bien Fri, 27 Jun 2014 09:55:20 -0700

Quote/Cytat - Nick White <[email protected]> (Fri 27 Jun 2014 06:49:46 PM CEST):

On Fri, Jun 27, 2014 at 01:48:52AM -0700, thinker wrote:

Reading an image with multiple languages (Arabic and English) using the
-l ara+eng option gives garbage output.


There are currently a couple of bugs with combining Arabic and
English together, so it isn't working. I'd recommend you add any
extra information you have to those bugs, to help the issues be
resolved sooner:

https://code.google.com/p/tesseract-ocr/issues/detail?id=899
https://code.google.com/p/tesseract-ocr/issues/detail?id=1220

In the meantime, you can try merging the results of separate runs for each language. You will find hocr-merge at


https://bitbucket.org/jwilk/marasca-wbl

in misc/xhocr/
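
The separate runs could be scripted roughly as follows. This is only a minimal sketch: it assumes the tesseract binary is on the PATH with the ara and eng traineddata installed, and the helper names (build_commands, run_separately), the input file page.png and the output naming scheme are all made up for illustration. The invocation of hocr-merge itself is not shown, since its command-line interface is not described here.

```python
import subprocess
from pathlib import Path


def build_commands(image, langs=("ara", "eng")):
    """Return one tesseract argv list per language, each requesting hOCR output.

    The output base "<image stem>-<lang>" is an arbitrary naming choice
    made for this sketch, not something tesseract or hocr-merge requires.
    """
    stem = Path(image).stem
    commands = []
    for lang in langs:
        out_base = f"{stem}-{lang}"
        # "hocr" is the stock tesseract config name that selects hOCR output.
        commands.append(["tesseract", str(image), out_base, "-l", lang, "hocr"])
    return commands


def run_separately(image, langs=("ara", "eng")):
    """Run tesseract once per language instead of once with -l ara+eng."""
    for cmd in build_commands(image, langs):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical input file; replace with the actual scan.
    run_separately("page.png")
```

Each run writes its own hOCR file next to the given output base (the file extension depends on the Tesseract version), and those per-language files are what would then be passed to the merge step.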

Best regards

Janusz
--

Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)

Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
[email protected], [email protected], http://fleksem.klf.uw.edu.pl/~jsbien/
