Re: [tesseract-ocr] bad result on tesseract(4.0) with lstm

2017-06-20 Thread ShreeDevi Kumar
Your input image quality needs to be improved.

Also test with --oem 1 alone (LSTM only; --oem 2 combines the legacy and
LSTM engines).

Please test with
https://github.com/tesseract-ocr/tesseract/blob/master/testing/hebtypo.jpg
and see if you get similar results.

For hOCR output, just adding hocr to the command line should work, as long
as you have the hocr config file in your tessdata directory.
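
As a rough sketch of the two suggestions above, the commands could be assembled like this. The helper names (build_command, compare_engines) and the output_oem<N> base names are made up for illustration; file.tiff, -l heb and --psm 6 are taken from the original post. Nothing is executed here, the commands are only printed.

```python
# Engine modes in Tesseract 4.0: 0 = legacy only, 1 = LSTM only, 2 = legacy + LSTM.
OEM_MODES = {0: "legacy", 1: "lstm", 2: "legacy+lstm"}


def build_command(image, out_base, oem, lang="heb", psm=6, hocr=False):
    """Return the tesseract argument list for one run (hypothetical helper)."""
    cmd = ["tesseract", image, out_base,
           "--oem", str(oem), "-l", lang, "--psm", str(psm)]
    if hocr:
        # "hocr" names a config file; it has to be present in the tessdata directory.
        cmd.append("hocr")
    return cmd


def compare_engines(image, lang="heb", psm=6):
    """Build one command per engine mode so the outputs can be compared."""
    return [build_command(image, f"output_oem{oem}", oem, lang, psm)
            for oem in OEM_MODES]


if __name__ == "__main__":
    # Print the three engine-mode commands, then an LSTM-only run with hOCR output.
    for cmd in compare_engines("file.tiff"):
        print(" ".join(cmd))
    print(" ".join(build_command("file.tiff", "output", 1, hocr=True)))
```

Passing each list to subprocess.run would execute it, provided tesseract and the heb traineddata are installed.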

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Jun 20, 2017 at 1:05 PM, לאה למד  wrote:

>
> Hi,
> Attached is a line from the original image.
>
> Command: *tesseract file.tiff output --oem 2 -l heb --psm 6*
> Result: *"אומדן / שווי ההתקשרות: 6 ₪ לפני מע"מ. ₪"*
>
> Command: *tesseract file.tiff output --oem 0 -l heb --psm 6*
> Result: *"אןמדן ושווי ההתקשרות: 16,656 ₪ לפניימע"מ. ₪”"*
>
> For those who don't read Hebrew: the sentence is extracted better with
> the LSTM engine, but for some unknown reason the extracted number is
> completely wrong (it returns 6 where the legacy engine returns 16,656).
> Any ideas?
>
> An unrelated question: how can I produce hOCR output in the new Tesseract?
> Thank you.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduXbuF54fZw80k4y4T1EtunuHy_-Z%2Ba-cCiJeTXbfsP%2BBg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] bad result on tesseract(4.0) with lstm

2017-06-20 Thread לאה למד

Hi,
Attached is a line from the original image.

Command: *tesseract file.tiff output --oem 2 -l heb --psm 6*
Result: *"אומדן / שווי ההתקשרות: 6 ₪ לפני מע"מ. ₪"*

Command: *tesseract file.tiff output --oem 0 -l heb --psm 6*
Result: *"אןמדן ושווי ההתקשרות: 16,656 ₪ לפניימע"מ. ₪”"*

For those who don't read Hebrew: the sentence is extracted better with the
LSTM engine, but for some unknown reason the extracted number is completely
wrong (it returns 6 where the legacy engine returns 16,656).
Any ideas?

An unrelated question: how can I produce hOCR output in the new Tesseract?
Thank you.

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/bfa31f55-a8b4-43f5-9049-417cf0f20229%40googlegroups.com.