Your input image quality needs to be improved. Also test with --oem 1 alone.
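To compare engines side by side, the runs discussed in this thread can be scripted. The following is a minimal sketch, not a definitive recipe: the helper names `build_tesseract_cmd` and `run_if_available` and the output base name `output_oem1` are made up for illustration, and `file.tiff` is the file name from the original question. It assumes a Tesseract 4 binary, where `--oem 1` selects the LSTM engine alone and a trailing `hocr` names the hocr config file.

```python
import shlex
import shutil
import subprocess


def build_tesseract_cmd(image, out_base, oem, lang="heb", psm=6, configs=()):
    """Return the argv list for one tesseract run (nothing is executed here)."""
    cmd = ["tesseract", image, out_base,
           "--oem", str(oem), "-l", lang, "--psm", str(psm)]
    # Config file names such as "hocr" go after the options.
    cmd.extend(configs)
    return cmd


def run_if_available(cmd):
    """Run cmd only if its binary is on PATH; return True if it was started."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=False)
    return True


# LSTM engine alone, as suggested above.
lstm_only = build_tesseract_cmd("file.tiff", "output_oem1", oem=1)
# Same run, additionally requesting hOCR output via the hocr config.
lstm_hocr = build_tesseract_cmd("file.tiff", "output_oem1", oem=1, configs=["hocr"])

for cmd in (lstm_only, lstm_hocr):
    print(shlex.join(cmd))
```

The script only prints the two command lines; pass either list to `run_if_available` to execute it when tesseract and the image are actually present.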
Please test with https://github.com/tesseract-ocr/tesseract/blob/master/testing/hebtypo.jpg and see if you get similar results.

For hOCR output, just adding hocr to the command line should work, as long as you have the hocr config file in your tessdata directory.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Jun 20, 2017 at 1:05 PM, לאה למד <[email protected]> wrote:
> Hi,
>
> Attached is a line from the original image.
>
> Command: *tesseract file.tiff output --oem 2 -l heb --psm 6*
> Result: *"אומדן / שווי ההתקשרות: 6 ₪ לפני מע"מ. ₪"*
>
> Command: *tesseract file.tiff output --oem 0 -l heb --psm 6*
> Result: *"אןמדן ושווי ההתקשרות: 16,656 ₪ לפניימע"מ. ₪”"*
>
> For people who don't read Hebrew: the sentence is extracted better with
> the LSTM, but for an unknown reason the extracted number is completely
> wrong. Any ideas?
>
> An unrelated question: how can I produce "hocr" output in the new tesseract?
>
> Thank you

