I have tried to add margins to the lines, but it did not make the results better.
I also tried other psm values (11, 12 ..), but that did not improve the output either. It looks like the hocr parameter forces the psm to treat the input as a page. Any ideas on how to improve the results?

On Friday, June 15, 2018 at 2:42:00 PM UTC+2, [email protected] wrote:
>
> Dear All,
>
> In the project I am currently working on, I have a pure text line
> cropped from a document image.
>
> As a next step, I need to recognize the text and, at the same time,
> get the word coordinates.
>
> To get those coordinates, I pass the hocr parameter on the command
> line and set the page segmentation mode to 7 (line):
>
> tesseract file.png out.txt --psm 7 hocr
>
> However, the output is really bad because, with these parameters, the
> line is considered as a page and some words are not detected in the
> output.
>
> Is there another way to get the word coordinates of that line?
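
For reference, here is a rough sketch of how the word coordinates can be read back out of the hOCR file once Tesseract has written it (with the command above that should be out.txt.hocr, but please check on your side). It does not fix the recognition quality; it only shows where the boxes live. The assumptions: words are emitted as span elements with class ocrx_word, and the title attribute carries "bbox x0 y0 x1 y1". The names HocrWordParser and parse_hocr_words are made up for this example and are not part of Tesseract.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" entry inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every ocrx_word span.

    Hypothetical helper: assumes word spans do not contain nested spans,
    which is what default hOCR output looks like.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open word span.
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr_text):
    """Return a list of (word, bbox) tuples found in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


if __name__ == "__main__":
    # Hand-written sample in the shape of hOCR output, not real OCR results.
    sample = (
        "<span class='ocr_line' title='bbox 10 12 300 40'>"
        "<span class='ocrx_word' title='bbox 10 12 80 40; x_wconf 96'>Dear</span> "
        "<span class='ocrx_word' title='bbox 95 12 140 40; x_wconf 93'>All</span>"
        "</span>"
    )
    for word, bbox in parse_hocr_words(sample):
        print(word, bbox)
```

The boxes are in pixel coordinates of the image that was passed to Tesseract, so for a cropped line they are relative to the crop, and any margin added before recognition has to be subtracted again.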

