I see you have a publication on document image processing, so I suppose you're already familiar with many of these techniques.
These images require somewhat different approaches. In general, in both cases Tess needs some help with layout analysis and with removing table borders or frames.

4.png
-------
- Binarize. I think Otsu would suffice.
- Remove the table borders. Use either CC (connected component) analysis, filtering by CC size, nesting level, etc., or a Hough transform to detect long straight lines (if the table borders touch characters).
- Isolate the rotated text at the right. Tess can't recognize such text, so unrotate it and OCR it separately. It will probably also need upscaling, say by 3x.
- Isolate the regions with dense text and OCR them separately, one by one. Tess is bad at recognizing sparse text, let alone text that varies so much in size.

82.png
---------
- Binarize. Otsu.
- Remove the frame. I suppose the easiest way is to filter CCs by pixel count.
- Upper word. Isolate it and OCR it separately. It needs prior blurring (to make the characters more "fleshy") and upscaling (to give Tess more stroke detail). Instead of blurring you may use dilation.
- Lower word. Isolate it and OCR it separately. It may require erosion, as Tess's stock traineddata might not work well for such a bold font.

Locating dense text regions, vertical text and so on can be done by NN chain analysis. Judging from your article's abstract, it seems you have used all of the methods mentioned above. Tesseract is no miracle; you have to do many things manually. All of the above is easier to do by programming, but it could also be done with ImageMagick/shell scripts.

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, May 28, 2015 at 2:47 PM, supriya Das wrote:
> Hello Dmitri Silaev,
> Thanks for your response. Please tell me the complex processing logic.
> Thanks in advance.
>
> On Thursday, 28 May 2015 15:59:22 UTC+5:30, Dmitri Silaev wrote:
>>
>> You won't get any improvement just by changing a few params. More
>> complex processing is required. Let me know if you're interested in more
>> details.
>>
>> Best regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Thu, May 28, 2015 at 8:50 AM, supriya Das wrote:
>>
>>> Hello Everybody,
>>>
>>> I am not getting proper output for a couple of images. What kind of
>>> parameters should be set to get proper output?
>>> And is it possible to set SetPageSegMode with multiple enum values at a
>>> time? Some problem images are as follows. Thanks in advance.
>>>
>>> For the images below I am not getting any output at all. I also tried
>>> changing the ppi to 300, but still got no result.
>>>
>>> <https://lh3.googleusercontent.com/-XlFRIZfDN-k/VWasN7JC1FI/AAAAAAAAAPU/y77aOoveOhk/s1600/4.png>
>>>
>>> <https://lh3.googleusercontent.com/-jW3aDb_4lZE/VWargKvFZsI/AAAAAAAAAPM/Y26kenYq93U/s1600/82.png>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/7431b25c-47ae-46d1-af90-e2ec80a7b7ca%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/7431b25c-47ae-46d1-af90-e2ec80a7b7ca%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
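The "Binarize. Otsu." step recommended for both 4.png and 82.png can be sketched in pure Python as below. This is a minimal illustration, assuming the page is already loaded as a 2-D list of 8-bit grayscale values (0 = black, 255 = white); `otsu_threshold` and `binarize` are hypothetical helper names, not part of Tesseract. In practice a library routine, such as OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag, would normally do this job.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold for a 2-D list of 0..255 gray values.

    Pixels <= threshold form one class, pixels > threshold the other;
    the chosen threshold maximizes the between-class variance.
    """
    # 256-bin intensity histogram.
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so t is not a real split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(gray):
    """Return a 0/1 image: 1 for dark pixels (ink), 0 for light ones (paper)."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


gray = [[10, 10, 200],
        [12, 220, 210]]
print(otsu_threshold(gray))  # 12
print(binarize(gray))        # [[1, 1, 0], [1, 0, 0]]
```

The result marks ink as 1 and paper as 0; Otsu picks a single global threshold, which is usually adequate for evenly lit scans like these but not for pages with strong shading.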
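Removing the frame of 82.png by filtering CCs (connected components) on pixel count, as suggested above, might look like the following pure-Python sketch. It assumes a binarized image stored as a 2-D list with 1 for ink and 0 for paper, and uses 8-connectivity; `max_pixels` is a hypothetical cut-off that would have to be tuned per image. Only the pixel-count criterion is shown, not the nesting-level test mentioned for the table borders in 4.png.

```python
from collections import deque


def connected_components(binary):
    """Return a list of components; each is a list of (row, col) ink pixels.

    Uses 8-connectivity: diagonal neighbours belong to the same component.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def remove_large_components(binary, max_pixels):
    """Erase every component with more than max_pixels pixels (e.g. a frame)."""
    cleaned = [row[:] for row in binary]
    for pixels in connected_components(binary):
        if len(pixels) > max_pixels:
            for y, x in pixels:
                cleaned[y][x] = 0
    return cleaned


# A 7x7 test image: a one-pixel frame plus a two-pixel "character" inside.
img = [[0] * 7 for _ in range(7)]
for i in range(7):
    img[0][i] = img[6][i] = img[i][0] = img[i][6] = 1
img[3][3] = img[3][4] = 1
print(sorted(len(p) for p in connected_components(img)))  # [2, 24]
print(sum(map(sum, remove_large_components(img, max_pixels=10))))  # 2
```

A frame is one large component, so a size filter removes it in one pass; when borders touch characters they merge into the same component, which is why a line detector such as the Hough transform is suggested for that case instead.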
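For the rotated text at the right of 4.png, the advice to isolate it, unrotate it and upscale it (say by 3x) before OCR can be sketched as three small pure-Python helpers. This is a minimal sketch, assuming a 2-D list image; `crop`, `rotate90` and `upscale` are hypothetical names, and the crop coordinates would have to come from layout analysis. The rotation direction is also an assumption: text that reads bottom-to-top needs a clockwise turn, text that reads top-to-bottom a counterclockwise one.

```python
def crop(img, top, left, bottom, right):
    """Return rows top..bottom-1 and columns left..right-1 of a 2-D list."""
    return [row[left:right] for row in img[top:bottom]]


def rotate90(img, clockwise=True):
    """Rotate a 2-D list by 90 degrees (clockwise by default)."""
    if clockwise:
        # Reverse the row order, then transpose.
        return [list(col) for col in zip(*img[::-1])]
    # Transpose, then reverse the row order.
    return [list(col) for col in zip(*img)][::-1]


def upscale(img, factor=3):
    """Nearest-neighbour upscaling: every pixel becomes a factor x factor block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


strip = [[1, 0],
         [0, 0],
         [0, 1]]                    # 3 rows x 2 columns
turned = rotate90(strip)            # 2 rows x 3 columns
print(turned)                       # [[0, 0, 1], [1, 0, 0]]
big = upscale(turned, factor=3)
print(len(big), len(big[0]))        # 6 9
```

Nearest-neighbour scaling is used only to keep the sketch short; an interpolating resize (for example ImageMagick's `-resize`, which the original message mentions as a scripting option) would give smoother strokes.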
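For 82.png, the thin upper word is to be thickened (dilation as an alternative to blurring) and the bold lower word thinned (erosion). Below is a minimal pure-Python sketch of binary dilation and erosion with a 3x3 square structuring element, assuming ink is stored as 1 and paper as 0; with the opposite convention (black ink = 0) the two operations swap roles. Pixels outside the image are simply ignored, which is one of several possible border policies, and the helper names are hypothetical.

```python
def _morph(binary, rule):
    """Apply a 3x3 square structuring element to a 0/1 image.

    `rule` (any or all) decides each output pixel from its 3x3 window;
    pixels outside the image are ignored.
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [binary[y][x]
                      for y in range(max(r - 1, 0), min(r + 2, h))
                      for x in range(max(c - 1, 0), min(c + 2, w))]
            out[r][c] = 1 if rule(window) else 0
    return out


def dilate(binary):
    """Thicken strokes: a pixel becomes ink if any pixel in its 3x3 window is ink."""
    return _morph(binary, any)


def erode(binary):
    """Thin strokes: a pixel stays ink only if its whole 3x3 window is ink."""
    return _morph(binary, all)


block = [[0] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        block[y][x] = 1                  # a 3x3 square of ink
print(sum(map(sum, dilate(block))))      # 25: grows to fill the 5x5 image
print(sum(map(sum, erode(block))))       # 1: only the centre pixel survives
```

A single 3x3 pass changes stroke width by about one pixel on each side, so it may need repeating, or combining with the upscaling step, depending on how thin or bold the original strokes are.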

