Hi,
adding an extra border is a known trick for fixing OCR results, especially in single-character mode. As far as I remember, nobody has found another way to solve this issue.
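For illustration, here is a minimal sketch of the idea in plain C++, assuming an 8-bit grayscale buffer with dark text on a white background. The names GrayImage, Rect, addBorder and padBox are hypothetical helpers made up for this example; they are not part of Tesseract or Leptonica. addBorder pads a cropped character image with background pixels, and padBox widens a component box while keeping it inside the page, which should be safer than adjusting box->x and box->w directly when a character touches the image edge.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper types for this sketch; not Tesseract or Leptonica types.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // row-major, 1 byte per pixel
};

struct Rect {
    int x, y, w, h;
};

// Returns a copy of `src` surrounded by `border` pixels of value `fill`.
// Assumption: 255 is the background (white page, dark text).
GrayImage addBorder(const GrayImage& src, int border, std::uint8_t fill = 255) {
    GrayImage dst;
    dst.width = src.width + 2 * border;
    dst.height = src.height + 2 * border;
    dst.pixels.assign(static_cast<std::size_t>(dst.width) * dst.height, fill);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            dst.pixels[(y + border) * dst.width + (x + border)] =
                src.pixels[y * src.width + x];
        }
    }
    return dst;
}

// Grows `box` by `margin` pixels on every side, clipped to the image bounds
// so the result never reaches outside the page.
Rect padBox(Rect box, int margin, int imageWidth, int imageHeight) {
    int left = std::max(0, box.x - margin);
    int top = std::max(0, box.y - margin);
    int right = std::min(imageWidth, box.x + box.w + margin);
    int bottom = std::min(imageHeight, box.y + box.h + margin);
    return Rect{left, top, right - left, bottom - top};
}
```

A padded buffer like this could then be handed to Tesseract with something like api.SetImage(out.pixels.data(), out.width, out.height, 1, out.width). If you already hold a Leptonica PIX, pixAddBorder(pixs, 1, 0) should give a similar result for a 1 bpp image, where 0 is white.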
Zdenko

On Wed, Dec 11, 2013 at 12:26 PM, Faruk Terzioğlu <[email protected]> wrote:

> Using the "api.GetComponentImages(RIL_SYMBOL, true, NULL, NULL);"
> function, I loop over every piece of text, take it with
> "api.GetThresholdedImage()" and save it to disk with "pixWrite".
> (Image below.)
>
> <https://lh4.googleusercontent.com/--6j09JgYq0Q/Uqg6dprZU0I/AAAAAAAAAFE/bUPRhdYPHJM/s1600/binary.png>
>
> When I give the same image as input to the Tesseract API, it doesn't
> recognize it. But if I add a line just 1 pixel wide to any side, it
> recognizes it well. (Image below.)
>
> <https://lh5.googleusercontent.com/-TBaomOmhywQ/Uqg6nXUrv9I/AAAAAAAAAFM/eHjMstd0UKk/s1600/binaryOK.png>
> (one more pixel line on the right side)
>
> api.Init("","eng", OEM_TESSERACT_ONLY);
> PIX *pixs;
> pixs = pixRead("C:/tesseract/sampleImg/tr/binary.png");
> api.SetImage(pixs);
> text_out = api.GetUTF8Text();
>
> I did this because on the same image I get good results with
> PSM_SINGLE_BLOCK, PSM_SINGLE_WORD and PSM_SINGLE_LINE, but with
> PSM_SINGLE_CHAR the results are terrible.
> To find out why, I saved every image part to disk and sent each one
> separately to Tesseract.
>
> If I add 1 pixel to every side as below, the results are pretty good:
> box->x = box->x - 1;
> box->y = box->y - 1;
> box->w = box->w + 2;
> box->h = box->h + 2;
>
> Sample result:
>
> <https://lh3.googleusercontent.com/-gcjCyirWWu0/Uqg3N09PlsI/AAAAAAAAAE4/fG_-acCmoNk/s1600/ikili.png>
>
> The first image (with wider boxes) is recognized very well (the result
> is at the top left; the question marks stand for newline characters),
> but in the second image every letter is recognized wrongly or not
> recognized at all.
>
> For this sample I added 1 pixel to every side, but I couldn't check
> this on every image.
> How can I solve this issue?
>
> void tesseractWithOpenCVComponentImages()
> {
>     initTesseract();
>
>     Mat img;
>     img = imread(imagePath, CV_LOAD_IMAGE_GRAYSCALE);
>     if (!img.data) { cout << "Could not load image!"; cin.get(); exit(1); }
>
>     api.SetImage(img.data, img.cols, img.rows, 1, img.step1());
>     text_out = api.GetUTF8Text();
>
>     PageSegMode segMode = PSM_SINGLE_CHAR;
>     PageIteratorLevel level = RIL_SYMBOL;
>
>     api.SetPageSegMode(segMode);
>
>     Boxa *boxes = api.GetComponentImages(level, true, NULL, NULL);
>     for (int i = 0; i < boxes->n; i++)
>     {
>         Box *box = boxaGetBox(boxes, i, L_CLONE);
>         // This will give good result ->
>         box->x = box->x - 1;
>         box->y = box->y - 1;
>         box->w = box->w + 2;
>         box->h = box->h + 2;
>         // <- This will give good result
>
>         // TesseractRect() already returns the recognized text,
>         // so no separate GetUTF8Text() call is needed.
>         char *outText = api.TesseractRect(img.data, 1, img.step1(),
>                                           box->x, box->y, box->w, box->h);
>
>         printf("%d. : %s", i+1, outText);
>
>         delete [] outText;  // the caller owns the returned text
>         boxDestroy(&box);   // release the clone from boxaGetBox()
>     }
>     boxaDestroy(&boxes);
> }

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
---
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

