Using api.GetComponentImages(RIL_SYMBOL, true, NULL, NULL), I loop over every piece of text, fetch each one with api.GetThresholdedImage(), and save it to disk with pixWrite. (Image below.)

<https://lh4.googleusercontent.com/--6j09JgYq0Q/Uqg6dprZU0I/AAAAAAAAAFE/bUPRhdYPHJM/s1600/binary.png>

When I give that same image to the Tesseract API as input, it is not recognized. But if I add a line only 1 pixel wide to any side, it is recognized well. (Image below, with one extra pixel column on the right side.)

<https://lh5.googleusercontent.com/-TBaomOmhywQ/Uqg6nXUrv9I/AAAAAAAAAFM/eHjMstd0UKk/s1600/binaryOK.png>

    api.Init("", "eng", OEM_TESSERACT_ONLY);
    PIX *pixs;
    pixs = pixRead("C:/tesseract/sampleImg/tr/binary.png");
    api.SetImage(pixs);
    text_out = api.GetUTF8Text();

I did this because, on the same image, I get good results with PSM_SINGLE_BLOCK, PSM_SINGLE_WORD and PSM_SINGLE_LINE, but with PSM_SINGLE_CHAR the results are terrible. To find out why, I saved every image part to disk and sent the parts to Tesseract separately. If I add 1 pixel to every side as below, the results are pretty good:

    box->x = box->x - 1;
    box->y = box->y - 1;
    box->w = box->w + 2;
    box->h = box->h + 2;

Sample result:

<https://lh3.googleusercontent.com/-gcjCyirWWu0/Uqg3N09PlsI/AAAAAAAAAE4/fG_-acCmoNk/s1600/ikili.png>

The first image (with the wider boxes) is recognized very well (the results are at the top left; the question marks stand for the newline character), but in the second image every letter is recognized wrongly or not recognized at all. For this sample I added 1 pixel to every side, but I could not check this on every image. How can I solve this issue?
    void tesseractWithOpenCVComponentImages()
    {
        initTesseract();

        Mat img;
        img = imread(imagePath, CV_LOAD_IMAGE_GRAYSCALE);
        if (!img.data)
        {
            cout << "Image could not be loaded!";
            cin.get();
            exit(1);
        }

        api.SetImage(img.data, img.cols, img.rows, 1, img.step1());
        text_out = api.GetUTF8Text();

        PageSegMode segMode = PSM_SINGLE_CHAR;
        PageIteratorLevel level = RIL_SYMBOL;
        api.SetPageSegMode(segMode);

        Boxa *boxes = api.GetComponentImages(level, true, NULL, NULL);
        for (int i = 0; i < boxes->n; i++)
        {
            Box *box = boxaGetBox(boxes, i, L_CLONE);

            // This will give good result ->
            box->x = box->x - 1;
            box->y = box->y - 1;
            box->w = box->w + 2;
            box->h = box->h + 2;
            // <- This will give good result

            // TesseractRect recognizes the rectangle and returns its text
            char *outText = api.TesseractRect(img.data, 1, img.step1(),
                                              box->x, box->y, box->w, box->h);
            printf("%d. : %s\n", i + 1, outText);

            delete[] outText;  // the caller owns the returned string
            boxDestroy(&box);  // release the clone
        }
        boxaDestroy(&boxes);
    }
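The 1-pixel padding in the loop above can push a box outside the image when a symbol sits at the edge (x or y becomes -1, or x + w exceeds the image width). A minimal sketch of a padding step that clips to the image bounds follows. Rect and padBox are hypothetical names used only for illustration; they are not part of Tesseract or Leptonica, and the margin of 1 is simply the value that worked on this sample.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper type; mirrors the x, y, w, h fields of a Leptonica Box.
struct Rect {
    int x, y, w, h;
};

// Grow r by `margin` pixels on every side, then clip it to an image of
// size imgW x imgH so the result never reaches outside the image.
Rect padBox(const Rect &r, int margin, int imgW, int imgH) {
    assert(margin >= 0 && imgW > 0 && imgH > 0);
    int left   = std::max(r.x - margin, 0);
    int top    = std::max(r.y - margin, 0);
    int right  = std::min(r.x + r.w + margin, imgW);  // exclusive right edge
    int bottom = std::min(r.y + r.h + margin, imgH);  // exclusive bottom edge
    return Rect{left, top, right - left, bottom - top};
}
```

In the loop above, the padded values would be passed to api.TesseractRect in place of the raw box fields, with img.cols and img.rows as the image size. Clipping only keeps the rectangle valid. When a symbol already touches the image edge, as in the saved binary.png, there are no spare pixels to include, so a white border would have to be added to the image itself, for example with Leptonica's pixAddBorder.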

