Did you ever get any help with this? Your image is not of good quality, but I could extract a lot of the text without much effort.
<https://lh3.googleusercontent.com/-0oA3bwv5UXw/WfvK0251NLI/AAAAAAAAASI/8FikoAChEfMebGLkiXPRt__p-LKfVPDCACLcBGAs/s1600/aunque.png>

On Friday, October 4, 2013 at 6:53:29 AM UTC-4, adrian company wrote:
>
> <https://lh6.googleusercontent.com/-gFboadFkrOM/Uk6dz2WI7rI/AAAAAAAAATs/2aWZloPX_T0/s1600/texto.jpg>
> Hi, I've been trying to fix the code and now I can read some text, but the
> text that is read is not correct. When I try to read the image "texto.jpg",
> the output text is something like this:
> II" n llfi ' IIIi"'"“MM“ufimn"w»“W,;H»m““wu%M%“m»%.IlI!!!!!| "" "" ''"umm
> """"'"""“ "" "" '''''|IImmw '!!!ll::...i!!"!""' W*WM‘WM
>
> Does anybody know the reason for that?
> This is the image to read:
>
>
> On Friday, October 4, 2013 at 08:59:34 UTC+2, adrian company wrote:
>>
>> Hi all,
>> I am using the tesseract engine to detect text in an image. I process the
>> image to binarize it and then extract text from it, but some errors are
>> displayed when I execute the program.
>> Does anyone know what I am doing wrong? I paste the code and the errors
>> displayed during execution.
>>
>> if (waitKey(10) >= 0){
>>
>>     // read image
>>     Mat imagen = imread("/home/adrian/workspace/OCR/matricula2.jpg"/*,CV_LOAD_IMAGE_GRAYSCALE*/);
>>     imshow("imagen",imagen);
>>
>>     // process the resized image: (filter, convert to grayscale, binarize)
>>     medianBlur(imagen,imagen, 3);
>>     cvtColor(imagen,imagen,CV_BGR2GRAY);
>>     threshold(imagen,imagen,umbral, umbral_max,3);
>>
>>     // initialize the tesseract OCR engine
>>     putenv("TESSDATA_PREFIX=/usr/local/share/");
>>     setlocale(LC_NUMERIC, "C");
>>     tesseract::TessBaseAPI api;
>>     printf("\nTesseract-ocr version: %s--------\t",api.Version()); // tesseract version
>>     printf("Leptonica version: %s\n", getLeptonicaVersion()); // leptonica version
>>     printf("___________________________________________________________________________\n");
>>
>>     if (api.Init(NULL, "spa")) { // language: Spanish
>>         fprintf( stderr, " ¡No se pudo inicializar tesseract! \n" );
>>         exit(1);
>>     }
>>
>>     api.SetPageSegMode(tesseract::PSM_AUTO);
>>     api.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789");
>>
>>     api.SetImage(imagen.data, imagen.size().width,imagen.size().height, imagen.channels(), imagen.step1());
>>
>>     // region of interest (ROI), e.g. regions that contain text
>>     Rect textROI(0,0,imagen.cols,imagen.rows); // whole image
>>
>>     // recognize text
>>     api.TesseractRect( imagen.data, 0,imagen.step1(), textROI.x, textROI.y,textROI.width, textROI.height);
>>
>>     char *texto = new char[200];
>>     texto = api.GetUTF8Text();
>>     // remove "newline"
>>     string t1(texto);
>>     t1.erase( remove(t1.begin(), t1.end(), '\n'), t1.end() );
>>
>>     // print found text
>>     printf("TEXTO LEIDO: \n");
>>     printf( "%s",t1.c_str() );
>>
>>     // draw rectangle image
>>     rectangle(imagen, textROI, Scalar(0, 0, 255), 2, 8, 0);
>>
>>     imwrite("/home/adrian/workspace/OCR/procesadas/binaria.jpg", imagen);
>>
>>     imshow("binarizada",imagen);
>>
>>     delete [] texto;
>>     // destroy tesseract OCR engine
>>     api.Clear();
>>     api.End();
>> }
>>
>> and the errors displayed are:
>>
>> Tesseract-ocr version: 3.02.02--------    Leptonica version: leptonica-1.69
>> ___________________________________________________________________________
>> Error in pixReduceRankBinary2: hs must be at least 2
>> Error in pixDilateBrick: pixs not defined
>> Error in pixExpandReplicate: pixs not defined
>> Error in pixAnd: pixs1 not defined
>> Error in pixDilateBrick: pixs not defined
>> Error in pixExpandReplicate: pixs not defined
>> Error in pixAnd: pixs2 not defined
>> TEXTO LEIDO:
>>
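Two things in the quoted snippet look suspicious to me. First, the last argument to threshold() is 3, which in OpenCV is THRESH_TOZERO rather than THRESH_BINARY, so the image is probably not being binarized at all: pixels above the threshold keep their gray value. Second, if I remember baseapi.h correctly, passing 0 as bytes_per_pixel to TesseractRect() tells Tesseract that the buffer is a byte-packed 1-bit image, which an 8-bit Mat is not; that may be where the Leptonica errors come from, but I have not verified it. The sketch below only illustrates the first point, in plain C++ with no OpenCV dependency. threshold_binary and threshold_to_zero are hypothetical helper names of mine that mimic the two modes on a raw 8-bit buffer; they are not OpenCV functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mimicking OpenCV's THRESH_BINARY (type 0):
// pixels strictly above the threshold become max_value, all others become 0.
std::vector<std::uint8_t> threshold_binary(const std::vector<std::uint8_t>& src,
                                           std::uint8_t thresh,
                                           std::uint8_t max_value) {
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i] > thresh ? max_value : 0;
    }
    return dst;
}

// Hypothetical helper mimicking OpenCV's THRESH_TOZERO (type 3):
// pixels strictly above the threshold keep their original gray value,
// all others become 0. The result is NOT a two-level image.
std::vector<std::uint8_t> threshold_to_zero(const std::vector<std::uint8_t>& src,
                                            std::uint8_t thresh) {
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i] > thresh ? src[i] : 0;
    }
    return dst;
}
```

With pixels {10, 100, 150, 200} and a threshold of 120, threshold_binary(px, 120, 255) gives {0, 0, 255, 255}, while threshold_to_zero(px, 120) gives {0, 0, 150, 200}. If a true black-and-white image is what you want, THRESH_BINARY (possibly combined with THRESH_OTSU) is likely the better choice in the threshold() call.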

