did you ever get any help with this?  your image is not of good quality, 
but i could extract a lot of the text without much effort.

<https://lh3.googleusercontent.com/-0oA3bwv5UXw/WfvK0251NLI/AAAAAAAAASI/8FikoAChEfMebGLkiXPRt__p-LKfVPDCACLcBGAs/s1600/aunque.png>


On Friday, October 4, 2013 at 6:53:29 AM UTC-4, adrian company wrote:

>
> <https://lh6.googleusercontent.com/-gFboadFkrOM/Uk6dz2WI7rI/AAAAAAAAATs/2aWZloPX_T0/s1600/texto.jpg>
> Hi I' ve been trying to fix the code and now I read some text but the text 
> readed is not correct. When I' m trying to read the image, "texto.jpg" the 
> output text is something like that:
> II" n llfi  ' IIIi"'"“MM“ufimn"w»“W,;H»m““wu%M%“m»%.IlI!!!!!| "" "" ''"umm 
> """"'"""“ "" "" '''''|IImmw '!!!ll::...i!!"!""' W*WM‘WM
>
> Anybody  knows what is the reason of that?
> This is the image to read:
>
>
> El viernes, 4 de octubre de 2013 08:59:34 UTC+2, adrian company escribió:
>>
>> Hi all, 
>> I am using tesseract engine to detect text from an image. I process the 
>> image to binarize it and then extract text from it, but some errors are 
>> displayed when I execute the program. 
>> Does anyone know what I am doing wrong? I paste the code and errors 
>> displayed in the execution.
>>
>> if (waitKey(10) >= 0){
>>
>>                 // leer imagen
>>                 Mat imagen = imread(
>> "/home/adrian/workspace/OCR/matricula2.jpg"/*,CV_LOAD_IMAGE_GRAYSCALE*/);
>>                 imshow("imagen",imagen);
>>              
>>
>>                 //procesamos imagen redimensionada: (filtramos, pasamos 
>> a escala grises, binarizamos)
>>                  medianBlur(imagen,imagen, 3);
>>                  cvtColor(imagen,imagen,CV_BGR2GRAY);
>>                  threshold(imagen,imagen,umbral, umbral_max,3);
>>
>>                 // inicializamos motor OCR tesseract
>>                    putenv("TESSDATA_PREFIX=/usr/local/share/");
>>                    setlocale(LC_NUMERIC, "C");
>>                    tesseract::TessBaseAPI api;
>>                    printf("\nTesseract-ocr version: %s--------\t",api.
>> Version()); //version de tesseract
>>                    printf("Leptonica version: %s\n", getLeptonicaVersion
>> ());        //version de leptonica
>>                    printf(
>> "___________________________________________________________________________\n"
>> );
>>
>>                    if (api.Init(NULL, "spa")) {                 //idioma 
>> spanish
>>                       fprintf( stderr, " ¡No se pudo inicializar 
>> tesseract! \n" );
>>                        exit(1);
>>                    }
>>                  
>>                     api.SetPageSegMode(tesseract::PSM_AUTO);
>>                     api.SetVariable("tessedit_char_whitelist", 
>> "ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789");
>>                                     
>>                     api.SetImage(imagen.data, imagen.size().width,imagen.
>> size().height, imagen.channels(), imagen.step1());
>>
>>            // region de interes (ROI), p.ej. regiones que contengan texto
>>             Rect textROI(0,0,imagen.cols,imagen.rows);//imagen completa
>>
>>
>>             // recognize text
>>             api.TesseractRect( imagen.data, 0,imagen.step1(), textROI.x, 
>> textROI.y,textROI.width, textROI.height);
>>
>>             char *texto = new char[200];
>>             texto = api.GetUTF8Text();
>>             // remove "newline"
>>             string t1(texto);
>>             t1.erase( remove(t1.begin(), t1.end(), '\n'), t1.end() );
>>
>>             // print found text
>>             printf("TEXTO LEIDO: \n");
>>             printf( "%s",t1.c_str() );
>>
>>         
>>
>>             // draw rectangle image
>>            
>>             rectangle(imagen, textROI, Scalar(0, 0, 255), 2, 8, 0);
>>            
>>             imwrite("/home/adrian/workspace/OCR/procesadas/binaria.jpg",
>> imagen);
>>            
>>             imshow("binarizada",imagen);
>>
>>             delete [] texto;
>>             // destroy tesseract OCR engine
>>             api.Clear();
>>             api.End();
>>             }
>>
>> and the errors displayed are:
>>
>> Tesseract-ocr version: 3.02.02--------    Leptonica version: 
>> leptonica-1.69
>>
>> ___________________________________________________________________________
>> Error in pixReduceRankBinary2: hs must be at least 2
>> Error in pixDilateBrick: pixs not defined
>> Error in pixExpandReplicate: pixs not defined
>> Error in pixAnd: pixs1 not defined
>> Error in pixDilateBrick: pixs not defined
>> Error in pixExpandReplicate: pixs not defined
>> Error in pixAnd: pixs2 not defined
>> TEXTO LEIDO:
>>
>>
>>
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/72cee4b1-fc76-442a-bba3-70f31292de55%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to