Re: tesseract does not recognize text correctly, or does not recognize it at all.

adrian company Mon, 09 Dec 2013 22:42:07 -0800

Hi Zdenko,
I've tried the code you posted here, the version that uses Leptonica, and it 
gives me an error about a min/max specification:

Error: Illegal min or max specification!
signal_termination_handler:Error:Signal_termination_handler called:Code 5002


I've changed the OCR.SetRectangle call and the same error is displayed. I've 
also tried another image, with the same result.
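The thread does not show which call raises "Illegal min or max specification!", so this is only a guess at one thing worth ruling out: a rectangle that extends past the image, or an image that failed to load and has zero size. Below is a minimal sketch of a sanity check to run on the values before they reach `SetRectangle(left, top, width, height)`. It uses a hypothetical `Rect` struct and `clampRect` helper, neither of which is part of Tesseract or Leptonica.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper type: a rectangle in pixel coordinates, in the same
// (left, top, width, height) order that SetRectangle takes.
struct Rect {
    int left, top, width, height;
};

// Clamp r to an image of size imgW x imgH. Returns false when the image is
// empty or no part of the rectangle lies inside it, so the caller can skip
// recognition instead of handing a degenerate rectangle to Tesseract.
bool clampRect(Rect& r, int imgW, int imgH) {
    if (imgW <= 0 || imgH <= 0) return false;  // e.g. the image failed to load
    const int right = std::min(r.left + r.width, imgW);
    const int bottom = std::min(r.top + r.height, imgH);
    r.left = std::max(r.left, 0);
    r.top = std::max(r.top, 0);
    r.width = right - r.left;
    r.height = bottom - r.top;
    return r.width > 0 && r.height > 0;
}
```

With a Leptonica image, usage could look like `Rect r = {x, y, w, h}; if (clampRect(r, pixGetWidth(pix), pixGetHeight(pix))) OCR.SetRectangle(r.left, r.top, r.width, r.height);`. Here `x`, `y`, `w`, `h` and `pix` stand for your own values. If the error persists with a rectangle that passes this check, the rectangle is probably not the cause.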

On Monday, December 9, 2013 22:02:13 UTC+1, zdenop wrote:
>
>
>    1. Instead of a function listing, it is better to provide a small test 
>    case. It saves testers time...
>    2. Skip code that is not "relevant" (e.g. if you are testing the 
>    tesseract API, open the image with a leptonica function and not with 
>    opencv...)
>    3. You need to fix the perspective of the image first, so that you have 
>    some border around the text. As you can see, I did it in gimp, but maybe 
>    you can do it in opencv too...
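Point 3 is about leaving some light margin around the characters once the perspective is fixed. In Leptonica this is typically `pixAddBorder`, and in OpenCV it is `cv::copyMakeBorder`. The sketch below shows the same idea on a plain 8-bit grayscale buffer so it can be tested without either library. The `addBorder` helper is hypothetical, and the white (255) fill assumes dark text on a light background.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pad a row-major 8-bit grayscale image (width w, height h, no row padding)
// with `border` pixels of `value` on every side. 255 = white, the usual
// margin around dark text on a light background.
std::vector<std::uint8_t> addBorder(const std::vector<std::uint8_t>& img,
                                    int w, int h, int border,
                                    std::uint8_t value = 255) {
    const int outW = w + 2 * border;
    const int outH = h + 2 * border;
    std::vector<std::uint8_t> out(static_cast<std::size_t>(outW) * outH, value);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[static_cast<std::size_t>(y + border) * outW + (x + border)] =
                img[static_cast<std::size_t>(y) * w + x];
    return out;
}
```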
>
>
>
> Zdenko
>
>
> On Mon, Dec 9, 2013 at 1:15 PM, adrian company <[email protected]> wrote:
>
>> Hi Nick,
>> I've taken a look at api/tesseractmain.cpp as you recommended, but I 
>> don't think I can find anything wrong. Anyway, I'll post my program here so 
>> we can try to figure out what is going on with your help.
>> This is my method:
>> ___________________________________________________________________
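>> #include <tesseract/baseapi.h>
>> #include <opencv2/core/core.hpp>
>> #include <algorithm>
>> #include <clocale>
>> #include <cstdio>
>> #include <cstdlib>
>> #include <iostream>
>> #include <string>
>>
>> using namespace cv;
>> using namespace std;
>>
>> // imagen must have 8 bits per channel (e.g. CV_8UC1 after binarization).
>> void recognizeChar(const Mat& imagen){
>>
>>     /*INITIALIZE (TESSERACT)*/
>>     setenv("TESSDATA_PREFIX", "/usr/local/share/", 1); // parent of tessdata/
>>     setlocale(LC_NUMERIC, "C");
>>     tesseract::TessBaseAPI OCR;
>>
>>     if (OCR.Init(NULL, "spa")){
>>         fprintf(stderr, "Could not initialize tesseract.\n");
>>         exit(1);
>>     }
>>     /*CONFIGURING*/
>>     OCR.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
>>     OCR.SetVariable("tessedit_char_whitelist",
>>                     "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "); // whitelist
>>     OCR.SetVariable("tessedit_char_blacklist",
>>                     "<>abcdefghijklmnopqtrstuvwxyz./!¡$%&?¿,;+-#"); // blacklist
>>     // bytes_per_pixel = number of channels, bytes_per_line = row step in bytes.
>>     OCR.SetImage(imagen.data, imagen.cols, imagen.rows,
>>                  imagen.channels(), (int)imagen.step);
>>     // No TesseractRect() here: it repeats SetImage + GetUTF8Text, its result
>>     // was never freed, and bytes_per_pixel = 0 means 1-bit packed data.
>>
>>     /*GETTING RECOGNIZED TEXT*/
>>     char* texto = OCR.GetUTF8Text(); // must be freed with delete[]
>>     string t1 = texto ? texto : "";
>>     delete[] texto;
>>     t1.erase(remove(t1.begin(), t1.end(), '\n'), t1.end());
>>     cout << "TEXTO: " << t1 << endl;
>>     OCR.End();
>> }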
>> _______________________________________________________________________
>> Thank you all.
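This concerns the `TesseractRect(imagen.data, 0, imagen.step1(), ...)` call in the method above. According to the comments in Tesseract 3.x's `baseapi.h`, `bytes_per_pixel = 0` means a binary image packed at 1 bit per pixel, where the most significant bit of each byte is the leftmost pixel and 1 means white. An 8-bit `Mat` passed that way is therefore interpreted as packed bits, even if it only contains 0 and 255. Passing it with `bytes_per_pixel = 1`, or simply using `SetImage` and `GetUTF8Text`, avoids the problem. If a packed buffer is really wanted, the conversion could be sketched as follows. The `packTo1bpp` helper is hypothetical, and its threshold of 128 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack an 8-bit grayscale image (row-major, `stride` bytes per row) into
// 1 bit per pixel: the MSB of each byte is the leftmost pixel, 1 = white.
// Each packed row takes (width + 7) / 8 bytes; that is the bytes_per_line
// to pass together with bytes_per_pixel = 0.
std::vector<std::uint8_t> packTo1bpp(const std::uint8_t* gray, int width,
                                     int height, int stride,
                                     std::uint8_t threshold = 128) {
    const int bytesPerLine = (width + 7) / 8;
    std::vector<std::uint8_t> packed(
        static_cast<std::size_t>(bytesPerLine) * height, 0);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = gray + static_cast<std::size_t>(y) * stride;
        std::uint8_t* out = packed.data() + static_cast<std::size_t>(y) * bytesPerLine;
        for (int x = 0; x < width; ++x) {
            if (row[x] >= threshold)  // light pixel -> white -> bit set
                out[x / 8] |= static_cast<std::uint8_t>(0x80 >> (x % 8));
        }
    }
    return packed;
}
```

For a single-channel `imagen`, usage would be along the lines of `char* text = OCR.TesseractRect(packed.data(), 0, (imagen.cols + 7) / 8, 0, 0, imagen.cols, imagen.rows);`, followed by `delete[] text;` once the result has been copied.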
>>
>> On Tuesday, December 3, 2013 11:29:58 UTC+1, Nick White wrote:
>>
>>> Hi Adrian, 
>>>
>>> Well then your C++ program must be wrong in some way. The command 
>>> line version doesn't do anything special; it just uses the API like 
>>> anything else. Take a look at api/tesseractmain.cpp to check how 
>>> your API usage differs, to find your bug. 
>>>
>>> Nick 
>>>
>>> On Tue, Dec 03, 2013 at 01:16:40AM -0800, adrian company wrote: 
>>> > Hi Sventech, 
>>> > I've tested the image with the command line version and I get the same 
>>> > result as you. But when I use my own software in C++ I cannot obtain the 
>>> > same result; I simply get nothing. Currently I am using PSM_SINGLE_LINE, 
>>> > but as I said before, I've tried all the page seg modes. 
>>> > I don't know what is wrong. I've reinstalled tesseract and get the same 
>>> > result. 
>>> > 
>>> > 
>>> > On Tuesday, December 3, 2013 07:29:11 UTC+1, adrian company wrote: 
>>> > 
>>> >     As for the page seg mode, I've tried all of them, but I still get 
>>> >     nothing. 
>>> > 
>>> >     On Monday, December 2, 2013 16:13:17 UTC+1, sventech wrote: 
>>> > 
>>> >         I get 
>>> >         V! 2\"03ENl 
>>> >         so you could postprocess that kind of thing to get better results -- 
>>> >         you need to eliminate the black border for best results. You may need 
>>> >         to remove noise. What page seg mode are you using? Make sure you test 
>>> >         with the command line version before you try your own. Also, I'm using 
>>> >         the latest version 3.02.02 
>>> >         --Sven 
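Sven's two suggestions, removing the black border and post-processing the raw output, can each be done in a few lines. The minimal sketch below works on a binarized 8-bit buffer and assumes dark pixels are below 128. `clearDarkBorder` flood-fills every dark region that touches the image edge with white, which is the same idea as a "clear border" operation. `keepPlateChars` drops everything except A-Z and 0-9. Both names are hypothetical, and the allowed character set is an assumption about the plate format. The flood fill also erases any character that touches the image edge, so it only makes sense after cropping with some margin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Whiten (255) every dark (< threshold) 4-connected region touching the edge.
// img is row-major 8-bit grayscale, w x h, with no row padding.
void clearDarkBorder(std::vector<std::uint8_t>& img, int w, int h,
                     std::uint8_t threshold = 128) {
    std::queue<std::pair<int, int>> todo;
    auto push = [&](int x, int y) {
        if (x < 0 || y < 0 || x >= w || y >= h) return;
        std::uint8_t& p = img[static_cast<std::size_t>(y) * w + x];
        if (p < threshold) {  // whitening also marks the pixel as visited
            p = 255;
            todo.push(std::make_pair(x, y));
        }
    };
    for (int x = 0; x < w; ++x) { push(x, 0); push(x, h - 1); }
    for (int y = 0; y < h; ++y) { push(0, y); push(w - 1, y); }
    while (!todo.empty()) {
        const std::pair<int, int> cur = todo.front();
        todo.pop();
        push(cur.first + 1, cur.second);
        push(cur.first - 1, cur.second);
        push(cur.first, cur.second + 1);
        push(cur.first, cur.second - 1);
    }
}

// Keep only characters assumed to be valid on the plate: A-Z and 0-9.
std::string keepPlateChars(const std::string& raw) {
    std::string out;
    for (char c : raw)
        if ((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) out += c;
    return out;
}
```

Applied to the output Sven quotes, `keepPlateChars` reduces `V! 2"03ENl` to `V203EN`. Whether that matches the real plate can only be checked against the image.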
>>> > 
>>> > 
>>> > 
>>> >         On Mon, Dec 2, 2013 at 6:18 AM, adrian company <[email protected]> wrote: 
>>> > 
>>> >             Hi again, I've tried deskewing the first image and passing it to 
>>> >             tesseract at a larger size, but I get the same result: the numbers 
>>> >             and letters are not recognized by tesseract. I'm posting an image 
>>> >             so you can see what my image looks like now. 
>>> >             Any ideas? 
>>> >             Thanks in advance again. 
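This is about passing the image to tesseract at a larger size. What matters is the pixel height of the letters, as Sven points out further down the thread, so the scale factor is best computed from a measured character height rather than guessed. The sketch below uses two hypothetical helpers, `scaleForCharHeight` and `upscaleNearest`, on a row-major 8-bit buffer. The target height you pass in is an assumption to tune, not a documented Tesseract requirement. In OpenCV, `cv::resize` with `cv::INTER_CUBIC` does the same resizing with smoother edges than nearest neighbour.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scale factor that brings characters `charPx` pixels tall to `targetPx`.
double scaleForCharHeight(int charPx, int targetPx) {
    return charPx > 0 ? static_cast<double>(targetPx) / charPx : 1.0;
}

// Nearest-neighbour resize of a row-major 8-bit grayscale image.
// Preconditions: w > 0, h > 0, scale > 0. The new size is written to outW/outH.
std::vector<std::uint8_t> upscaleNearest(const std::vector<std::uint8_t>& img,
                                         int w, int h, double scale,
                                         int& outW, int& outH) {
    outW = static_cast<int>(std::lround(w * scale));
    outH = static_cast<int>(std::lround(h * scale));
    std::vector<std::uint8_t> out(static_cast<std::size_t>(outW) * outH);
    for (int y = 0; y < outH; ++y) {
        const int sy = std::min(h - 1, static_cast<int>(y / scale));
        for (int x = 0; x < outW; ++x) {
            const int sx = std::min(w - 1, static_cast<int>(x / scale));
            out[static_cast<std::size_t>(y) * outW + x] =
                img[static_cast<std::size_t>(sy) * w + sx];
        }
    }
    return out;
}
```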
>>> > 
>>> > 
>>> > 
>>> > 
>>> > 
>>> >             On Thursday, October 31, 2013 07:22:53 UTC+1, adrian company wrote: 
>>> > 
>>> >                 Thanks Sventech, I'll try to deskew the first one. I'm using 
>>> >                 OpenCV to prepare the image, so I cannot use a separate program 
>>> >                 to prepare it. I've tried rotating the image so the text is 
>>> >                 horizontal before passing it to tesseract, but tesseract outputs 
>>> >                 the same. I will also try passing it in PNG format and see the 
>>> >                 result. 
>>> >                   
>>> > 
>>> >                 On Wednesday, October 30, 2013 3:21:58 PM UTC+1, sventech wrote: 
>>> > 
>>> >                     In the first image you need to deskew it. There are free 
>>> >                     programs for preparing the image. The second image appears 
>>> >                     to be too low resolution (or, to be precise, the letters' 
>>> >                     pixel height is too small). Approx. 200-300 dpi is ideal for 
>>> >                     tesseract's default training. Also, JPEG is not a good format 
>>> >                     for text. Internally it will convert to TIFF or PNG. 
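The deskew step may have to happen inside the program rather than in an external tool. One common approach, which is not something proposed in this thread, is a projection-profile search. It tries a range of angles, shears the foreground pixels by each one, and keeps the angle whose row histogram is most sharply peaked. Below is a minimal sketch with a hypothetical `estimateSkewDeg` that works on a list of foreground pixel coordinates. The ±10° range and 0.5° step are assumptions. The resulting angle could then be fed to OpenCV's `cv::getRotationMatrix2D` and `cv::warpAffine`; check their sign convention against your images.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

struct Point { int x, y; };  // foreground (text) pixel; y grows downwards

// Returns the angle in degrees (within +/- maxDeg, stepDeg > 0) for which the
// text lines best fit y = c + x * tan(angle); positive means the text slopes
// downwards to the right in image coordinates.
double estimateSkewDeg(const std::vector<Point>& fg, double maxDeg = 10.0,
                       double stepDeg = 0.5) {
    const double kPi = 3.14159265358979323846;
    double bestDeg = 0.0, bestScore = -1.0;
    for (double deg = -maxDeg; deg <= maxDeg + 1e-9; deg += stepDeg) {
        const double t = std::tan(deg * kPi / 180.0);
        std::map<long, long> rows;  // sheared row index -> pixel count
        for (const Point& p : fg)
            ++rows[std::lround(p.y - p.x * t)];
        double score = 0.0;  // sum of squared counts: larger when rows are peaked
        for (const auto& kv : rows)
            score += static_cast<double>(kv.second) * kv.second;
        if (score > bestScore) { bestScore = score; bestDeg = deg; }
    }
    return bestDeg;
}
```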
>>> > 
>>> > 
>>> >                     On Wed, Oct 30, 2013 at 6:50 AM, adrian company <[email protected]> wrote: 
>>> > 
>>> >                         Hi all, I am trying to write software to recognize 
>>> >                         some text in an image, but when I binarize the image 
>>> >                         and call the tesseract engine, it does not recognize 
>>> >                         the text in the image. Does anybody know why the text 
>>> >                         is not recognized? Do I need to do something extra? 
>>> >                         I've attached the image whose text I am trying to 
>>> >                         recognize (a license plate). For this image, the 
>>> >                         tesseract output is empty. 
>>> > 
>>> >                         I've also tried to recognize text from another image 
>>> >                         (Fuma), and in this case the output is: "L I". 
>>> > 
>>> >                         Could anybody help me? 
>>> > 
>>> >                         What could be happening? 
>>> > 
>>> > 
>>> >                         Thanks in advance. 
>>> >                         Adri 
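On the binarization step: a hand-picked threshold can easily wipe out thin strokes or merge characters with the plate frame. One standard alternative, not taken from this thread, is to choose the threshold with Otsu's method. The minimal sketch below does that and also normalizes the result to dark text on a white background, under the assumption that text pixels are the minority class. `otsuThreshold` and `binarizeDarkOnLight` are hypothetical helpers. In OpenCV, the thresholding part corresponds to `cv::threshold` with the `cv::THRESH_OTSU` flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Otsu's method: the t maximizing the between-class variance of the histogram,
// with one class being values <= t and the other values > t.
int otsuThreshold(const std::vector<std::uint8_t>& img) {
    double hist[256] = {0};
    for (std::uint8_t v : img) hist[v] += 1.0;
    const double total = static_cast<double>(img.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];
    double wB = 0.0, sumB = 0.0, bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                 // pixels with value <= t
        if (wB == 0) continue;
        const double wF = total - wB;  // pixels with value > t
        if (wF == 0) break;
        sumB += t * hist[t];
        const double mB = sumB / wB;
        const double mF = (sumAll - sumB) / wF;
        const double between = wB * wF * (mB - mF) * (mB - mF);
        if (between > bestVar) { bestVar = between; best = t; }
    }
    return best;
}

// Binarize with Otsu and return text as 0 on a 255 background, assuming the
// text is the minority class; if dark pixels are the majority (light text on
// a dark plate), the polarity is flipped.
std::vector<std::uint8_t> binarizeDarkOnLight(const std::vector<std::uint8_t>& img) {
    const int t = otsuThreshold(img);
    std::size_t dark = 0;
    for (std::uint8_t v : img) if (v <= t) ++dark;
    const bool invert = dark * 2 > img.size();
    std::vector<std::uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) {
        const bool isDark = img[i] <= t;
        out[i] = (isDark != invert) ? 0 : 255;
    }
    return out;
}
```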
>>> > 
>>> > 
>>> > 
>>> > 
>>> >                         -- 
>>> >                         -- 
>>> >                         You received this message because you are subscribed to 
>>> >                         the Google Groups "tesseract-ocr" group. 
>>> >                         To post to this group, send email to 
>>> >                         [email protected] 
>>> >                         To unsubscribe from this group, send email to 
>>> >                         [email protected] 
>>> >                         For more options, visit this group at 
>>> >                         http://groups.google.com/group/tesseract-ocr?hl=en 
>>> > 
>>> >                         --- 
>>> >                         You received this message because you are subscribed to 
>>> >                         the Google Groups "tesseract-ocr" group. 
>>> >                         To unsubscribe from this group and stop receiving 
>>> >                         emails from it, send an email to 
>>> >                         [email protected]. 
>>> >                         For more options, visit https://groups.google.com/groups/opt_out. 
>>> > 
>>> > 
>>> > 
>>> > 
>>> >                     -- 
>>> >                     ``All that is gold does not glitter, 
>>> >                       not all those who wander are lost; 
>>> >                     the old that is strong does not wither, 
>>> >                       deep roots are not reached by the frost. 
>>> >                     From the ashes a fire shall be woken, 
>>> >                       a light from the shadows shall spring; 
>>> >                     renewed shall be blade that was broken, 
>>> >                       the crownless again shall be king.” 
>>> > 
>>> > 
>>> > 
>>> > 
>>> > 
>>>
>>
>
>
