Re: tesseract does not recognize text correctly or does not recognize.

adrian company Mon, 09 Dec 2013 04:15:24 -0800

Hi Nick,
I took a look at api/tesseractmain.cpp as you recommended, but I cannot 
find anything wrong, I think. Anyway, I can post my program here and, with 
your help, try to work out what is going on. 
This is my method: 
___________________________________________________________________
#include <algorithm>
#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <opencv2/core/core.hpp>
#include <tesseract/baseapi.h>

using namespace cv;
using namespace std;

void recognizeChar(Mat imagen){

    /* INITIALIZE (TESSERACT) */
    setenv("TESSDATA_PREFIX", "/usr/local/share/", 1);
    setlocale(LC_NUMERIC, "C");
    tesseract::TessBaseAPI OCR;

    if (OCR.Init(NULL, "spa")){
        fprintf(stderr, "could not initialize tesseract\n");
        exit(1);
    }

    /* CONFIGURING */
    OCR.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
    OCR.SetVariable("tessedit_char_whitelist",
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "); // whitelist
    OCR.SetVariable("tessedit_char_blacklist",
                    "<>abcdefghijklmnopqrstuvwxyz./!¡$%&?¿,;+-#"); // blacklist

    // Hand the pixel buffer to Tesseract. Using channels() as bytes per pixel
    // assumes an 8-bit Mat (CV_8UC1 or CV_8UC3); step1() is then bytes per line.
    // GetUTF8Text() below runs the recognition on this image.
    OCR.SetImage(imagen.data, imagen.cols, imagen.rows,
                 imagen.channels(), static_cast<int>(imagen.step1()));

    /* GETTING THE RECOGNIZED TEXT */
    char* texto = OCR.GetUTF8Text();
    string t1 = texto ? texto : "";
    delete [] texto; // the caller owns the buffer returned by GetUTF8Text()
    t1.erase(remove(t1.begin(), t1.end(), '\n'), t1.end());
    cout << "TEXTO: " << t1 << endl;
}
_______________________________________________________________________
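While debugging, the same whitelist can also be applied to the string after recognition, which is one way to do the postprocessing Sven suggested. This is only a minimal sketch: keepWhitelisted is a hypothetical helper name of my own, not a Tesseract function, and it only filters characters, it does not correct misread ones.

```cpp
#include <string>

// Hypothetical helper (not part of Tesseract): keep only the characters of
// `text` that appear in `whitelist` and drop everything else.
std::string keepWhitelisted(const std::string& text,
                            const std::string& whitelist) {
    std::string out;
    for (char c : text) {
        if (whitelist.find(c) != std::string::npos) {
            out += c;
        }
    }
    return out;
}
```

With the whitelist used in my method, keepWhitelisted("V! 2\"03ENl", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ") returns "V 203EN".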
Thank you all.

On Tuesday, 3 December 2013 11:29:58 UTC+1, Nick White wrote:
>
> Hi Adrian, 
>
> Well then your C++ program must be wrong in some way. The command 
> line version doesn't do anything special, it just uses the API like 
> anything else. Take a look at api/tesseractmain.cpp to check how 
> your API usage differs, to find your bug. 
>
> Nick 
>
> On Tue, Dec 03, 2013 at 01:16:40AM -0800, adrian company wrote: 
> > Hi Sventech, 
> > I've tested the image with the command line version and I get the same
> > result as you. But when I use my own software in C++ I cannot obtain the
> > same result; I simply get nothing. Currently I am using PSM_SINGLE_LINE,
> > but as I said before, I've tried all the page seg modes. 
> > I don't know what is wrong. I've reinstalled tesseract and it still does
> > the same. 
> > 
> > 
> > On Tuesday, 3 December 2013 07:29:11 UTC+1, adrian company wrote: 
> > 
> >     And about the page seg mode: I've tried all of them but I still get
> >     nothing. 
> > 
> >     On Monday, 2 December 2013 16:13:17 UTC+1, sventech wrote: 
> > 
> >         I get 
> >         V! 2\"03ENl 
> >         so you could postprocess that kind of thing to get better results.
> >         You need to eliminate the black border for best results. You may
> >         need to remove noise. What page seg mode are you using? Make sure
> >         you test with the command line version before you try your own.
> >         Also, I'm using the latest version, 3.02.02. 
> >         --Sven 
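For the black border mentioned above, here is a minimal sketch of one way to find the region to keep before recognition. It assumes an 8-bit grayscale buffer and a border that is a solid dark margin; Box, findContentBox and the threshold parameter are hypothetical names for illustration, not OpenCV or Tesseract API.

```cpp
#include <cstddef>
#include <vector>

// Region to keep; width or height is 0 when the whole image is dark.
struct Box { int left, top, width, height; };

// Hypothetical helper: shrink the box from each side while the outermost
// row or column contains only "black" pixels (value <= threshold).
// `pixels` is an 8-bit grayscale buffer, row-major, `stride` bytes per row.
Box findContentBox(const std::vector<unsigned char>& pixels,
                   int width, int height, int stride,
                   unsigned char threshold) {
    auto at = [&](int x, int y) {
        return pixels[static_cast<std::size_t>(y) * stride + x];
    };
    auto rowIsBlack = [&](int y, int x0, int x1) {
        for (int x = x0; x < x1; ++x)
            if (at(x, y) > threshold) return false;
        return true;
    };
    auto colIsBlack = [&](int x, int y0, int y1) {
        for (int y = y0; y < y1; ++y)
            if (at(x, y) > threshold) return false;
        return true;
    };

    int top = 0, bottom = height, left = 0, right = width;
    while (top < bottom && rowIsBlack(top, left, right)) ++top;
    while (bottom > top && rowIsBlack(bottom - 1, left, right)) --bottom;
    while (left < right && colIsBlack(left, top, bottom)) ++left;
    while (right > left && colIsBlack(right - 1, top, bottom)) --right;
    return Box{left, top, right - left, bottom - top};
}
```

The resulting box could then be passed to TessBaseAPI::SetRectangle(left, top, width, height) after SetImage, so that only the inner region is recognized. A border that is a thin frame rather than a solid margin would need a different approach.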
> > 
> > 
> > 
> >         On Mon, Dec 2, 2013 at 6:18 AM, adrian company <[email protected]>
> >         wrote: 
> > 
> >             Hi again, I've tried to deskew the first image and pass it to
> >             tesseract at a larger size, but I get the same result: the
> >             numbers and letters are not recognized by tesseract. I am
> >             posting an image where you can see how my image looks now. 
> >             Any ideas? 
> >             Thanks in advance again. 
> > 
> > 
> > 
> > 
> > 
> >             On Thursday, 31 October 2013 07:22:53 UTC+1, adrian company
> >             wrote: 
> > 
> >                 Thanks Sventech, I'll try to deskew the first one. I'm using
> >                 OpenCV to prepare the image, so I cannot use any other
> >                 program to prepare it. I've tried rotating the image and
> >                 passing it to tesseract with the text horizontal, but
> >                 tesseract outputs the same. I will also try passing it in
> >                 PNG format and see the result. 
> >                   
> > 
> >                 On Wednesday, October 30, 2013 3:21:58 PM UTC+1, sventech
> >                 wrote: 
> > 
> >                     In the first image you need to deskew it. There are free
> >                     programs for preparing the image. The second image
> >                     appears to be too low resolution (or letter pixel height,
> >                     to be precise). Approx. 200-300 dpi is ideal for
> >                     tesseract's default training. Also, JPEG is not a good
> >                     format for text. Internally it will convert to TIFF or
> >                     PNG. 
> > 
> > 
> >                     On Wed, Oct 30, 2013 at 6:50 AM, adrian company
> >                     <[email protected]> wrote: 
> > 
> >                         Hi all, I am trying to write software to recognize
> >                         some text in an image, but when I binarize the image
> >                         and call the tesseract engine, it does not recognize
> >                         the text in the image. Does somebody know why the
> >                         text is not recognized? Must I do something extra for
> >                         it to be recognized? 
> >                         I attach the image in which I am trying to recognize
> >                         text (a license plate). For this attached image the
> >                         tesseract output is nothing. 
> > 
> >                         I've also tried to recognize text from another image
> >                         (Fuma) and in this case the output is: "L I". 
> > 
> >                         Could anybody help me? 
> > 
> >                         What could be happening? 
> > 
> > 
> >                         Thanks in advance. 
> >                         Adri 
> > 
> > 
> > 
> > 
> >                         -- 
> >                         -- 
> >                         You received this message because you are subscribed
> >                         to the Google Groups "tesseract-ocr" group. 
> >                         To post to this group, send email to 
> >                         [email protected] 
> >                         To unsubscribe from this group, send email to 
> >                         [email protected] 
> >                         For more options, visit this group at 
> >                         http://groups.google.com/group/tesseract-ocr?hl=en 
> >                           
> >                         --- 
> >                         You received this message because you are subscribed
> >                         to the Google Groups "tesseract-ocr" group. 
> >                         To unsubscribe from this group and stop receiving
> >                         emails from it, send an email to 
> >                         [email protected]. 
> >                         For more options, visit 
> >                         https://groups.google.com/groups/opt_out. 
> > 
> > 
> > 
> > 
> >                     -- 
> >                     ``All that is gold does not glitter, 
> >                       not all those who wander are lost; 
> >                     the old that is strong does not wither, 
> >                       deep roots are not reached by the frost. 
> >                     From the ashes a fire shall be woken, 
> >                       a light from the shadows shall spring; 
> >                     renewed shall be blade that was broken, 
> >                       the crownless again shall be king.” 
> > 
>
