1. Instead of a function listing, it is better to provide a small test case.
   It saves the testers time.
2. Skip irrelevant code (e.g. if you are testing the tesseract API, open the
   image with a leptonica function and not with opencv).
3. You need to fix the perspective of the image first, so that you have some
   border around the text. I did it in GIMP (see the attached image), but
   maybe you can do it in opencv too.
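
A minimal sketch of the "border around the text" part of point 3, assuming an 8-bit grayscale image stored row by row in a std::vector. The helper name addBorder and the white fill value are illustrative assumptions, not part of the tesseract, leptonica or opencv API, and the sketch only adds a margin; it does not correct the perspective.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a library function): returns a copy of an 8-bit
// grayscale image with `pad` pixels of `fill` added on every side.
// Pixels are stored row by row, one byte per pixel.
std::vector<std::uint8_t> addBorder(const std::vector<std::uint8_t>& src,
                                    int width, int height,
                                    int pad, std::uint8_t fill = 255)
{
    const int outW = width + 2 * pad;
    const int outH = height + 2 * pad;

    // Start from an image that is entirely border colour.
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(outW) * outH, fill);

    // Copy the original pixels into the centre of the padded image.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            dst[static_cast<std::size_t>(y + pad) * outW + (x + pad)] =
                src[static_cast<std::size_t>(y) * width + x];
        }
    }
    return dst;
}
```

Calling addBorder(img, 3, 2, 2) on a 3x2 image gives a 7x6 buffer, so the width and height passed on to the OCR step have to grow by 2 * pad as well.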



Zdenko


On Mon, Dec 9, 2013 at 1:15 PM, adrian company <[email protected]> wrote:

> Hi Nick,
> I've taken a look at api/tesseractmain.cpp as you recommended, but I
> cannot find anything wrong, I think. Anyway, I can post my program here
> and try to work out what is going on with your help.
> This is my method:
> ___________________________________________________________________
> void recognizeChar(Mat imagen){
>
>     /* INITIALIZE (TESSERACT) */
>     putenv("TESSDATA_PREFIX=/usr/local/share/");
>     setlocale(LC_NUMERIC, "C");
>     tesseract::TessBaseAPI OCR;
>
>     if (OCR.Init(NULL, "spa")){
>         fprintf( stderr, "could not initialize tesseract\n" );
>         exit(1);
>     }
>     /* CONFIGURATION */
>     OCR.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
>     OCR.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "); // whitelist
>     OCR.SetVariable("tessedit_char_blacklist", "<>abcdefghijklmnopqrstuvwxyz./!¡$%&?¿,;+-#"); // blacklist
>     OCR.SetImage(imagen.data, imagen.size().width, imagen.size().height, imagen.channels(), imagen.step1());
>     OCR.TesseractRect(imagen.data, 0, imagen.step1(), 0, 0, imagen.cols, imagen.rows);
>
>     /* GET THE RECOGNIZED TEXT */
>     char* texto = OCR.GetUTF8Text();
>     string t1=texto;
>     t1.erase( remove(t1.begin(), t1.end(), '\n'), t1.end() );
>     cout << "TEXTO: "<<t1.c_str() <<endl;
> }
> _______________________________________________________________________
> Thank you all.
>
> On Tuesday, December 3, 2013 11:29:58 UTC+1, Nick White wrote:
>
>> Hi Adrian,
>>
>> Well then your C++ program must be wrong in some way. The command
>> line version doesn't do anything special, it just uses the API like
>> anything else. Take a look at api/tesseractmain.cpp to check how
>> your API usage differs, to find your bug.
>>
>> Nick
>>
>> On Tue, Dec 03, 2013 at 01:16:40AM -0800, adrian company wrote:
>> > Hi Sventech,
>> > I've tested the image with the command line version and I get the same
>> > result as you. But when I use my own software in C++ I cannot obtain the
>> > same result; I simply get nothing. Currently I am using PSM_SINGLE_LINE,
>> > but as I said before, I've tried all the page seg modes.
>> > I don't know what is wrong. I've reinstalled tesseract and it does the
>> > same.
>> >
>> >
>> > On Tuesday, December 3, 2013 07:29:11 UTC+1, adrian company wrote:
>> >
>> >     And about the page seg mode: I've tried all of them, but I still
>> >     get nothing.
>> >
>> >     On Monday, December 2, 2013 16:13:17 UTC+1, sventech wrote:
>> >
>> >         I get
>> >         V! 2\"03ENl
>> >         so you could postprocess that kind of thing to get better
>> >         results -- you need to eliminate the black border for best
>> >         results. You may need to remove noise. What page seg mode are
>> >         you using? Make sure you test with the command line version
>> >         before you try your own. Also, I'm using the latest version
>> >         3.02.02
>> >         --Sven
>> >
>> >
>> >
>> >         On Mon, Dec 2, 2013 at 6:18 AM, adrian company <[email protected]> wrote:
>> >
>> >             Hi again, I've tried deskewing the first image and passing
>> >             it to tesseract at a larger size, but I get the same result:
>> >             the numbers and letters are not recognized by tesseract. I
>> >             am posting an image where you can see what my image looks
>> >             like now.
>> >             Any ideas?
>> >             Thanks in advance again.
>> >
>> >
>> >
>> >
>> >
>> >             On Thursday, October 31, 2013 07:22:53 UTC+1, adrian company wrote:
>> >
>> >                 Thanks Sventech, I'll try to deskew the first one. I'm
>> >                 using opencv to prepare the image, so I cannot use any
>> >                 other program to prepare it. I've tried rotating the
>> >                 image and passing it to tesseract with the text
>> >                 horizontal, but tesseract outputs the same. I will also
>> >                 try passing it in PNG format and see the result.
>> >
>> >
>> >                 On Wednesday, October 30, 2013 3:21:58 PM UTC+1, sventech wrote:
>> >
>> >                     In the first image you need to deskew it. There are
>> >                     free programs for preparing the image. The second
>> >                     image appears to be too low resolution (or letter
>> >                     pixel height, to be precise). Approx. 200-300 dpi is
>> >                     ideal for tesseract's default training. Also, JPEG
>> >                     is not a good format for text. Internally it will
>> >                     convert to TIFF or PNG.
>> >
>> >
>> >                     On Wed, Oct 30, 2013 at 6:50 AM, adrian company <
>> >                     [email protected]> wrote:
>> >
>> >                         Hi all, I am trying to write software to
>> >                         recognize some text in an image, but when I
>> >                         binarize the image and call the tesseract
>> >                         engine, it does not recognize the text in the
>> >                         image. Does anybody know why the text is not
>> >                         recognized? Must I do something extra for it to
>> >                         be recognized?
>> >                         I attach the image in which I am trying to
>> >                         recognize text (a license plate). For this
>> >                         attached image, tesseract outputs nothing.
>> >
>> >                         I've also tried to recognize text from another
>> >                         image (Fuma) and in this case the output is:
>> >                         "L I".
>> >
>> >                         Could anybody help me?
>> >
>> >                         What could be happening?
>> >
>> >
>> >                         Thanks in advance.
>> >                         Adri
>> >
>> >
>> >
>> >
>> >                         --
>> >                         --
>> >                         You received this message because you are
>> subscribed to
>> >                         the Google
>> >                         Groups "tesseract-ocr" group.
>> >                         To post to this group, send email to
>> >                         [email protected]
>> >                         To unsubscribe from this group, send email to
>> >                         [email protected]
>> >                         For more options, visit this group at
>> >                         http://groups.google.com/
>> group/tesseract-ocr?hl=en
>> >
>> >                         ---
>> >                         You received this message because you are
>> subscribed to
>> >                         the Google Groups "tesseract-ocr" group.
>> >                         To unsubscribe from this group and stop
>> receiving
>> >                         emails from it, send an email to
>> >                         [email protected].
>> >                         For more options, visit
>> https://groups.google.com/grou
>> >                         ps/opt_out.
>> >
>> >
>> >
>> >
>> >                     --
>> >                     ``All that is gold does not glitter,
>> >                       not all those who wander are lost;
>> >                     the old that is strong does not wither,
>> >                       deep roots are not reached by the frost.
>> >                     From the ashes a fire shall be woken,
>> >                       a light from the shadows shall spring;
>> >                     renewed shall be blade that was broken,
>> >                       the crownless again shall be king.”
>> >
>> >
>> >
>> >
>> >
>> >
>>
>

/*
 * Build:
 *   $ g++ -o api_test api_test.cpp -ltesseract -llept
 */

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <algorithm>
#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    // putenv("TESSDATA_PREFIX=/usr/local/share/");
    setlocale(LC_NUMERIC, "C");
    tesseract::TessBaseAPI OCR;

    // Init() returns non-zero on failure.
    if (OCR.Init(NULL, "eng")) {
        fprintf(stderr, "could not initialize tesseract\n");
        exit(1);
    }

    /* CONFIGURATION */
    OCR.SetPageSegMode(tesseract::PSM_SINGLE_LINE);
    OCR.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ");  // whitelist
    OCR.SetVariable("tessedit_char_blacklist", "<>abcdefghijklmnopqrstuvwxyz./!¡$%&?¿,;+-#");  // blacklist

    // Load the image with leptonica instead of opencv.
    Pix *image = pixRead("binaria2.png");
    if (image == NULL) {
        fprintf(stderr, "could not read binaria2.png\n");
        exit(1);
    }
    OCR.SetImage(image);
    OCR.SetRectangle(0, 0, 140, 33);
    //OCR.SetImage(imagen.data, imagen.size().width, imagen.size().height, imagen.channels(), imagen.step1());
    //OCR.TesseractRect(imagen.data, 0, imagen.step1(), 0, 0, imagen.cols, imagen.rows);

    /* GET THE RECOGNIZED TEXT */
    char* texto = OCR.GetUTF8Text();
    string t1 = texto ? texto : "";
    t1.erase(remove(t1.begin(), t1.end(), '\n'), t1.end());
    cout << "TEXTO: " << t1 << endl;

    // Release what the API and leptonica allocated.
    delete[] texto;
    OCR.End();
    pixDestroy(&image);
    return 0;
}
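
Sven's suggestion earlier in the thread to postprocess noisy output such as V! 2\"03ENl can be sketched along these lines. This is only an illustration: keepAllowed is a hypothetical helper, not a tesseract function, it works byte by byte (so it is only meaningful for an ASCII allow-list), and it is no substitute for fixing the image first.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of tesseract): returns `text` with every
// character that does not appear in `allowed` removed.
std::string keepAllowed(std::string text, const std::string& allowed)
{
    text.erase(std::remove_if(text.begin(), text.end(),
                              [&allowed](char c) {
                                  // Drop the character if it is not in the allow-list.
                                  return allowed.find(c) == std::string::npos;
                              }),
               text.end());
    return text;
}
```

With the allow-list "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" this also drops the trailing newline that GetUTF8Text() adds, so it could stand in for the erase/remove line in the test program above.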

<<attachment: binaria2.png>>
