Re: Training Tesseract for single digit

zdenko podobny Sun, 13 Jan 2013 11:02:47 -0800

Hi,

I think you will need to run training for this. I tried some simple C++ code
that shows confidence values (see attachment), and for your digit 6 it
produced:


        symbol 5, conf: 78.5236 ----            5 conf: 78.523613
                                ----            s conf: 77.376984
                                ----            a conf: 71.858353
                                ----            B conf: 66.046341

It prints the recognized symbol ("5") with its confidence value, followed by
the results from the ChoiceIterator with their confidence values. "6" is not
among them... If I interpret this correctly, you cannot expect the current
English language data file to recognize "your" "6" as "6".
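
The check described above (is the expected character among the alternatives
at all?) can be sketched in a few lines. This is only an illustration: the
Choice type and the ConfidenceOf and ChoicesFromPost helpers are hypothetical
names, not part of the Tesseract API, and the values are copied by hand from
the output shown above rather than read from a ChoiceIterator.

```cpp
#include <string>
#include <utility>
#include <vector>

// One alternative reported for a symbol: the text and its confidence.
// (Hypothetical type for illustration; not part of the Tesseract API.)
using Choice = std::pair<std::string, float>;

// Returns the confidence of `expected` if it appears among the choices,
// or -1.0f if it is absent.
float ConfidenceOf(const std::vector<Choice>& choices,
                   const std::string& expected) {
    for (const Choice& c : choices) {
        if (c.first == expected) return c.second;
    }
    return -1.0f;
}

// The alternatives printed above for the image of the digit 6,
// copied by hand from that output.
std::vector<Choice> ChoicesFromPost() {
    return {{"5", 78.523613f},
            {"s", 77.376984f},
            {"a", 71.858353f},
            {"B", 66.046341f}};
}
```

With these values, ConfidenceOf(ChoicesFromPost(), "6") returns -1.0f, which
is the situation described above: the character is missing from the candidate
list entirely, so no confidence threshold could select it.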

Zdenko


On Fri, Jan 11, 2013 at 12:09 AM, sunitha raghurajan <
[email protected]> wrote:

> Yes, this is an NH license plate. The first image is without preprocessing
> and the second one is after processing with OpenCV.
>
>
>
> On Tuesday, January 8, 2013 12:58:19 PM UTC-5, zdenop wrote:
>>
>> On 08.01.2013 17:13, sunitha raghurajan wrote:
>> > I am using Tesseract to read license plates. Tesseract is giving the
>> > wrong output for the digit six. My question is: can I train Tesseract
>> > for the single digit 'six'? Any help is truly appreciated.
>> >
>> Can you post an example image (with the digit 6) that you are trying to
>> recognize?
>>
>


/*

  $ g++ -o test_char_conf_2 test_char_conf_2.cpp  -ltesseract -llept
  $ ./test_char_conf_2

*/

#include <cstdio>   // printf
#include <cstdlib>  // exit

#include "leptonica/allheaders.h"
#include "tesseract/baseapi.h"

int main() {
    const char *language = "eng";
    const char *datapath = "/usr/src/tesseract-3.02/";
    PIX  *pixs;
    bool indent;

    const char *image = "image3.tif";

    if ((pixs = pixRead(image)) == NULL) {
        printf("Unsupported image type.\n");
        exit(3);
    }

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
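    // Init() returns 0 on success; stop if the language data cannot be loaded.
    if (api->Init(datapath, language, tesseract::OEM_DEFAULT) != 0) {
        printf("Could not initialize tesseract.\n");
        exit(1);
    }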
    api->SetVariable("save_blob_choices", "T");
    //api->SetVariable("tessedit_char_whitelist", "0123456789");
    api->SetImage(pixs);
    api->Recognize(NULL);

    tesseract::ResultIterator* ri = api->GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_SYMBOL;

    if (ri != 0) {
        do {
            int left, top, right, bottom;
            ri->BoundingBox(level, &left, &top, &right, &bottom);
            printf("box l:%i b:%i r:%i t:%i\n", left, bottom, right, top);
            const char* symbol = ri->GetUTF8Text(level);
            float conf = ri->Confidence(level);
            if (symbol != 0) {
                printf("\tsymbol %s, conf: %.4f ", symbol, conf);
                indent = false;
                tesseract::ChoiceIterator ci(*ri);
                do {
                    if (indent) printf("\t\t\t\t");
                    printf("----");
                    const char* choice = ci.GetUTF8Text();
                    printf("\t\t%s conf: %f\n", choice, ci.Confidence());
                    indent = true;
                } while (ci.Next());
            }
            delete[] symbol;
        } while ((ri->Next(level)));
    }
    delete ri;   // the iterator returned by GetIterator() is owned by the caller
    api->End();
    delete api;
    pixDestroy(&pixs);
    return 0;
}
