I think I see the problem (there may be something else as well ;-) as I
have not had time to test it yet):
load_system_dawg, load_freq_dawg etc. are init-only parameters[1]. If you
try to set them later, they are ignored.
You need to pass them to Init (see the section "Tesseract-OCR API"[2]).

[1] https://code.google.com/p/tesseract-ocr/wiki/ControlParams#Init_only
[2] http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version
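
A minimal sketch of what "pass them to Init" can look like, not tested against the image in this thread. The helper below only turns the lines of a config file such as "pharma" into the two parallel name/value lists that the long Init() overload takes. InitVars and parse_config are hypothetical names made up for this illustration; they are not part of Tesseract. The commented call at the end assumes the Tesseract 3.x signature, where the lists are GenericVector<STRING>.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type (not part of Tesseract): parallel lists of
// parameter names and values, the shape the long Init() overload expects.
struct InitVars {
    std::vector<std::string> names;
    std::vector<std::string> values;
};

// Parse "name value" lines as they appear in a Tesseract config file.
// Simplification: only the first whitespace-delimited token after the
// name is kept as the value; blank or incomplete lines are skipped.
InitVars parse_config(const std::string& text) {
    InitVars out;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string name, value;
        if (!(fields >> name >> value)) continue;
        out.names.push_back(name);
        out.values.push_back(value);
    }
    return out;
}

// Assumed usage with Tesseract 3.x: copy the pairs into
// GenericVector<STRING> lists and hand them to Init(), so that
// init-only parameters are set before the language data is loaded:
//
//   GenericVector<STRING> names, values;
//   names.push_back("load_system_dawg");  values.push_back("0");
//   names.push_back("load_freq_dawg");    values.push_back("0");
//   api->Init("/usr/local/share/", "ita", tesseract::OEM_DEFAULT,
//             NULL, 0, &names, &values, false);
```

The same overload also accepts config file names through its configs/configs_size arguments, which may be simpler than building the lists by hand when the config file already exists.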

Zdenko


On Fri, Jul 4, 2014 at 9:39 AM, elena bresciani <[email protected]> wrote:

> Here's an example of the kind of text that I have to read
>
>
> On Friday, 4 July 2014 09:22:09 UTC+2, zdenop wrote:
>>
>> Could you please also post a test image?
>>
>> Zdenko
>>
>>
>> On Thu, Jul 3, 2014 at 12:22 PM, elena bresciani <[email protected]>
>> wrote:
>>
>>>  Dear all,
>>>
>>> I need to integrate Tesseract in a C++ project.
>>> First I simply called Tesseract from the command line and, after setting
>>> up a specific configuration, I got satisfying results.
>>>
>>> This is the config file "pharma"
>>>
>>>> load_system_dawg 0
>>>> load_freq_dawg 0
>>>> load_punc_dawg 0
>>>> user_words_suffix pharma-words
>>>> tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,
>>>> language_model_penalty_non_dict_word 0
>>>>
>>>
>>>
>>> Now that I have to do the same thing through the Tesseract API, I get
>>> terrible results: down to 10% correct identification and 90%
>>> garbage.
>>> I must be missing something in the conversion to the API...
>>>
>>> This is my code
>>>
>>>> #include <tesseract/baseapi.h>
>>>> #include <leptonica/allheaders.h>
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     char *outText;
>>>>
>>>>     tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
>>>>
>>>>     api -> Init("/usr/local/share/","ita");
>>>>     api -> ReadConfigFile ("pharma");
>>>>
>>>>
>>>>     Pix *image = pixRead (argv[1]);
>>>>     api -> SetImage (image);
>>>>     api -> SetSourceResolution(600);
>>>>
>>>>     outText = api -> GetUTF8Text();
>>>>     printf ("OCR output: \n%s", outText);
>>>>
>>>>     api -> End();
>>>>     delete [] outText;
>>>>     pixDestroy (&image);
>>>>
>>>>     return 0;
>>>>
>>>> }
>>>>
>>>
>>>
>>> Can somebody please help me understand?
>>>
>>> Thanks in advance
>>>
>>> Elena
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>>
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/7dd534f7-3e85-480f-bb81-3d34c7af0c05%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
