Dear all,

I need to integrate Tesseract into a C++ project.
First I simply called Tesseract from the command line and, after setting up a 
specific configuration, I got satisfying results.

This is the config file "pharma":

load_system_dawg 0
load_freq_dawg 0
load_punc_dawg    0
user_words_suffix pharma-words
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,
language_model_penalty_non_dict_word 0
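To rule out the config file itself, here is a minimal sketch that splits 
such a file into (name, value) pairs, assuming the usual one "name value" 
entry per line. parseConfig is a hypothetical helper, not part of Tesseract, 
and treating lines that start with '#' as comments is an assumption. An entry 
that comes back with an empty value may point to a line that got wrapped.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Tesseract): split config text into
// (name, value) pairs, one pair per non-empty line.
std::vector<std::pair<std::string, std::string>> parseConfig(std::istream &in)
{
    std::vector<std::pair<std::string, std::string>> params;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name;
        if (!(fields >> name))
            continue;                 // blank line
        if (name[0] == '#')
            continue;                 // assumed comment marker
        std::string value;
        // The rest of the line is the value; it stays empty if nothing follows.
        std::getline(fields >> std::ws, value);
        // Trim trailing whitespace, including a '\r' from CRLF files.
        const auto last = value.find_last_not_of(" \t\r");
        value.erase(last == std::string::npos ? 0 : last + 1);
        params.emplace_back(name, value);
    }
    return params;
}
```

Opening "pharma" with std::ifstream and printing each pair shows what would 
be handed to the engine. The pairs could also be applied one by one with 
api->SetVariable(name, value), though as far as I understand that only works 
for parameters that are not read at initialisation time.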


Now that I have to do the same thing through the Tesseract API, I get terrible 
results: only about 10% correct identification and 90% garbage.
I must be missing something in the conversion to the API...

This is my code:

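#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s image\n", argv[0]);
        return 1;
    }

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

    // Init returns 0 on success
    if (api->Init("/usr/local/share/", "ita") != 0) {
        fprintf(stderr, "Could not initialize Tesseract\n");
        delete api;
        return 1;
    }
    api->ReadConfigFile("pharma");

    Pix *image = pixRead(argv[1]);
    if (image == NULL) {
        fprintf(stderr, "Could not read %s\n", argv[1]);
        api->End();
        delete api;
        return 1;
    }
    api->SetImage(image);
    api->SetSourceResolution(600);

    // The returned buffer must be released with delete []
    char *outText = api->GetUTF8Text();
    printf("OCR output: \n%s", outText);

    api->End();
    delete api;
    delete [] outText;
    pixDestroy(&image);

    return 0;
}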


Can somebody help me understand, please?
 
Thanks in advance

Elena
