Dear all,
I need to integrate Tesseract in a C++ project.
First I simply called Tesseract from the command line and, after setting up a
specific configuration, I got satisfactory results.
This is the config file "pharma":
load_system_dawg 0
load_freq_dawg 0
load_punc_dawg 0
user_words_suffix pharma-words
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,
language_model_penalty_non_dict_word 0

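As a sanity check on the file itself, here is a minimal sketch of how the config could be read back independently of Tesseract, to see which value each parameter actually ends up with. It assumes the usual layout of one "name value" pair per line and treats lines starting with '#' as comments; parseConfig is a hypothetical helper, not part of the Tesseract API, and it does not claim to reproduce Tesseract's own parser.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the Tesseract API): splits every
// non-blank, non-comment line into a parameter name and the rest of the line.
std::vector<std::pair<std::string, std::string>> parseConfig(std::istream &in)
{
    std::vector<std::pair<std::string, std::string>> params;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name;
        if (!(fields >> name) || name[0] == '#')
            continue;  // skip blank lines and comment lines (assumed '#')
        std::string value;
        // Everything after the name, minus leading whitespace, is the value;
        // it stays empty when the line holds only a name.
        std::getline(fields >> std::ws, value);
        params.emplace_back(name, value);
    }
    return params;
}
```

Feeding the "pharma" file through this (for example via a std::ifstream) and printing each pair would show whether a long value such as the whitelist has wrapped onto its own line: in that case the parameter name appears with an empty value, followed by a bogus entry named after the wrapped text.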
Now that I have to do the same thing through the Tesseract API, I get terrible
results: roughly 10% correct identification and 90% garbage.
I must be missing something in the conversion to the API.
This is my code:
#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main(int argc, char *argv[])
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

    // Initialize with the Italian language data, then load the "pharma" config
    api->Init("/usr/local/share/", "ita");
    api->ReadConfigFile("pharma");

    // Read the input image given on the command line
    Pix *image = pixRead(argv[1]);
    api->SetImage(image);
    api->SetSourceResolution(600);

    // Run OCR and print the recognized text
    outText = api->GetUTF8Text();
    printf("OCR output: \n%s", outText);

    // Release resources
    api->End();
    delete [] outText;
    pixDestroy(&image);

    return 0;
}

Can somebody help me understand, please?
Thanks in advance,
Elena