I see the problem (there may be something else too ;-) as I have not had time to test it yet): load_system_dawg, load_freq_dawg, etc. are init-only parameters [1]. If you try to set them later, they are ignored. You need to pass them to Init (see the section "Tesseract-OCR API" in [2]).
[1] https://code.google.com/p/tesseract-ocr/wiki/ControlParams#Init_only
[2] http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version

Zdenko

On Fri, Jul 4, 2014 at 9:39 AM, elena bresciani <[email protected]> wrote:
> Here's an example of the kind of text that I have to read
>
> On Friday, 4 July 2014 at 09:22:09 UTC+2, zdenop wrote:
>> Could you please also post a test image?
>>
>> Zdenko
>>
>> On Thu, Jul 3, 2014 at 12:22 PM, elena bresciani <[email protected]> wrote:
>>> Dear all,
>>>
>>> I need to integrate Tesseract in a C++ project.
>>> First I simply called Tesseract from the command line and, after setting up
>>> a specific configuration, I got satisfying results.
>>>
>>> This is the config file "pharma":
>>>
>>>> load_system_dawg 0
>>>> load_freq_dawg 0
>>>> load_punc_dawg 0
>>>> user_words_suffix pharma-words
>>>> tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,
>>>> language_model_penalty_non_dict_word 0
>>>
>>> Now that I have to do the same thing with the Tesseract API I get
>>> terrible results, down to about 10% correct identification and 90%
>>> garbage.
>>> I must be missing something in the conversion to the API...
>>>
>>> This is my code:
>>>
>>>> #include <tesseract/baseapi.h>
>>>> #include <leptonica/allheaders.h>
>>>>
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     char *outText;
>>>>
>>>>     tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
>>>>
>>>>     api -> Init("/usr/local/share/","ita");
>>>>     api -> ReadConfigFile ("pharma");
>>>>
>>>>     Pix *image = pixRead (argv[1]);
>>>>     api -> SetImage (image);
>>>>     api -> SetSourceResolution(600);
>>>>
>>>>     outText = api -> GetUTF8Text();
>>>>     printf ("OCR output: \n%s", outText);
>>>>
>>>>     api -> End();
>>>>     delete [] outText;
>>>>     pixDestroy (&image);
>>>>
>>>>     return 0;
>>>> }
>>>
>>> Can somebody help me understand please?
>>>
>>> Thanks in advance
>>>
>>> Elena

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8zJMk8FAAhLawxAmOFJGRy8UEvLFPTa9RRcArrhd4v%2Bhg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

