I put this code into tesseract-ocr-API-Example-vs2008.zip <http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip>:

    Pix *image;
    char *outText;
    char *configs[] = {"myconfig"};
    int configs_size = 1;

    TessBaseAPI *tess = new TessBaseAPI();
    if (tess->Init("C:\\tesseract-3.02\\", "deu", OEM_DEFAULT, configs,
                   configs_size, NULL, NULL, false)) {
      fprintf(stderr, "Could not initialize tesseract.\n");
      exit(1);
    }

    image = pixRead("C:\\tesseract-3.02\\phototest.tif");
    tess->SetImage(image);
    outText = tess->GetUTF8Text();
    // Pass the text as an argument, not as the format string.
    fprintf(stdout, "%s", outText);

and it works for me (VC++ 2008 on Windows XP). I have this in C:\tesseract-3.02:

    C:\tesseract-3.02\phototest.tif
    C:\tesseract-3.02\tessdata\deu.traineddata
    C:\tesseract-3.02\tessdata\deu.user-words
    C:\tesseract-3.02\tessdata\configs\myconfig

And deu.user-words affects the OCR results (I have words like "all", "lazy", etc. in it).

Below are some inline comments.

--
Zdenko

On Fri, Nov 30, 2012 at 2:04 PM, Matthias Hillert <mhill...@gmail.com> wrote:

> I tried running the program in the console and got the following error
> message:
>
> Could not open file, C:\tesseract-3.02\tessdata/deu.user-words
>
> The file is definitely there. Maybe it has something to do with the
> different slashes?

Windows handles the forward slash correctly (as a directory separator), so the problem should be somewhere else. Are you able to open that path with fopen?

> Is the user-words file supposed to be a dawg file or a simple text file
> with one word per line?

One word per line. Simple text (UTF-8 without BOM, Unix EOL; but my test also worked with ANSI encoding and Windows EOL, at least Notepad++ says so ;-) )

> I also tried setting the datapath of the Init function to
> "C:/tesseract-3.02/" to get the right slashes but I got the same result.

Check whether the environment variable is set (echo %TESSDATA_PREFIX%).

> Regarding your option to set the config file after the init call, I read
> here http://code.google.com/p/tesseract-ocr/wiki/ControlParams
> that you can only set the user_words_suffix param in the init call. Is
> this correct?

Yes, that is correct.
But if there is a problem, I prefer to do things step by step (e.g. you can try setting "init only" parameters after Init; this will not cause an error, they will simply have no effect).

> On Friday, 30 November 2012 09:56:22 UTC+1, zdenop wrote:
>>
>> I guess there is a problem finding deu.traineddata.
>>
>> I would suggest running your program in the console, so you can see a
>> possible error message (something like "Error opening data file
>> C:\Program Files\Tesseract-OCR\tessdata/deu.traineddata").
>>
>> Another option is to init tesseract and set variables in several steps
>> to check for errors. Something like this:
>>
>>     const char* configs = "myconfig";
>>     TessBaseAPI *tess = new TessBaseAPI();
>>     if (tess->Init(NULL, "deu", OEM_DEFAULT)) {
>>       fprintf(stderr, "Could not initialize tesseract.\n");
>>       exit(1);
>>     }
>>     // write messages to tesseract.log instead of stderr...
>>     if (!tess->SetVariable("debug_file", "tesseract.log")) {
>>       fprintf(stderr, "Could not set variable 'debug_file'.\n");
>>     }
>>     tess->ReadConfigFile(configs);
>>
>> --
>> Zdenko
>>
>> On Thu, Nov 29, 2012 at 5:15 PM, Matthias Hillert <mhil...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to include a custom word directory with a custom
>>> configuration file and the user_words_suffix property.
>>> My code looks like this:
>>>
>>>     TessBaseAPI tess;
>>>     char *configs[]={"myconfig"};
>>>     int configs_size = 1;
>>>     tess.Init(NULL, "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL,
>>>               false);
>>>
>>> My config file looks like this:
>>>
>>>     user_words_suffix user-words
>>>
>>> The problem is that my program exits with code 1 after the init call.
>>> I tried both a simple deu.user-words file with one word on every line
>>> and also converted the file into a dawg file. Nothing worked.
>>> If I remove the user_words_suffix line in the config file everything
>>> works.
>>>
>>> I am using Tesseract 3.02, Windows 8 and Visual Studio 2012.
>>>
>>> I would really appreciate some help.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to tesser...@googlegroups.com
>>> To unsubscribe from this group, send email to
>>> tesseract-oc...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
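
The checks suggested in the reply above (opening the reported path with fopen, looking at TESSDATA_PREFIX, and making sure the word list has no BOM) can be done without Tesseract at all. What follows is only a rough sketch: the helper names (can_open, user_words_path, tessdata_prefix, starts_with_utf8_bom) are made up for illustration and are not part of the Tesseract API, and the way the path is assembled is an assumption that simply mirrors the path shown in the error message.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: true if `path` can be opened for reading with fopen.
bool can_open(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (f == nullptr) {
        return false;
    }
    std::fclose(f);
    return true;
}

// Hypothetical helper: rebuilds a path shaped like the one in the error
// message, i.e. <datapath>tessdata/<lang>.<suffix>. This layout is an
// assumption taken from the message text, not from the Tesseract sources.
std::string user_words_path(std::string datapath,
                            const std::string& lang,
                            const std::string& suffix) {
    if (!datapath.empty() && datapath.back() != '/' && datapath.back() != '\\') {
        datapath += '/';
    }
    return datapath + "tessdata/" + lang + "." + suffix;
}

// Returns the value of TESSDATA_PREFIX, or an empty string if it is not set.
std::string tessdata_prefix() {
    const char* value = std::getenv("TESSDATA_PREFIX");
    return value != nullptr ? std::string(value) : std::string();
}

// True if the file starts with the UTF-8 byte order mark (EF BB BF).
bool starts_with_utf8_bom(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (f == nullptr) {
        return false;
    }
    unsigned char head[3] = {0, 0, 0};
    const std::size_t n = std::fread(head, 1, 3, f);
    std::fclose(f);
    return n == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
}
```

For example, calling can_open(user_words_path("C:\\tesseract-3.02\\", "deu", "user-words")) from the same program that calls Init should show whether the process itself can read the file named in the error message. A non-empty tessdata_prefix() would suggest that the environment variable is worth a closer look.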