I tried your code and it did not work. I get the error message "Could not open file, C:\tesseract-3.02\tessdata/deu.user-words".

I then tried to open the file with fopen. It did not work for the path

C:\tesseract-3.02\tessdata/deu.user-words

but it worked for the following paths:

C:\\tesseract-3.02\\tessdata\\deu.user-words
C:/tesseract-3.02/tessdata/deu.user-words
C:\\tesseract-3.02\\tessdata/deu.user-words

echo %TESSDATA_PREFIX% yields C:\tesseract-3.02\

I changed this setting manually to C:/tesseract-3.02/ and now I get the error message "Could not open file, C:/tesseract-3.02/tessdata/deu.user-words". I even removed the setting completely so that the path supplied with the Init call is used. Still no luck, same error. Any more suggestions?

On Friday, 30 November 2012 20:08:19 UTC+1, zdenop wrote:
>
> I put this code into tesseract-ocr-API-Example-vs2008.zip
> <http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip>:
>
>   Pix *image;
>   char *outText;
>   char *configs[]={"myconfig"};
>   int configs_size = 1;
>
>   TessBaseAPI *tess = new TessBaseAPI();
>   if (tess->Init("C:\\tesseract-3.02\\", "deu", OEM_DEFAULT, configs,
>                  configs_size, NULL, NULL, false)) {
>     fprintf(stderr, "Could not initialize tesseract.\n");
>     exit(1);
>   }
>
>   image = pixRead("C:\\tesseract-3.02\\phototest.tif");
>   tess->SetImage(image);
>   outText = tess->GetUTF8Text();
>   fprintf(stdout, "%s", outText);
>
> and it works for me (VC++ 2008 on Windows XP). I have these files
> in C:\tesseract-3.02:
>
>   C:\tesseract-3.02\phototest.tif
>   C:\tesseract-3.02\tessdata\deu.traineddata
>   C:\tesseract-3.02\tessdata\deu.user-words
>   C:\tesseract-3.02\tessdata\configs\myconfig
>
> And deu.user-words affects the OCR results (I have words like "all",
> "lazy" etc. in it).
> Below are some inline comments.
> --
> Zdenko
>
> On Fri, Nov 30, 2012 at 2:04 PM, Matthias Hillert wrote:
>
>> I tried running the program in the console and got the following
>> error message:
>>
>> Could not open file, C:\tesseract-3.02\tessdata/deu.user-words
>>
>> The file is definitely there. Maybe it has something to do with the
>> different slashes?
>>
>
> Windows handles slashes correctly (i.e. as directory separators), so the
> problem should be somewhere else. Are you able to open that path with fopen?
>
>
>> Is the user-words file supposed to be a dawg file or a simple text file
>> with one word per line?
>>
>
> One word per line, simple text (UTF-8 without BOM, Unix EOL; but my test
> also worked with ANSI encoding and Windows EOL, at least Notepad++ says so ;-) )
>
>
>> I also tried setting the datapath of the Init function to
>> "C:/tesseract-3.02/" to get the right slashes, but I got the same result.
>>
>
> Check whether the environment variable is set (echo %TESSDATA_PREFIX%).
>
>
>> Regarding your option to set the config file after the init call, I read
>> here http://code.google.com/p/tesseract-ocr/wiki/ControlParams
>> that you can only set the user_words_suffix param in the init call. Is
>> this correct?
>>
>>
> Yes, it is correct. But if there is a problem I prefer to do things step by
> step (e.g. you can try to set "init only" parameters after init; it will
> not cause an error, they will just have no effect).
>
>
>>
>> On Friday, 30 November 2012 09:56:22 UTC+1, zdenop wrote:
>>>
>>> I guess there is a problem finding deu.traineddata.
>>>
>>> I would suggest running your program in a console, so you can see possible
>>> error messages (something like "Error opening data file C:\Program
>>> Files\Tesseract-OCR\tessdata/deu.traineddata").
>>>
>>> Another option is to init tesseract and set variables in several steps to
>>> check for errors. Something like this:
>>>
>>>   const char* configs = "myconfig";
>>>
>>>   TessBaseAPI *tess = new TessBaseAPI();
>>>   if (tess->Init(NULL, "deu", OEM_DEFAULT)) {
>>>     fprintf(stderr, "Could not initialize tesseract.\n");
>>>     exit(1);
>>>   }
>>>
>>>   // write messages to tesseract.log instead of stderr...
>>>   if (!tess->SetVariable("debug_file", "tesseract.log")) {
>>>     fprintf(stderr, "Could not set variable 'debug_file'.\n");
>>>   }
>>>
>>>   tess->ReadConfigFile(configs);
>>>
>>> --
>>> Zdenko
>>>
>>> On Thu, Nov 29, 2012 at 5:15 PM, Matthias Hillert wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to include a custom word dictionary with a custom
>>>> configuration file and the user_words_suffix property.
>>>> My code looks like this:
>>>>
>>>>   TessBaseAPI tess;
>>>>   char *configs[]={"myconfig"};
>>>>   int configs_size = 1;
>>>>   tess.Init(NULL, "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL,
>>>>             false);
>>>>
>>>> My config file looks like this:
>>>>
>>>>   user_words_suffix user-words
>>>>
>>>> The problem is that my program exits with code 1 after the init call.
>>>> I tried both a simple deu.user-words file with one word on every line
>>>> and a version converted into a dawg file. Nothing worked.
>>>> If I remove the user_words_suffix line from the config file, everything
>>>> works.
>>>>
>>>> I am using Tesseract 3.02, Windows 8 and Visual Studio 2012.
>>>>
>>>> I would really appreciate some help.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> tesseract-oc...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en

