Can you please try tesseract-ocr-3.02-win32-portable.zip <http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-3.02-win32-portable.zip>?
I tried this on Win7 and it works for me:

c:\tesseract-ocr\vs2008> set TESSDATA_PREFIX=c:\tesseract-ocr\
c:\tesseract-ocr\vs2008> tesseract phototest.tif phototest -l deu
Tesseract Open Source OCR Engine v3.02 with Leptonica
c:\tesseract-ocr\vs2008> tesseract phototest.tif phototest-user -l deu config_file
Tesseract Open Source OCR Engine v3.02 with Leptonica
c:\tesseract-ocr\vs2008> tesseract phototest.tif phototest-x -l deux
Error opening data file c:\tesseract-ocr\tessdata/deux.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'deux'
Tesseract couldn't load any languages!
Could not initialize tesseract.
c:\tesseract-ocr\vs2008> tesseract phototest.tif phototest-user -l spa config_file
Error opening data file c:\tesseract-ocr\tessdata/spa.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'spa'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Zdenko

On Fri, Dec 7, 2012 at 11:34 AM, Matthias Hillert <[email protected]> wrote:

> I tried your code and it did not work. I get the error message "Could not
> open file, C:\tesseract-3.02\tessdata/deu.user-words".
> I then tried to open the file with fopen. It did not work for the path
>
> C:\tesseract-3.02\tessdata/deu.user-words
>
> But it worked for the following paths:
>
> C:\\tesseract-3.02\\tessdata\\deu.user-words
> C:/tesseract-3.02/tessdata/deu.user-words
> C:\\tesseract-3.02\\tessdata/deu.user-words
>
> echo %TESSDATA_PREFIX% yields
>
> C:\tesseract-3.02\
>
> I changed this setting manually to C:/tesseract-3.02/ and now I get the
> error message "Could not open file,
> C:/tesseract-3.02/tessdata/deu.user-words".
> I even removed the setting completely so it uses the path supplied with
> the Init call. Still no luck, same error.
>
> Any more suggestions?
>
> On Friday, November 30, 2012 20:08:19 UTC+1, zdenop wrote:
>>
>> I put this code into
>> tesseract-ocr-API-Example-vs2008.zip <http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip>:
>>
>> Pix *image;
>> char *outText;
>> char *configs[]={"myconfig"};
>> int configs_size = 1;
>>
>> TessBaseAPI *tess = new TessBaseAPI();
>> if (tess->Init("C:\\tesseract-3.02\\", "deu", OEM_DEFAULT,
>>                configs, configs_size, NULL, NULL, false)) {
>>     fprintf(stderr, "Could not initialize tesseract.\n");
>>     exit(1);
>> }
>>
>> image = pixRead("C:\\tesseract-3.02\\phototest.tif");
>> tess->SetImage(image);
>> outText = tess->GetUTF8Text();
>> fprintf(stdout, "%s", outText);  // OCR text must not be used as the format string
>>
>> // clean up
>> delete [] outText;
>> pixDestroy(&image);
>> tess->End();
>> delete tess;
>>
>> and it works for me (VC++ 2008 on Windows XP). I have these files
>> in C:\tesseract-3.02:
>>
>> C:\tesseract-3.02\phototest.tif
>> C:\tesseract-3.02\tessdata\deu.traineddata
>> C:\tesseract-3.02\tessdata\deu.user-words
>> C:\tesseract-3.02\tessdata\configs\myconfig
>>
>> And deu.user-words affects the OCR results (I have words like "all",
>> "lazy", etc. in it).
>> Below are some inline comments.
>> --
>> Zdenko
>>
>> On Fri, Nov 30, 2012 at 2:04 PM, Matthias Hillert <[email protected]> wrote:
>>
>>> I tried running the program in the console and got the following
>>> error message:
>>>
>>> Could not open file, C:\tesseract-3.02\tessdata/deu.user-words
>>>
>>> The file is definitely there. Maybe it has something to do with the
>>> different slashes?
>>>
>>
>> Windows handles forward slashes correctly (i.e. as directory separators),
>> so the problem should be somewhere else. Are you able to open that path
>> with fopen?
>>
>>
>>> Is the user-words file supposed to be a dawg file or a simple text file
>>> with one word per line?
>>>
>>
>> One line per word.
>> Plain text (UTF-8 without BOM, Unix EOL), but my test also worked
>> with ANSI encoding and Windows EOL (at least Notepad++ says so ;-) ).
>>
>>
>>> I also tried setting the datapath of the Init function to
>>> "C:/tesseract-3.02/" to get the right slashes, but I got the same result.
>>>
>>
>> Check whether the environment variable is set (echo %TESSDATA_PREFIX%).
>>
>>
>>> Regarding your option to set the config file after the init call, I read
>>> here
>>> http://code.google.com/p/tesseract-ocr/wiki/ControlParams
>>> that you can only set the user_words_suffix param in the init call. Is
>>> this correct?
>>>
>>>
>> Yes, it is correct. But when there is a problem I prefer to do things step
>> by step (e.g. you can try to set "init only" parameters after init; it will
>> not cause an error, they will simply have no effect).
>>
>>
>>>
>>> On Friday, November 30, 2012 09:56:22 UTC+1, zdenop wrote:
>>>>
>>>> I guess the problem is that deu.traineddata cannot be found.
>>>>
>>>> I would suggest running your program in a console, so you can see any
>>>> error message (something like "Error opening data file C:\Program
>>>> Files\Tesseract-OCR\tessdata/deu.traineddata").
>>>>
>>>> Another option is to init tesseract and set variables in several steps
>>>> to check for errors. Something like this:
>>>>
>>>> const char* configs = "myconfig";
>>>>
>>>> TessBaseAPI *tess = new TessBaseAPI();
>>>> if (tess->Init(NULL, "deu", OEM_DEFAULT)) {
>>>>     fprintf(stderr, "Could not initialize tesseract.\n");
>>>>     exit(1);
>>>> }
>>>>
>>>> // write messages to tesseract.log instead of stderr...
>>>> if (!tess->SetVariable("debug_file", "tesseract.log")) {
>>>>     fprintf(stderr, "Could not set variable 'debug_file'.\n");
>>>> }
>>>>
>>>> tess->ReadConfigFile(configs);
>>>>
>>>> --
>>>> Zdenko
>>>>
>>>> On Thu, Nov 29, 2012 at 5:15 PM, Matthias Hillert <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to include a custom word dictionary with a custom
>>>>> configuration file and the user_words_suffix property.
>>>>> My code looks like this:
>>>>>
>>>>> TessBaseAPI tess;
>>>>> char *configs[]={"myconfig"};
>>>>> int configs_size = 1;
>>>>> tess.Init(NULL, "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL,
>>>>>           false);
>>>>>
>>>>> My config file looks like this:
>>>>>
>>>>> user_words_suffix user-words
>>>>>
>>>>> The problem is that my program exits with code 1 after the init call.
>>>>> I tried both a simple deu.user-words file with one word on every line
>>>>> and a version converted into a dawg file. Nothing worked.
>>>>> If I remove the user_words_suffix line from the config file, everything
>>>>> works.
>>>>>
>>>>> I am using Tesseract 3.02, Windows 8 and Visual Studio 2012.
>>>>>
>>>>> I would really appreciate some help.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> tesseract-oc...@googlegroups.com
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
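The fopen test suggested in this thread can be scripted. The sketch below is a hypothetical standalone C++ checker, not part of Tesseract or its API: UserWordsPath only reproduces the <prefix>tessdata/<lang>.<suffix> pattern visible in the error messages above (appending '/' when the prefix has no trailing separator is an assumption of this sketch), and CheckWordList opens that path with fopen and reports whether the file starts with a UTF-8 BOM, how many lines end in a carriage return, and how many words it holds in the one-word-per-line format Zdenko describes. One caveat when testing by hand: if the single-backslash path C:\tesseract-3.02\tessdata/deu.user-words is typed as a C or C++ string literal, each \t becomes a TAB character, so fopen fails on that literal regardless of what Tesseract does; a path read from TESSDATA_PREFIX at run time is not affected by literal escapes.

```cpp
// Hypothetical standalone checker, NOT part of the Tesseract API.
// It rebuilds the "<prefix>tessdata/<lang>.<suffix>" path seen in the
// error messages of this thread and inspects the word list with fopen.
#include <cstdio>
#include <string>

// A single backslash in a C/C++ string literal starts an escape sequence:
// "C:\tesseract" holds one TAB where "\t" was written, so it is one
// character shorter than the same path written with a forward slash.
static_assert(sizeof("C:\tesseract") == sizeof("C:/tesseract") - 1,
              "\\t in a literal is a single TAB character");

// Mirrors the path pattern visible in the error messages. Appending '/'
// when the prefix has no trailing separator is an assumption of this sketch.
std::string UserWordsPath(const std::string& prefix, const std::string& lang,
                          const std::string& suffix) {
  std::string path = prefix;
  if (!path.empty() && path.back() != '/' && path.back() != '\\') path += '/';
  return path + "tessdata/" + lang + "." + suffix;
}

struct WordListReport {
  bool opened = false;   // fopen succeeded
  bool has_bom = false;  // file starts with the UTF-8 BOM (EF BB BF)
  int crlf_lines = 0;    // lines ending in a carriage return (Windows EOL)
  int words = 0;         // non-empty lines, i.e. one word per line
};

// Reads the file byte by byte in binary mode so '\r' stays visible.
WordListReport CheckWordList(const std::string& path) {
  WordListReport report;
  std::FILE* file = std::fopen(path.c_str(), "rb");
  if (file == nullptr) return report;
  report.opened = true;

  std::string line;
  bool first_line = true;
  auto finish_line = [&]() {
    // compare() with fewer than 3 chars in line simply reports "not equal".
    if (first_line && line.compare(0, 3, "\xEF\xBB\xBF") == 0) {
      report.has_bom = true;
      line.erase(0, 3);
    }
    first_line = false;
    if (!line.empty() && line.back() == '\r') {
      ++report.crlf_lines;
      line.pop_back();
    }
    if (!line.empty()) ++report.words;
    line.clear();
  };

  int c;
  while ((c = std::fgetc(file)) != EOF) {
    if (c == '\n') finish_line();
    else line.push_back(static_cast<char>(c));
  }
  if (!line.empty()) finish_line();  // last line without a trailing newline
  std::fclose(file);
  return report;
}
```

For example, CheckWordList(UserWordsPath("C:/tesseract-3.02/", "deu", "user-words")) should report opened == true if the file sits where Zdenko's layout puts it; if it reports false, the file is not reachable under that exact path, independent of the word-list format.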

