I found the cause: in the config file that contains "user_words_suffix user-words", make sure there is no blank line next to that line, otherwise you will get the "Can't open xxx" error message. In short, pay attention to the format of the config file. I spent a lot of time figuring this error out...
On Friday, December 7, 2012 6:34:30 PM UTC+8, Matthias Hillert wrote:
>
> I tried your code and it did not work. I get the error message "Could not
> open file, C:\tesseract-3.02\tessdata/deu.user-words".
> I then tried to open the file with fopen. It did not work for the path
>
> C:\tesseract-3.02\tessdata/deu.user-words
>
> But it worked for the following paths:
>
> C:\\tesseract-3.02\\tessdata\\deu.user-words
> C:/tesseract-3.02/tessdata/deu.user-words
> C:\\tesseract-3.02\\tessdata/deu.user-words
>
> echo %TESSDATA_PREFIX% yields
>
> C:\tesseract-3.02\
>
> I changed this setting manually to C:/tesseract-3.02/ and now I get the
> error message "Could not open file,
> C:/tesseract-3.02/tessdata/deu.user-words".
> I even removed the setting completely so it uses the path supplied with
> the Init call. Still no luck, same error.
>
> Any more suggestions?
>
>
>
> On Friday, November 30, 2012 8:08:19 PM UTC+1, zdenop wrote:
>>
>> I put this code into
>> tesseract-ocr-API-Example-vs2008.zip<http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip>
>> :
>>
>>   Pix *image;
>>   char *outText;
>>   char *configs[]={"myconfig"};
>>   int configs_size = 1;
>>
>>   TessBaseAPI *tess = new TessBaseAPI();
>>   if (tess->Init("C:\\tesseract-3.02\\", "deu", OEM_DEFAULT, configs,
>>                  configs_size, NULL, NULL, false)) {
>>       fprintf(stderr, "Could not initialize tesseract.\n");
>>       exit(1);
>>   }
>>
>>   image = pixRead("C:\\tesseract-3.02\\phototest.tif");
>>   tess->SetImage(image);
>>   outText = tess->GetUTF8Text();
>>   fprintf(stdout, "%s", outText);  // never pass OCR text as the format string
>>
>>   // release the text, the image and the engine
>>   delete [] outText;
>>   pixDestroy(&image);
>>   tess->End();
>>   delete tess;
>>
>> and it works for me (VC++ 2008 on Windows XP). I have this
>> in C:\tesseract-3.02:
>>
>> C:\tesseract-3.02\phototest.tif
>> C:\tesseract-3.02\tessdata\deu.traineddata
>> C:\tesseract-3.02\tessdata\deu.user-words
>> C:\tesseract-3.02\tessdata\configs\myconfig
>>
>> And deu.user-words affects the OCR results (I have words like all,
>> lazy etc. in it).
>> Below are some inline comments.
>> --
>> Zdenko
>>
>> On Fri, Nov 30, 2012 at 2:04 PM, Matthias Hillert <[email protected]> wrote:
>>
>>> I tried running the program in the console and got the following
>>> error message:
>>>
>>> Could not open file, C:\tesseract-3.02\tessdata/deu.user-words
>>>
>>> The file is definitely there. Maybe it has something to do with the
>>> different slashes?
>>>
>>
>> Windows handles slashes correctly (i.e. as a directory separator), so the
>> problem should be somewhere else. Are you able to open that path with fopen?
>>
>>
>>> Is the user-words file supposed to be a dawg file or a simple text file
>>> with one word per line?
>>>
>>
>> One word per line. Plain text (UTF-8 without BOM, Unix EOL; but my test
>> also worked with ANSI encoding and Windows EOL, at least according to Notepad++ ;-) )
>>
>>
>>> I also tried setting the datapath of the Init function to
>>> "C:/tesseract-3.02/" to get the right slashes, but I got the same result.
>>>
>>
>> Check whether the environment variable is set (echo %TESSDATA_PREFIX%).
>>
>>
>>> Regarding your option to set the config file after the init call, I read
>>> here http://code.google.com/p/tesseract-ocr/wiki/ControlParams
>>> that you can only set the user_words_suffix param in the init call. Is
>>> this correct?
>>>
>>>
>> Yes, it is correct. But when there is a problem I prefer to do things step by
>> step (e.g. you can try to set "init only" parameters after init; it will
>> not cause an error, they will just have no effect).
>>
>>
>>>
>>> On Friday, November 30, 2012 9:56:22 AM UTC+1, zdenop wrote:
>>>>
>>>> I guess there is a problem finding deu.traineddata.
>>>>
>>>> I would suggest running your program in a console, so you can see a possible
>>>> error message (something like "Error opening data file C:\Program
>>>> Files\Tesseract-OCR\tessdata/deu.traineddata").
>>>>
>>>> Another option is to init tesseract and set variables in several steps to
>>>> check for errors.
>>>> Something like this:
>>>>
>>>>   const char* configs = "myconfig";
>>>>
>>>>   TessBaseAPI *tess = new TessBaseAPI();
>>>>   if (tess->Init(NULL, "deu", OEM_DEFAULT)) {
>>>>       fprintf(stderr, "Could not initialize tesseract.\n");
>>>>       exit(1);
>>>>   }
>>>>
>>>>   // write messages to tesseract.log instead of stderr...
>>>>   if (!tess->SetVariable("debug_file", "tesseract.log")) {
>>>>       fprintf(stderr, "Could not set variable 'debug_file'.\n");
>>>>   }
>>>>
>>>>   tess->ReadConfigFile(configs);
>>>>
>>>>
>>>>
>>>> --
>>>> Zdenko
>>>>
>>>> On Thu, Nov 29, 2012 at 5:15 PM, Matthias Hillert <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to include a custom word dictionary with a custom
>>>>> configuration file and the user_words_suffix property.
>>>>> My code looks like this:
>>>>>
>>>>>   TessBaseAPI tess;
>>>>>   char *configs[]={"myconfig"};
>>>>>   int configs_size = 1;
>>>>>   tess.Init(NULL, "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL,
>>>>>             false );
>>>>>
>>>>> My config file looks like this:
>>>>>
>>>>>   user_words_suffix user-words
>>>>>
>>>>> The problem is that my program exits with code 1 after the init call.
>>>>> I tried both a simple deu.user-words file with one word per line
>>>>> and a version converted into a dawg file. Nothing worked.
>>>>> If I remove the user_words_suffix line from the config file, everything
>>>>> works.
>>>>>
>>>>> I am using Tesseract 3.02, Windows 8 and Visual Studio 2012.
>>>>>
>>>>> I would really appreciate some help.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>>
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>>
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>
>>>>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b8efd43d-2306-474c-8cd8-b9653b288715%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

