I found that if the config file contains "user_words_suffix user-words", you 
have to make sure there is no blank line next to it; otherwise you 
will get the "Can't open xxx" error message.
In short, pay close attention to the format of the config file. I spent a 
lot of time figuring this error out.
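
A minimal sketch of a check for this, assuming the failure really is triggered by blank or whitespace-only lines in the config file as described above. `findBlankLines` is a hypothetical helper name, not a Tesseract function; it only inspects the text of the config file.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Tesseract): returns the 1-based numbers
// of the lines in a config file's text that are empty or whitespace-only.
std::vector<std::size_t> findBlankLines(const std::string& configText) {
    std::vector<std::size_t> blank;
    std::istringstream in(configText);
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        // A line with nothing but spaces, tabs or a stray '\r' counts as blank.
        if (line.find_first_not_of(" \t\r") == std::string::npos) {
            blank.push_back(lineNo);
        }
    }
    return blank;
}
```

Running this over the config text before calling Init and refusing to continue when the result is non-empty would surface the problem early, instead of leaving it to show up as a file-open error.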


On Friday, December 7, 2012 at 6:34:30 PM UTC+8, Matthias Hillert wrote:
>
> I tried your code and it did not work. I get the error message "Could not 
> open file, C:\tesseract-3.02\tessdata/deu.user-words".
> I then tried to open the file with fopen. It did not work for the path 
>
> C:\tesseract-3.02\tessdata/deu.user-words
>
> But it worked for the following paths:
>
> C:\\tesseract-3.02\\tessdata\\deu.user-words
> C:/tesseract-3.02/tessdata/deu.user-words
> C:\\tesseract-3.02\\tessdata/deu.user-words
>
> echo %TESSDATA_PREFIX% yields
>
> C:\tesseract-3.02\
>
> I changed this setting manually to C:/tesseract-3.02/ and now I get the 
> error message "Could not open file, 
> C:/tesseract-3.02/tessdata/deu.user-words".
> I even removed the setting completely so it uses the path supplied with 
> the Init call. Still no luck, same error.
>
> Any more suggestions?
>
>
>
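
One possible reading of the fopen results above, assuming the test paths were typed as C++ string literals: inside a literal, `\t` is the tab escape, so a path written with single backslashes does not contain the characters it appears to contain. The sketch below shows the difference and a hypothetical helper, `toForwardSlashes`, that normalizes separators. Neither name is part of Tesseract, and this does not explain the error Tesseract itself reports.

```cpp
#include <algorithm>
#include <string>

// In a C++ string literal "\t" is a tab, so this is NOT the path it looks like.
const std::string kSingleBackslash = "C:\tesseract-3.02\tessdata/deu.user-words";

// Escaped backslashes produce the intended Windows path.
const std::string kEscaped = "C:\\tesseract-3.02\\tessdata/deu.user-words";

// Hypothetical helper: rewrite every '\' as '/' so the path uses a single
// separator style.
std::string toForwardSlashes(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}
```

Printing a path byte by byte, or comparing its length with the expected one, is a quick way to tell whether an escape sequence has been swallowed.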
> On Friday, November 30, 2012 at 20:08:19 UTC+1, zdenop wrote:
>>
>> I put this code into tesseract-ocr-API-Example-vs2008.zip
>> <http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip>:
>>
>>     Pix *image;
>>     char *outText;
>>     char *configs[]={"myconfig"};
>>     int configs_size = 1;
>>
>>     TessBaseAPI *tess = new TessBaseAPI();
>>     if (tess->Init("C:\\tesseract-3.02\\", "deu", OEM_DEFAULT, configs,
>>                    configs_size, NULL, NULL, false)) {
>>       fprintf(stderr, "Could not initialize tesseract.\n");
>>       exit(1);
>>     }
>>
>>     image = pixRead("C:\\tesseract-3.02\\phototest.tif");
>>     tess->SetImage(image);
>>     outText = tess->GetUTF8Text();
>>     // Pass the OCR text as an argument, never as the format string.
>>     fprintf(stdout, "%s", outText);
>>
>> and it works for me (VC++ 2008 on Windows XP). I have this 
>> in C:\tesseract-3.02:
>>
>> C:\tesseract-3.02\phototest.tif
>> C:\tesseract-3.02\tessdata\deu.traineddata
>> C:\tesseract-3.02\tessdata\deu.user-words
>> C:\tesseract-3.02\tessdata\configs\myconfig
>>
>> And deu.user-words affects the OCR results (it contains words like all, 
>> lazy, etc.)
>> Below are some inline comments.
>> -- 
>> Zdenko
>>
>> On Fri, Nov 30, 2012 at 2:04 PM, Matthias Hillert <[email protected]>wrote:
>>  
>>> I tried running the program in the console and got the following 
>>> error message:
>>>
>>> Could not open file, C:\tesseract-3.02\tessdata/deu.user-words
>>>
>>> The file is definitely there. Maybe it has something to do with the 
>>> different slashes?
>>>
>>  
>> Windows handles the forward slash correctly (i.e. as a directory 
>> separator), so the problem should be somewhere else. Are you able to open 
>> that path with fopen?
>>  
>>
>>> Is the user-words file supposed to be a dawg file or a simple text file 
>>> with one word per line?
>>>
>>  
>> One word per line. Simple text (UTF-8 without BOM, Unix EOL; but my test 
>> worked with ANSI encoding and Windows EOL, at least Notepad++ says so ;-) )
>>
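
A minimal sketch of checking a word list against the format described above (UTF-8 without BOM, Unix line endings). `hasUtf8Bom` and `hasWindowsEol` are hypothetical helpers that look at the file's raw bytes; they are not Tesseract functions, and as noted above a file with ANSI encoding and Windows EOL also worked in at least one test, so a positive result is a hint rather than proof of a problem.

```cpp
#include <string>

// Hypothetical helper: true if the data starts with the UTF-8 byte order
// mark (EF BB BF).
bool hasUtf8Bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Hypothetical helper: true if any line ends with CR LF (Windows EOL).
bool hasWindowsEol(const std::string& bytes) {
    return bytes.find("\r\n") != std::string::npos;
}
```

Both functions expect the file contents read in binary mode, so that the runtime does not translate line endings before the check.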
>>
>>> I also tried setting the datapath of the Init function to 
>>> "C:/tesseract-3.02/" to get the right slashes but I got the same result.
>>>
>>
>> Check whether the environment variable is set (echo %TESSDATA_PREFIX%).
>>
>>
>>> Regarding your option to set the config file after the init call, I read 
>>> here http://code.google.com/p/tesseract-ocr/wiki/ControlParams
>>> that you can only set the user_words_suffix param in the init call. Is 
>>> this correct?
>>>
>>>
>> Yes, it is correct. But if there is a problem, I prefer to do things step by 
>> step (e.g. you can try to set "init only" parameters after init; it will 
>> not cause an error, they will just have no effect). 
>>  
>>
>>>
>>> On Friday, November 30, 2012 at 09:56:22 UTC+1, zdenop wrote:
>>>>
>>>> I guess there is a problem finding deu.traineddata.
>>>>
>>>> I would suggest running your program in a console, so you can see any 
>>>> error message (something like "Error opening data file C:\Program 
>>>> Files\Tesseract-OCR\tessdata/deu.traineddata").
>>>>
>>>> Another option is to init tesseract and set variables in several steps to 
>>>> check for errors. Something like this:
>>>>
>>>>     const char* configs = "myconfig";
>>>>
>>>>     TessBaseAPI *tess = new TessBaseAPI();
>>>>     if (tess->Init(NULL, "deu", OEM_DEFAULT)) {
>>>>       fprintf(stderr, "Could not initialize tesseract.\n");
>>>>       exit(1);
>>>>     }
>>>>
>>>>     // write messages to tesseract.log instead of stderr...
>>>>     if (!tess->SetVariable("debug_file", "tesseract.log")) {
>>>>       fprintf(stderr, "Could not set variable 'debug_file'.\n");
>>>>     }
>>>>
>>>>     tess->ReadConfigFile(configs);
>>>>
>>>>
>>>>
>>>> -- 
>>>> Zdenko
>>>>
>>>> On Thu, Nov 29, 2012 at 5:15 PM, Matthias Hillert <[email protected]>wrote:
>>>>
>>>>>  Hi,
>>>>>
>>>>> I am trying to include a custom word directory with a custom 
>>>>> configuration file and the user_words_suffix property.
>>>>> My code looks like this:
>>>>>
>>>>> TessBaseAPI tess;
>>>>> char *configs[]={"myconfig"};
>>>>> int configs_size = 1;
>>>>> tess.Init(NULL, "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL,
>>>>>           false);
>>>>>
>>>>> My config file looks like this:
>>>>>
>>>>> user_words_suffix user-words
>>>>>
>>>>> The problem is that my program exits with code 1 after the init call.
>>>>> I tried both a simple deu.user-words file with one word in every line 
>>>>> and also converted the file into a dawg file. Nothing worked.
>>>>> If I remove the user_words_suffix line in the config file everything 
>>>>> works.
>>>>>
>>>>> I am using Tesseract 3.02, Windows 8 and Visual Studio 2012.
>>>>>
>>>>> I would really appreciate some help.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  -- 
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>>
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>>
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>
>>>>
>>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/b8efd43d-2306-474c-8cd8-b9653b288715%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
