I tried your code and it did not work. I get the error message "Could not 
open file, C:\tesseract-3.02\tessdata/deu.user-words".
I then tried to open the file with fopen. It did not work for the path 

C:\tesseract-3.02\tessdata/deu.user-words

But it worked for the following paths:

C:\\tesseract-3.02\\tessdata\\deu.user-words
C:/tesseract-3.02/tessdata/deu.user-words
C:\\tesseract-3.02\\tessdata/deu.user-words
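
A likely explanation for these fopen results assumes the paths were typed as C/C++ string literals, which the message does not say. In a literal, a single backslash starts an escape sequence. So "C:\tesseract-3.02\tessdata/deu.user-words" written with single backslashes contains two TAB characters (`\t`), and no file name matches it. Doubled backslashes or forward slashes avoid the problem. This explains the fopen test only. The path in the Tesseract error is assembled at run time, so escape sequences do not apply there. The helpers `can_open` and `has_control_chars` below are hypothetical names for this sketch.

```cpp
#include <cstdio>
#include <string>

// Returns true if fopen() can open `path` for reading.
bool can_open(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "r");
    if (f == nullptr) {
        return false;
    }
    std::fclose(f);
    return true;
}

// Returns true if `s` contains ASCII control characters, e.g. the TAB that
// the escape "\t" produces in a literal such as "C:\tesseract-3.02".
bool has_control_chars(const std::string& s) {
    for (unsigned char c : s) {
        if (c < 0x20 || c == 0x7f) {
            return true;
        }
    }
    return false;
}
```

If `has_control_chars` returns true for a path literal, fix the literal before looking for anything else.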

echo %TESSDATA_PREFIX% yields

C:\tesseract-3.02\

I changed this setting manually to C:/tesseract-3.02/ and now I get the
error message "Could not open file,
C:/tesseract-3.02/tessdata/deu.user-words".
I even removed the setting completely so it uses the path supplied with the 
Init call. Still no luck, same error.
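
This sketch is based only on the two error messages above. Tesseract appears to build the user-words path by appending "tessdata/", the language code, a dot, and the value of user_words_suffix to the data path prefix. That would explain the mixed separators. The hypothetical helper `user_words_path` reproduces this composition so the expected file name can be compared with what is on disk. It is a reconstruction from the messages, not Tesseract's actual code.

```cpp
#include <string>

// Hypothetical reconstruction, inferred from the error messages rather than
// taken from Tesseract's sources: prefix + "tessdata/" + lang + "." + suffix.
// The prefix is assumed to already end with a separator, as both
// TESSDATA_PREFIX values tried in this thread do.
std::string user_words_path(const std::string& prefix,
                            const std::string& lang,
                            const std::string& suffix) {
    return prefix + "tessdata/" + lang + "." + suffix;
}
```

Both prefixes tried above yield exactly the paths in the two error messages, and both point at the existing file. That suggests the composed file name itself is not the problem.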

Any more suggestions?



On Friday, November 30, 2012 20:08:19 UTC+1, zdenop wrote:
>
> I put this code into tesseract-ocr-API-Example-vs2008.zip
> (http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip):
>
>     Pix *image;
>     char *outText;
>     char *configs[]={"myconfig"};
>     int configs_size = 1;
>
>     TessBaseAPI *tess = new TessBaseAPI();
>     if (tess->Init("C:\\tesseract-3.02\\", "deu", OEM_DEFAULT, configs,
>                    configs_size, NULL, NULL, false)) {
>       fprintf(stderr, "Could not initialize tesseract.\n");
>       exit(1);
>     }
>
>     image = pixRead("C:\\tesseract-3.02\\phototest.tif");
>     tess->SetImage(image);
>     outText = tess->GetUTF8Text();
>     // Never pass OCR output as the format string itself.
>     fprintf(stdout, "%s", outText);
>
>     // Release the text, the image and the engine.
>     delete [] outText;
>     pixDestroy(&image);
>     tess->End();
>     delete tess;
>
> and it works for me (VC++ 2008 on Windows XP). I have these files
> in C:\tesseract-3.02:
>
> C:\tesseract-3.02\phototest.tif
> C:\tesseract-3.02\tessdata\deu.traineddata
> C:\tesseract-3.02\tessdata\deu.user-words
> C:\tesseract-3.02\tessdata\configs\myconfig
>
> And deu.user-words affects the OCR results (I have words like "all",
> "lazy", etc. in it).
> Below are some inline comments.
> -- 
> Zdenko
>
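
A layout like the one zdenko lists above could be confirmed from inside the program with a small check run before Init. The sketch below uses only `std::ifstream`, so it should also build with older Visual Studio versions. The function name `missing_files` is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the entries of `relative_paths` that cannot be
// opened for reading under `base`. `base` is assumed to end with a path
// separator, like the datapath "C:\\tesseract-3.02\\" used in this thread.
std::vector<std::string> missing_files(const std::string& base,
                                       const std::vector<std::string>& relative_paths) {
    std::vector<std::string> missing;
    for (std::size_t i = 0; i < relative_paths.size(); ++i) {
        // Actually opening the file also catches permission problems,
        // which a plain existence check would miss.
        std::ifstream in((base + relative_paths[i]).c_str());
        if (!in) {
            missing.push_back(relative_paths[i]);
        }
    }
    return missing;
}
```

For the layout above, call it with the prefix "C:\\tesseract-3.02\\" and the entries "tessdata/deu.traineddata", "tessdata/deu.user-words" and "tessdata/configs/myconfig". It should return an empty vector if every file is readable.
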
> On Fri, Nov 30, 2012 at 2:04 PM, Matthias Hillert <[email protected]> wrote:
>  
>> I tried running the program in the console and got the following
>> error message:
>>
>> Could not open file, C:\tesseract-3.02\tessdata/deu.user-words
>>
>> The file is definitely there. Maybe it has something to do with the 
>> different slashes?
>>
>  
> Windows handles the forward slash correctly (i.e. as a directory separator),
> so the problem should be somewhere else. Are you able to open that path with fopen?
>  
>
>> Is the user-words file supposed to be a dawg file or a simple text file 
>> with one word per line?
>>
>  
> One word per line. A simple text file (UTF-8 without BOM, Unix EOL; but my
> test also worked with ANSI encoding and Windows EOL, at least according to
> Notepad++ ;-) ).
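
This sketch prepares a word list in the format described above (one word per line, UTF-8 without BOM, Unix line endings). The hypothetical function `normalize_user_words` strips a leading UTF-8 BOM, converts CRLF and CR line endings to LF, trims spaces and tabs around each word, and drops empty lines. Whether Tesseract 3.02 strictly requires this is not established here, since zdenko's test worked with Windows line endings.

```cpp
#include <cstddef>
#include <string>

// Normalizes the text of a user-words file: removes a leading UTF-8 BOM,
// converts Windows (CRLF) and old Mac (CR) line endings to LF, trims spaces
// and tabs around each word, and drops empty lines.
std::string normalize_user_words(const std::string& raw) {
    std::string text = raw;
    const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) {
        text.erase(0, bom.size());
    }
    std::string out;
    std::string line;
    // Iterate one past the end so the last line is flushed even without
    // a trailing newline.
    for (std::size_t i = 0; i <= text.size(); ++i) {
        const bool at_end = (i == text.size());
        const char c = at_end ? '\n' : text[i];
        if (c == '\r' || c == '\n') {
            // Treat CRLF as a single line break.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
            const std::size_t first = line.find_first_not_of(" \t");
            if (first != std::string::npos) {
                const std::size_t last = line.find_last_not_of(" \t");
                out += line.substr(first, last - first + 1);
                out += '\n';
            }
            line.clear();
        } else {
            line += c;
        }
    }
    return out;
}
```

The result can be written back with a binary-mode stream, so that Windows does not reintroduce CRLF line endings.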
>
>
>> I also tried setting the datapath of the Init function to
>> "C:/tesseract-3.02/" to get the right slashes, but I got the same result.
>>
>
> Check whether the TESSDATA_PREFIX environment variable is set (echo %TESSDATA_PREFIX%).
>
>
>> Regarding your option to set the config file after the Init call, I read
>> here http://code.google.com/p/tesseract-ocr/wiki/ControlParams
>> that you can only set the user_words_suffix parameter in the Init call. Is
>> this correct?
>>
>>
> Yes, that is correct. But when there is a problem I prefer to do things step
> by step (e.g. you can try to set "init only" parameters after Init; it will
> not cause an error, they will just have no effect).
>  
>
>>
>> On Friday, November 30, 2012 09:56:22 UTC+1, zdenop wrote:
>>>
>>> I guess there is a problem finding deu.traineddata.
>>>
>>> I would suggest running your program in a console, so you can see any
>>> error message (something like "Error opening data file C:\Program
>>> Files\Tesseract-OCR\tessdata/deu.traineddata").
>>>
>>> Another option is to initialize Tesseract and set variables in separate
>>> steps to check for errors. Something like this:
>>>
>>>     const char* configs = "myconfig";
>>>
>>>     TessBaseAPI *tess = new TessBaseAPI();
>>>     if (tess->Init(NULL, "deu", OEM_DEFAULT)) {
>>>       fprintf(stderr, "Could not initialize tesseract.\n");
>>>       exit(1);
>>>     }
>>>
>>>     // write messages to tesseract.log instead of stderr...
>>>     if (!tess->SetVariable("debug_file", "tesseract.log")) {
>>>       fprintf(stderr, "Could not set variable 'debug_file'.\n");
>>>     }
>>>
>>>     tess->ReadConfigFile(configs);
>>>
>>>
>>>
>>> -- 
>>> Zdenko
>>>
>>> On Thu, Nov 29, 2012 at 5:15 PM, Matthias Hillert <[email protected]> wrote:
>>>
>>>>  Hi,
>>>>
>>>> I am trying to include a custom word dictionary using a custom
>>>> configuration file and the user_words_suffix parameter.
>>>> My code looks like this:
>>>>
>>>> TessBaseAPI tess;
>>>> char *configs[]={"myconfig"};
>>>> int configs_size = 1;
>>>> tess.Init(NULL, "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL,
>>>>           false);
>>>>
>>>> My config file looks like this:
>>>>
>>>> user_words_suffix user-words
>>>>
>>>> The problem is that my program exits with code 1 after the Init call.
>>>> I tried both a plain deu.user-words file with one word per line
>>>> and a version converted into a DAWG file. Nothing worked.
>>>> If I remove the user_words_suffix line in the config file everything 
>>>> works.
>>>>
>>>> I am using Tesseract 3.02, Windows 8 and Visual Studio 2012.
>>>>
>>>> I would really appreciate some help.
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> tesseract-oc...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>
>>
>
>
