I put this code into
tesseract-ocr-API-Example-vs2008.zip <http://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-ocr-API-Example-vs2008.zip>:

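    // Headers: baseapi.h (Tesseract, namespace tesseract) and allheaders.h (Leptonica)
    Pix *image;
    char *outText;
    char *configs[]={"myconfig"};  // read from tessdata/configs/myconfig
    int configs_size = 1;

    TessBaseAPI *tess = new TessBaseAPI();
    if (tess->Init("C:\\tesseract-3.02\\", "deu", OEM_DEFAULT, configs,
                   configs_size, NULL, NULL, false)) {
      fprintf(stderr, "Could not initialize tesseract.\n");
      exit(1);
    }

    image = pixRead("C:\\tesseract-3.02\\phototest.tif");
    if (image == NULL) {
      fprintf(stderr, "Could not read image.\n");
      exit(1);
    }
    tess->SetImage(image);
    outText = tess->GetUTF8Text();
    fputs(outText, stdout);  // never pass OCR text as a printf format string

    // Clean up
    delete [] outText;
    pixDestroy(&image);
    tess->End();
    delete tess;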

and it works for me (VC++ 2008 on Windows XP). I have these files
in C:\tesseract-3.02:

C:\tesseract-3.02\phototest.tif
C:\tesseract-3.02\tessdata\deu.traineddata
C:\tesseract-3.02\tessdata\deu.user-words
C:\tesseract-3.02\tessdata\configs\myconfig

The deu.user-words file does affect the OCR results (I have words like
"all", "lazy", etc. in it).
Below are some inline comments.
-- 
Zdenko

On Fri, Nov 30, 2012 at 2:04 PM, Matthias Hillert <mhill...@gmail.com> wrote:

> I tried running the program in the console and got the following error
> message:
>
> Could not open file, C:\tesseract-3.02\tessdata/deu.user-words
>
> The file is definitely there. Maybe it has something to do with the
> different slashes?
>

Windows handles forward slashes correctly (i.e. as a directory separator), so
the problem should be somewhere else. Are you able to open that path with fopen?


> Is the user-words file supposed to be a dawg file or a simple text file
> with one word per line?
>

A simple text file with one word per line (UTF-8 without BOM, Unix EOL; but
my test also worked with ANSI encoding and Windows EOL, at least according to
Notepad++ ;-) )


> I also tried setting the datapath of the Init function to
> "C:/tesseract-3.02/" to get the right slashes, but I got the same result.
>

Check whether the TESSDATA_PREFIX environment variable is set (echo %TESSDATA_PREFIX%).


> Regarding your option to set the config file after the Init call, I read
> here http://code.google.com/p/tesseract-ocr/wiki/ControlParams
> that you can only set the user_words_suffix param in the Init call. Is
> this correct?
>
>
Yes, that is correct. But when there is a problem I prefer to do things step
by step (e.g. you can try to set "init only" parameters after Init; it will
not cause an error, they will just have no effect).


>
> On Friday, 30 November 2012 09:56:22 UTC+1, zdenop wrote:
>>
>> I guess there is a problem finding deu.traineddata.
>>
>> I would suggest running your program in a console so you can see any
>> error message (something like "Error opening data file C:\Program
>> Files\Tesseract-OCR\tessdata/deu.traineddata").
>>
>> Another option is to initialize Tesseract and set variables in separate
>> steps to check for errors. Something like this:
>>
>>     const char* configs = "myconfig";
>>
>>     TessBaseAPI *tess = new TessBaseAPI();
>>     if (tess->Init(NULL, "deu", OEM_DEFAULT)) {
>>       fprintf(stderr, "Could not initialize tesseract.\n");
>>       exit(1);
>>     }
>>
>>     // write messages to tesseract.log instead of stderr...
>>     if (!tess->SetVariable("debug_file", "tesseract.log")) {
>>       fprintf(stderr, "Could not set variable 'debug_file'.\n");
>>     }
>>
>>     // only non-init params take effect when read after Init()
>>     tess->ReadConfigFile(configs);
>>
>>
>>
>> --
>> Zdenko
>>
>> On Thu, Nov 29, 2012 at 5:15 PM, Matthias Hillert <mhil...@gmail.com> wrote:
>>
>>>  Hi,
>>>
>>> I am trying to include a custom word dictionary with a custom
>>> configuration file and the user_words_suffix property.
>>> My code looks like this:
>>>
>>> TessBaseAPI tess;
>>> char *configs[]={"myconfig"};
>>> int configs_size = 1;
>>> tess.Init(NULL, "deu", OEM_DEFAULT, configs, configs_size, NULL, NULL,
>>>           false);
>>>
>>> My config file looks like this:
>>>
>>> user_words_suffix user-words
>>>
>>> The problem is that my program exits with code 1 after the Init call.
>>> I tried both a plain deu.user-words file with one word per line
>>> and a version converted into a DAWG file. Nothing worked.
>>> If I remove the user_words_suffix line from the config file, everything
>>> works.
>>>
>>> I am using Tesseract 3.02, Windows 8 and Visual Studio 2012.
>>>
>>> I would really appreciate some help.
>>>
>>>
>>>
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to tesser...@googlegroups.com
>>>
>>> To unsubscribe from this group, send email to
>>> tesseract-oc...@googlegroups.com
>>>
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>