I found the problem:
 
When I created the bazaar file etc., the editor wrote the standard MS 
Windows (DOS) line terminations, i.e. <CR><LF>, whereas all other files 
that come with Tesseract have Unix-style line terminations, i.e. <LF>.  
That is why the bundled files look ill-formatted in the standard 
Notepad.exe. The much better Notepad2 displays everything correctly and 
also allows saving files with Unix line terminations.
 
Saving the files with Unix line terminations eliminated the problem.  It 
appears that the code does not handle line terminations in a 
platform-independent way. Note that the output files are also written 
with Unix line terminations in a DOS environment.  So, it may have tried 
to open C:\Program Files 
(x86)\Tesseract-OCR\tessdata/eng.user-words<CR>, which obviously does not 
exist. *I wonder why this was not a problem for anyone else?*
 
This can typically be overcome by opening text files explicitly in text 
mode (which then recognises the line terminations of the respective 
platform) and by reading with functions like fgetl, which remove the line 
termination.  Conversely, when such files are written in text mode, the \n 
is translated as expected by the platform.
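
A minimal sketch of the suspected effect, written in Python rather than 
taken from Tesseract's own code. The function names and the one-line 
config content are assumptions made for illustration only. It shows how 
splitting on <LF> alone leaves a <CR> attached to the last value on a 
line, and how splitting on all common terminations avoids that.

```python
import os
import tempfile


def read_lines_naive(path):
    """Split on LF only, so the CR of a CRLF ending stays attached."""
    with open(path, "rb") as fh:
        return [ln.decode("ascii") for ln in fh.read().split(b"\n") if ln]


def read_lines_portable(path):
    """Split on LF, CRLF or CR, so no terminator survives."""
    with open(path, "rb") as fh:
        # bytes.splitlines() recognises \n, \r\n and \r and drops them
        return [ln.decode("ascii") for ln in fh.read().splitlines() if ln]


def demo():
    # Hypothetical one-line config file saved with Windows (CRLF) endings
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fh:
        fh.write(b"user_words_suffix user-words\r\n")
    try:
        # split(" ", 1) keeps the value exactly as read, including any CR
        naive_suffix = read_lines_naive(path)[0].split(" ", 1)[1]
        portable_suffix = read_lines_portable(path)[0].split(" ", 1)[1]
    finally:
        os.remove(path)
    # Build the file names that would be looked up in each case
    return "eng." + naive_suffix, "eng." + portable_suffix


if __name__ == "__main__":
    naive_name, portable_name = demo()
    print(repr(naive_name))     # the stray CR is visible in the repr
    print(repr(portable_name))
```

With the naive reader the resulting file name carries a trailing <CR>, 
which would match the failed lookup described above. Whether Tesseract 
actually reads its config files this way is not confirmed here.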
 
 
So, the files are now found, but the whole text in eurotext.tif is still 
returned. It appears that disabling the default language resources in 
bazaar did not work.  
Could you please run the example in the manual on an MS Windows machine 
and check whether it works for you?
 
Thanks,
   Uwe

On Friday, March 22, 2013 9:51:21 AM UTC+10:30, [email protected] wrote:

> This is already set - looks like this was done by the installer.
>  
>  
>    Uwe
>  
>
> On Thursday, March 21, 2013 8:37:47 PM UTC+10:30, zdenop wrote:
>
>> Did you set the TESSDATA_PREFIX environment variable? If not, can you 
>> set it (to "C:\Program Files (x86)\Tesseract-OCR\")?
>>
>> Zdenko
>>
>>
>> On Thu, Mar 21, 2013 at 2:08 AM, <[email protected]> wrote:
>>
>>>
>>> Thanks for the reply.
>>>> Yes, the file does exist. I can open it from my working directory 
>>>> using 
>>>> fopen('C:\Program Files (x86)\Tesseract-OCR\tessdata/eng.user-words','rt') 
>>>> and read the content using fgetl and the like.
>>>>  
>>>> I also tried the -l eng and -l eng1 test and it behaved as you have 
>>>> described.
>>>>
>>>  -- 
>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>  
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>  
>>>  
>>>
>>
>>
