Hi,

Well, it was answered well enough that I was able to build my own
xxx.traineddata file.  Unfortunately, even with that traineddata file, I'm
running into the same problem you are: I can't get Tesseract to use the
freq-dawg that I included.  I've been digging through the source code to find
the right config setting but haven't succeeded yet.  I'll let you and the
group know when I do!
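
For anyone following along, here is a rough sketch of the unpack step that
comes up further down the thread (the output prefix for combine_tessdata -u
has to end in a period).  It is only an illustration in Python: the helper
names unpack_prefix and unpack_command are made up for this example, the
paths are placeholders, and it only builds the command line rather than
running anything.

```python
from pathlib import PurePosixPath
from typing import List


def unpack_prefix(out_dir: str, lang: str) -> str:
    """Return an output prefix for `combine_tessdata -u`.

    Hypothetical helper: it joins the directory and language code and
    appends the trailing period (e.g. ".../eng.") that the thread says
    the unpack step needs.
    """
    return str(PurePosixPath(out_dir) / lang) + "."


def unpack_command(traineddata: str, out_dir: str, lang: str) -> List[str]:
    """Build (but do not run) the argument list for the unpack step."""
    return ["combine_tessdata", "-u", traineddata, unpack_prefix(out_dir, lang)]
```

Calling unpack_command("tessdata/eng.traineddata", "/home/user/temp", "eng")
gives an argument list whose last element is /home/user/temp/eng. and which
could then be passed to subprocess.run if the tool is installed.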

thanks,
max

On May 10, 2011, at 4:32 PM, Parmeet wrote:

> Hello there,
> 
> Sorry if I sound naive, but I think the original question has not been
> answered yet: how do we include our own word list? After going through
> the FAQ page, I found that we can put our eng.user-words file in the
> tessdata folder.
> 
> I did exactly that. To test whether it works, I put the characters a
> through z as a single word in the eng.user-words file and saved it with
> UTF-8 encoding. Then I made an image in Paint containing the characters a
> through z as one word (in different fonts on different lines of the same
> image) and ran OCR on it. Unfortunately, it did not correct the output,
> even when only a single character out of a through z was misidentified.
> Could you please let me know if I am doing something wrong, or whether I
> need to retrain using my user-words?
> 
> I would be grateful for an early reply.
> 
> Thanks and Kind Regards
> Parmeet
> 
> 
> On May 10, 7:21 am, Max Cantor <[email protected]> wrote:
>> Ok, I found the problem.  The fix is described here:
>> http://code.google.com/p/tesseract-ocr/issues/detail?id=356
>> 
>> The output path given to combine_tessdata -u needs to end in a period.
>> 
>> my bad.
>> 
>> max
>> 
>> On May 9, 2011, at 3:30 PM, zdenko podobny wrote:
>> 
>>> no problem :-) I think you will like option "-o" too.
>> 
>>> Zdenko
>> 
>>> On Mon, May 9, 2011 at 8:27 AM, Max Cantor <[email protected]> wrote:
>>> I feel really dumb now. Sorry for the bother.
>> 
>>> Thanks, max
>> 
>>> On May 9, 2011, at 14:01, zdenko podobny <[email protected]> wrote:
>> 
>>>> Please try to read (to look is not enough ;-) ) [1] :
>> 
>>>> // Specify option -u to unpack all the components to the specified path:
>>>> //
>>>> // combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
>>>> //
>>>> // This will create /home/$USER/temp/eng.* files with individual tessdata
>>>> // components from tessdata/eng.traineddata.
>>>> //
>>>> [1]http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
>> 
>>>> On Mon, May 9, 2011 at 2:01 AM, Max Cantor <[email protected]> wrote:
>>>> I was looking at that, but I can't find the other component files in the
>>>> source tree.  Is there somewhere to get the component files for
>>>> eng.traineddata?
>> 
>>>> sorry if i'm missing something obvious...
>> 
>>>> max
>>>> On May 9, 2011, at 1:40 AM, zdenko podobny wrote:
>> 
>>>>> see [1] or user-words on the same page.
>> 
>>>>> [1]http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
>> 
>>>>> Zdenko
>> 
>>>>> On Sun, May 8, 2011 at 5:53 PM, Max Cantor <[email protected]> wrote:
>>>>> Is there a way to set up a custom wordlist without going through the
>>>>> entire retraining process?  Our wordlists will change a bit at runtime,
>>>>> so if there is an API variable to set, that would be perfect for us.
>> 
>>>>> Thanks,
>>>>> Max
>> 
>>>>> Keep up the good work!
>> 
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
