Hey everyone. I've spent the last week learning how to use Tesseract
by following this guide, and I've found it very useful:

http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

The only problem is that I can't update the traineddata file I
downloaded from the download area. The file's name is:

eng.traineddata.gz

I've run "combine_tessdata eng." and the resulting traineddata works: I
tested it by putting it in the tessdata directory. What I can't do is
correctly update the existing tessdata/eng.traineddata with my new
traineddata. I tried the following:

1. combine_tessdata eng. // to see if I can generate the traineddata
2. combine_tessdata -u eng.traineddata eng. // to unpack the files, so
I know which ones I can pass to the "overwrite" command
3. combine_tessdata -o eng.traineddata eng.file1 eng.file2 .... // the
overwrite command, given the files that were unpacked. I've tried
passing some of the files and all of the files, but neither updates
the traineddata correctly.
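
The steps above can be written out as a small script. This is only a
sketch under assumptions: it assumes combine_tessdata is on the PATH
and eng.traineddata is in the current directory. The unpacked/
directory, the helper names (run, unpack_cmd, overwrite_cmd) and the
component passed to -o are hypothetical, chosen for illustration; which
components actually need replacing is the open question here. By
default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of the unpack / overwrite sequence from steps 2 and 3.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unpack every component of eng.traineddata as unpacked/eng.<component>.
# The unpacked/ directory is a hypothetical location.
unpack_cmd() {
    run combine_tessdata -u eng.traineddata unpacked/eng.
}

# Overwrite only the named components inside eng.traineddata.
# The arguments are placeholder component files.
overwrite_cmd() {
    run combine_tessdata -o eng.traineddata "$@"
}

run mkdir -p unpacked
unpack_cmd
overwrite_cmd unpacked/eng.unicharset
```

Running it with DRY_RUN=0 would execute the commands for real, so it is
worth keeping a copy of the original eng.traineddata first.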

The original image I was working with is read correctly after I
overwrote the traineddata with the new files. But when I read other
images, every box is filled in with whichever of the newly trained
characters is available. For example:

"TEST5" was changed to "TESTS" // I changed the number '5' to the
letter 'S', and the output came out as TESTS, just as expected

For another image, I ran tesseract with the new traineddata and I
get:

"5 DOLLARS" is read as "S ESESESES" // which is understandable,
since the new character set is limited to whatever I just
defined
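
One way to confirm that the character set really shrank is to compare
the entry counts of the original and the new unicharset files. This is
a rough sketch: it relies on the first line of a unicharset file
holding the number of entries, and the function name unicharset_count
and any file paths passed to it are hypothetical.

```shell
#!/bin/sh
# Print the entry count declared in a unicharset file.
# Assumption: the first line of the file is the number of entries.
unicharset_count() {
    head -n 1 "$1" | tr -d '[:space:]'
}

# Report the count for each unicharset file given on the command line.
for f in "$@"; do
    if [ -f "$f" ]; then
        printf '%s: %s entries\n' "$f" "$(unicharset_count "$f")"
    fi
done
```

Passing it the eng.unicharset unpacked from the downloaded traineddata
and the one produced by the new training run would show both counts
side by side.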

But I want to keep adding to the current training data, not overwrite
what already works. How do I update the current traineddata with new
traineddata? Which files would I need to overwrite? Thank you for your
responses.
