Hey everyone. I've spent the last week learning how to use Tesseract and have found it very useful. I have been following this guide:
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

The only problem is that I'm trying to update the traineddata I downloaded from the download area, and I can't get it to update. The file's name is eng.traineddata.gz.

I've used "combine_tessdata eng." and the new traineddata works; I tested it by putting it in the tessdata directory. What I can't do is correctly update tessdata/eng.traineddata with my new traineddata. I tried the following:

1. combine_tessdata eng.
   // to see if I can generate the traineddata
2. combine_tessdata -u eng.traineddata eng.
   // to unpack the files, so I know which ones I can pass to the "overwrite" command
3. combine_tessdata -o eng.traineddata eng.file1 eng.file2 ....
   // using the files that were unpacked

In step 3 I've tried passing some of the files and all of the files, but it won't update the traineddata correctly. I know this is an overwrite command.

The original image I was working with is read correctly after I overwrote the traineddata with the new files. But when I read other images, it takes whatever characters it has available to fill in the boxes. For example, in my training "TEST5" was changed to "TESTS" (the number '5' changed to the letter 'S'), and the output came out as TESTS, just as expected. For another image, when I use tesseract with the new traineddata, "5 DOLLARS" is read as "S ESESESES". That is understandable, since the new character set has been limited to whatever I just defined.

But I want to keep adding to the current training data, not just overwrite what already works. How would I update the current traineddata with new traineddata? Which files would I need to overwrite?

Thank you for your responses.
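For reference, here is a minimal sketch of the three steps above as one script. The combine_tessdata invocations are copied from the list. The DRY_RUN variable and the run helper are hypothetical additions for illustration only, and eng.file1 and eng.file2 are still placeholders for the unpacked component files, not real file names. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the three combine_tessdata steps from the post.
# Assumption: DRY_RUN and run() are illustrative helpers, not part of Tesseract.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Combine the freshly trained eng.* component files into eng.traineddata.
run combine_tessdata eng.

# 2. Unpack a traineddata file into component files with the prefix "eng.".
#    Caution (assumption worth checking): unpacking into the same directory
#    may replace freshly trained component files that share those names.
run combine_tessdata -u eng.traineddata eng.

# 3. Overwrite components inside eng.traineddata with the listed files.
#    eng.file1 and eng.file2 are placeholders, as in the post.
run combine_tessdata -o eng.traineddata eng.file1 eng.file2
```

Setting DRY_RUN=0 executes the commands for real, which assumes combine_tessdata is on the PATH and that the script is started in the directory holding the eng.* files.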

