Hi there,

I want to create tessdata files from a given TIFF on my Linux system. My TIFF is called k05.tif.
I used the description at http://aravindavk.in/view/tesseract_ocr_initial_setup, which means I run the following steps:

1. tesseract k05.tif k05 batch.nochop makebox
2. Clean up the box file with jTessBoxEditor.jar (I still have problems with special characters such as the German ö, ä, ü ...)
3. tesseract k05.tif k05 nobatch box.train
4. unicharset_extractor k05.box
5. cp unicharset k05.unicharset
6. echo k05 0 0 0 0 0 > font_properties
7. mftraining -F font_properties -U unicharset k05.tr
8. mftraining -F font_properties -U unicharset -O k05.unicharset k05.tr
9. cntraining k05.tr
10. mv Microfeat k05.Microfeat
11. mv normproto k05.normproto
12. mv pffmtable k05.pffmtable
13. mv mfunicharset k05.mfunicharset
14. mv inttemp k05.inttemp
15. wordlist2dawg frequent_words_list k05.freq-dawg k05.unicharset

Every step works, but combining all the files with

combine_tessdata k05

fails with "Error opening unicharset file". The file unicharset exists in my directory (/home/test/training), and I also renamed it to k05.unicharset. The file is not empty.

Does anybody know what I am doing wrong?

Thanks for any advice,
Holm

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
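For the combine_tessdata error in the post, a small check can show which file names a given prefix actually resolves to. This is a minimal Python sketch, not part of Tesseract, and it rests on an assumption: that combine_tessdata (Tesseract 3.0x) forms each input file name by appending a component suffix such as "unicharset" directly to the prefix argument, without inserting a dot. If that assumption holds, "combine_tessdata k05" would look for "k05unicharset" rather than "k05.unicharset". The suffix list and the helper names expected_files and missing_files are hypothetical, chosen from the file names produced in the training steps of the post.

```python
"""Report which component files a tessdata prefix would resolve to.

Assumption (not verified here): combine_tessdata appends each component
suffix directly to the prefix it is given, with no separator added.
"""
from __future__ import annotations

import os
import tempfile

# Hypothetical subset of component suffixes, taken from the k05.* files
# created in the training steps of the post.
SUFFIXES = ["unicharset", "inttemp", "pffmtable", "normproto", "freq-dawg"]


def expected_files(prefix: str) -> list[str]:
    """Build file names by plain concatenation: prefix + suffix."""
    return [prefix + suffix for suffix in SUFFIXES]


def missing_files(prefix: str, directory: str = ".") -> list[str]:
    """Return the expected file names that do not exist in `directory`."""
    return [
        name
        for name in expected_files(prefix)
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate the training directory: k05.unicharset, k05.inttemp, ...
        for suffix in SUFFIXES:
            open(os.path.join(tmp, "k05." + suffix), "w").close()
        # Without the trailing dot every name is missing ("k05unicharset", ...).
        print(missing_files("k05", tmp))
        # With the trailing dot the names match the files on disk.
        print(missing_files("k05.", tmp))  # []
```

To check the real training directory, compare missing_files("k05", "/home/test/training") with missing_files("k05.", "/home/test/training"); if only the second list is empty, the prefix form is the likely difference.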

