Tesseract 3.01 Training and Error opening unicharset file

Holm Dressler Sat, 21 May 2011 01:54:59 -0700

Hi there,

I want to create tessdata files from a given TIFF image on my Linux
system. My image is called k05.tif


I followed the description at

http://aravindavk.in/view/tesseract_ocr_initial_setup

which means I do the following, step by step:


1. tesseract k05.tif k05 batch.nochop makebox
2. I clean up the box file with jTessBoxEditor.jar (I still have
problems with special characters such as the German ö, ä, ü)

3. tesseract k05.tif k05 nobatch box.train
4. unicharset_extractor k05.box
5. cp unicharset k05.unicharset
6. echo k05 0 0 0 0 0 > font_properties
7. mftraining -F font_properties -U unicharset k05.tr
8. mftraining -F font_properties -U unicharset -O k05.unicharset k05.tr
9. cntraining k05.tr
10. mv Microfeat k05.Microfeat
11. mv normproto k05.normproto
12. mv pffmtable k05.pffmtable
13. mv mfunicharset k05.mfunicharset
14. mv inttemp k05.inttemp
15. wordlist2dawg frequent_words_list k05.freq-dawg k05.unicharset

Everything works, but combining all the files with

combine_tessdata k05

results in

Error opening unicharset file


The file unicharset exists in my working directory
(/home/test/training). I also renamed the file to k05.unicharset.
The file is NOT empty.
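
To double-check the file names, a small sketch like the following can list what is actually present under a given prefix. It rests on an assumption I have not verified in the source: that combine_tessdata builds each file name by appending the component name directly to the prefix given on the command line. check_components is a made-up helper, not part of Tesseract, and the component list contains only file names from the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not a Tesseract tool): for a given prefix, report
# which component files exist and are non-empty.
# Assumption: the file name is simply "<prefix><component>".
check_components() {
    prefix=$1
    for part in unicharset inttemp pffmtable normproto freq-dawg; do
        file="${prefix}${part}"
        if [ -s "$file" ]; then
            echo "ok      $file"
        else
            echo "missing $file"
        fi
    done
}

# Example: compare the prefix without and with a trailing dot.
# check_components k05
# check_components k05.
```

If that assumption holds, running it with "k05" and with "k05." shows which spelling of the prefix matches the files in /home/test/training.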

Does anybody know what I am doing wrong?

Thanks for any advice,

Holm


