Re: Newbie: Training tesseract

Mi Tran Mon, 12 Nov 2012 07:06:56 -0800

>
> Oh, sorry. I am using tesseract 3.0.2 on Windows 7 (32-bit). These are the steps I followed:
>
> 1. Generate Training Images: eng.timesitalic.exp0.tif
>
> 2. Make Box Files:
>
> tesseract eng.timesitalic.exp0.tif eng.timesitalic.exp0 batch.nochop makebox
>
> 3. Bootstrapping a new character set:
>
> tesseract eng.timesitalic.exp0.tif eng.timesitalic.exp0 -l eng batch.nochop makebox
>
> 4. Run Tesseract for Training:
>
> tesseract eng.timesitalic.exp0.tif eng.timesitalic.exp0 nobatch box.train
>
> 5. Compute the Character Set:
>
> unicharset_extractor eng.timesitalic.exp0.box eng.timesitalic.exp1.box
>
> 6. Create the font_properties file with the content "timesitalic 1 0 0 1 0",
> and then run:
>
> mftraining -F font_properties -U unicharset -O eng.unicharset eng.timesitalic.exp0.tr
>
> cntraining eng.timesitalic.exp0.tr eng.timesitalic.exp1.tr
>
> 7. Dictionary Data: create the frequent_words_list and words_list files,
> then run:
>
> wordlist2dawg frequent_words_list lang.freq-dawg lang.unicharset
> wordlist2dawg words_list lang.word-dawg lang.unicharset
>
> 8. Putting it all together: combine_tessdata eng.
>
> 9. Rename eng.traineddata to nom.traineddata.
>
> 10. Copy nom.traineddata into the tessdata directory.
>
> 11. Run the command: tesseract text.png out -l nom
>
> -> Error: "tessdata_manager.SeekToStart<TESSDATE_INTERM>: Error: Assert
> failed:in file ...\...\classify\adaptmatch.cpp, line 555"
>
Thanks.
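
One thing that may be worth checking before step 8, offered as a guess rather than a confirmed diagnosis: "combine_tessdata eng." only picks up files whose names start with "eng.", while the steps above do not show the mftraining and cntraining outputs being renamed with that prefix. A minimal sketch of such a check follows. The name check_components is a hypothetical helper, and the component list is an assumption based on the 3.0x training layout (a shapetable file may also be expected by some versions).

```shell
#!/bin/sh
# Hypothetical helper: report which training components are missing
# for a given language prefix before running combine_tessdata.
# Assumption: unicharset, inttemp, pffmtable and normproto are the
# components that must exist as <prefix>.<name> in the current directory.
check_components() {
    prefix="$1"
    missing=0
    for part in unicharset inttemp pffmtable normproto; do
        if [ ! -f "$prefix.$part" ]; then
            echo "missing: $prefix.$part"
            missing=1
        fi
    done
    # Exit status 0 means every listed component was found.
    return "$missing"
}

# Example use from the training directory (left commented out on purpose):
# check_components eng && combine_tessdata eng.
```

If the helper reports a missing file, renaming the corresponding output (for example inttemp to eng.inttemp) and running combine_tessdata again would be the next thing to try.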

