Hi Rob,

You're getting there, don't worry :)
On Fri, May 16, 2014 at 08:56:50AM -0700, Rob Stewart wrote:
> [snip]
> unicharset_extractor eng.FreeSans.exp0.box
>
> set_unicharset_properties -U unicharset -O unicharset.out --script_dir=../tesseract-ocr-read-only/training/langdata
>
> shapeclustering -F font_properties -U unicharset eng.FreeSans.exp0.tr
> #shapeclustering -F font_properties -U unicharset.out eng.FreeSans.exp0.tr
>
> mftraining -F font_properties -U unicharset -O eng.FreeSans.exp0.tr
> #mftraining -F font_properties -U unicharset.out -O eng.FreeSans.exp0.tr
>
> #cntraining eng.FreeSans.exp0.tr
>
> Once I get down to shapeclustering, I can't tell from the documentation which
> unicharset file to use: the first one produced, or the one produced by the
> 'set_unicharset_properties' command.

The one produced by set_unicharset_properties is always the better one to use, as it should have the correct attributes for each character.

Note that shapeclustering is generally not recommended for most scripts (I think Devanagari is the only script it's used for at the moment). I tested with and without it for my grc training, and the results were far better without it.

> Either way the mftraining usually fails, sometimes a second attempt at running
> shapeclustering and mftraining outside of this shell file works, but almost
> every time I get the following error...

You're calling mftraining slightly incorrectly. The -O argument names the unicharset that mftraining writes, not the .tr file; the .tr files go at the end as plain input arguments. Tesseract is probably getting upset because you're overwriting the .tr with a unicharset file while (or maybe even before) reading it. In my grc Makefile, I call it like this:

    mftraining -F font_properties -U grc.earlyunicharset -O grc.unicharset grc*tr

(grc.earlyunicharset is the output from set_unicharset_properties.)

> Any help would be appreciated. Also I think adding this kind of shell script
> (or equivalent) to a 'fast start' for training could be useful.

You may find the Makefile from my grc repository helpful.
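Putting the two points above together (pass on the unicharset from set_unicharset_properties, and give mftraining a new unicharset name for -O with the .tr file as input), the steps from your script might look roughly like the sketch below. It is only a sketch and is not taken from the grc Makefile: the run and train_font helpers, the DRY_RUN switch and the LANGDATA variable are invented for illustration, and naming the final unicharset eng.unicharset is an assumption. shapeclustering is left out, following the advice above.

```shell
#!/bin/sh
# Sketch only: helper names (run, train_font), DRY_RUN and LANGDATA are
# hypothetical and not part of Tesseract or the grc Makefile.
set -eu

# Hypothetical langdata location; adjust to your own checkout.
LANGDATA=${LANGDATA:-../tesseract-ocr-read-only/training/langdata}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# train_font LANG FONT EXP, e.g. train_font eng FreeSans exp0
train_font() {
    base="$1.$2.$3"
    # Writes ./unicharset from the box file.
    run unicharset_extractor "$base.box"
    # Adds script properties; unicharset.out is the file to pass on.
    run set_unicharset_properties -U unicharset -O unicharset.out \
        --script_dir="$LANGDATA"
    # shapeclustering is skipped (mainly useful for Devanagari).
    # -U: improved unicharset; -O: NEW unicharset to write (assumed name);
    # the .tr file comes last as the input.
    run mftraining -F font_properties -U unicharset.out -O "$1.unicharset" "$base.tr"
    run cntraining "$base.tr"
}

# Only run when given exactly LANG FONT EXP on the command line.
if [ "$#" -eq 3 ]; then
    train_font "$@"
fi
```

Saved as, say, train.sh (a hypothetical name), running DRY_RUN=1 sh train.sh eng FreeSans exp0 prints the four commands without executing them; leave DRY_RUN unset to run them for real.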
The repository can be cloned with:

    git clone http://ancientgreekocr.org/grc.git

I decided to use a Makefile rather than a shell script so that when I test a change, only the steps it affects are re-run, rather than everything.

Nick

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/20140516162015.GD15463%40manta.lan.

