Hi Rob,

You're getting there, don't worry :)

On Fri, May 16, 2014 at 08:56:50AM -0700, Rob Stewart wrote:
> [snip]
> unicharset_extractor eng.FreeSans.exp0.box
> 
> set_unicharset_properties -U unicharset -O unicharset.out --script_dir=../tesseract-ocr-read-only/training/langdata
> 
> shapeclustering -F font_properties -U unicharset eng.FreeSans.exp0.tr
> #shapeclustering -F font_properties -U unicharset.out eng.FreeSans.exp0.tr
> 
> mftraining -F font_properties -U unicharset -O eng.FreeSans.exp0.tr
> #mftraining -F font_properties -U unicharset.out -O eng.FreeSans.exp0.tr
> 
> #cntraining eng.FreeSans.exp0.tr

> Once I get down to shapeclustering I can't tell from the documentation which
> unicharset file to use: the first one produced or the one produced by the
> 'set_unicharset_properties' command.

The one produced by set_unicharset_properties is always better to 
use, as it should have correct attributes for each character.

Note that shapeclustering is generally not recommended for most
scripts (I think it's only used for Devanagari and similar scripts at
the moment). I tested with and without it for my grc training, and
results were far better without it.

> Either way the mftraining usually fails, sometimes a second attempt at running
> shapeclustering and mftraining outside of this shell file works, but almost
> every time I get the following error...

You're calling mftraining slightly incorrectly. The -O argument names
the resulting unicharset, not the .tr file; the .tr files go last, as
plain arguments. mftraining is probably getting upset at you
overwriting the .tr with a unicharset file while (or maybe even
before) reading it. In my grc makefile, I call it like this:

  mftraining -F font_properties -U grc.earlyunicharset -O grc.unicharset grc*tr

(grc.earlyunicharset is the output from set_unicharset_properties).
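
Putting the points above together, your script could look roughly like
the sketch below. Treat it as an assumption-laden outline rather than a
tested recipe: the run and train helpers and the DRY_RUN variable are
made up for illustration, eng.unicharset is just a name picked for the
output file, shapeclustering is left out, and it assumes the .box and
.tr files and font_properties already exist in the current directory.

```shell
# run: execute a command, or just print it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# train LANG FONT LANGDATA_DIR: run the training steps in order, stopping at the first failure.
train() {
    base="$1.$2.exp0"
    run unicharset_extractor "$base.box" || return 1
    run set_unicharset_properties -U unicharset -O unicharset.out --script_dir="$3" || return 1
    # -U reads the unicharset with properties; -O names a separate output file,
    # so the .tr file is never overwritten.
    run mftraining -F font_properties -U unicharset.out -O "$1.unicharset" "$base.tr" || return 1
    run cntraining "$base.tr" || return 1
}
```

Calling train eng FreeSans ../tesseract-ocr-read-only/training/langdata
would run the four steps; setting DRY_RUN=1 first only prints them, which
is handy for checking the arguments before anything gets overwritten.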

>   Any help would be appreciated. Also I think adding this kind of shell script
> (or equivalent) to a 'fast start' for training could be useful.

You may find the Makefile from my grc repository helpful. Get it 
with:

  git clone http://ancientgreekocr.org/grc.git

I decided to use a Makefile rather than a shell script so that I can 
test changes and only the appropriate parts are re-run, rather than 
everything.
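
If you would rather stay with a shell script, the part of make that
matters here (re-run a step only when its output is missing or older
than an input) can be approximated by hand. The sketch below is not
taken from the grc Makefile; needs_rebuild is a hypothetical helper, and
the -nt file test it relies on is supported by bash, dash and ksh but is
not guaranteed in every POSIX shell.

```shell
# needs_rebuild TARGET DEP...: succeed if TARGET is missing or older than any DEP.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        # -nt is true when $dep was modified more recently than $target
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Example use (left commented out so that nothing runs by accident):
# if needs_rebuild unicharset.out unicharset; then
#     set_unicharset_properties -U unicharset -O unicharset.out --script_dir=../langdata
# fi
```

A Makefile gives you the same check for every rule without the
boilerplate, which is why I went that way.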

Nick
