Nick,

When I tried to download the Makefile from your grc repository, "http://ancientgreekocr.org/grc.git", the repository returned the error message "403 - Forbidden". I am bringing this to your kind notice.
With regards,
sriranga(80+)

On Fri, May 16, 2014 at 9:50 PM, Nick White <[email protected]> wrote:

> Hi Rob,
>
> You're getting there, don't worry :)
>
> On Fri, May 16, 2014 at 08:56:50AM -0700, Rob Stewart wrote:
> > [snip]
> > unicharset_extractor eng.FreeSans.exp0.box
> >
> > set_unicharset_properties -U unicharset -O unicharset.out --script_dir=../tesseract-ocr-read-only/training/langdata
> >
> > shapeclustering -F font_properties -U unicharset eng.FreeSans.exp0.tr
> > #shapeclustering -F font_properties -U unicharset.out eng.FreeSans.exp0.tr
> >
> > mftraining -F font_properties -U unicharset -O eng.FreeSans.exp0.tr
> > #mftraining -F font_properties -U unicharset.out -O eng.FreeSans.exp0.tr
> >
> > #cntraining eng.FreeSans.exp0.tr
> >
> > Once I get down to shaperclustering I can't tell from the documentation which
> > unicharset file to use the first one produced or the one produced by the
> > 'set_unicharset_properties' command.
>
> The one produced by set_unicharset_properties is always better to
> use, as it should have correct attributes for each character.
>
> Note that shapeclustering is generally not recommended for most
> scripts (I think it's just devanagari scripts that it's used for at
> the moment). I tested with and without for my grc training, and
> results were far better without it.
>
> > Either way the mftraining usually fails, sometimes a second attempt at running
> > shapeclustering and mftraining outside of this shell file works, but almost
> > every time I get the following error...
>
> You're calling mftraining slightly incorrectly. The -O argument is
> for the resulting unicharset, not the .tr file; tesseract is
> probably getting upset at you overwriting the .tr with a unicharset
> file while (or maybe even before) reading it. In my grc makefile, I
> call it like this:
>
> mftraining -F font_properties -U grc.earlyunicharset -O grc.unicharset grc*tr
>
> (grc.earlyunicharset is the output from set_unicharset_properties).
>
> > Any help would be appreciated. Also I think adding this kind of shell script
> > (or equivalent) to a 'fast start' for training could be useful.
>
> You may find the Makefile from my grc repository helpful. Get it
> with:
>
> git clone http://ancientgreekocr.org/grc.git
>
> I decided to use a Makefile rather than a shell script so that I can
> test changes and only the appropriate parts are re-run, rather than
> everything.
>
> Nick
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/20140516162015.GD15463%40manta.lan.
> For more options, visit https://groups.google.com/d/optout.
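
For anyone reading the thread without access to the Makefile, the call order described in the quoted reply can be sketched as a small dry-run shell script. This is only a sketch under assumptions: it prints the commands instead of running them, the variable names (LANG_CODE, FONT_BASE, SCRIPT_DIR) and their default values are placeholders taken from Rob's example, and the "earlyunicharset" name simply follows the naming Nick mentions. It is not the contents of the grc Makefile. shapeclustering is left out, following Nick's note that it is not recommended for most scripts.

```shell
#!/bin/sh
# Dry-run sketch: prints the training commands in order instead of running them.
# LANG_CODE, FONT_BASE and SCRIPT_DIR are placeholder values (assumptions).
LANG_CODE="${LANG_CODE:-eng}"
FONT_BASE="${FONT_BASE:-eng.FreeSans.exp0}"
SCRIPT_DIR="${SCRIPT_DIR:-../tesseract-ocr-read-only/training/langdata}"

print_training_steps() {
    # 1. Extract the initial unicharset from the box file.
    echo "unicharset_extractor ${FONT_BASE}.box"
    # 2. Add character properties; this output is the one to feed to mftraining.
    echo "set_unicharset_properties -U unicharset -O ${LANG_CODE}.earlyunicharset --script_dir=${SCRIPT_DIR}"
    # 3. -U reads the early unicharset, -O names the resulting unicharset,
    #    and the .tr file is a plain trailing argument (never the -O target).
    echo "mftraining -F font_properties -U ${LANG_CODE}.earlyunicharset -O ${LANG_CODE}.unicharset ${FONT_BASE}.tr"
    # 4. cntraining takes the same .tr file.
    echo "cntraining ${FONT_BASE}.tr"
}

print_training_steps
```

Once the printed commands look right for your file names, they can be run by hand or piped to sh. The point of the sketch is only the argument order for mftraining: the .tr file must not follow -O directly.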

