Re: [tesseract-ocr] mftraining core dump - Illegal malloc request size on Ubuntu...

Sriranga(80yrs) Sun, 18 May 2014 03:53:25 -0700

Nick,

When I tried to download the Makefile from your grc repository
(http://ancientgreekocr.org/grc.git), the repository returned the
error "403 - Forbidden". I am bringing this to your attention.


With regards,
sriranga(80+)


On Fri, May 16, 2014 at 9:50 PM, Nick White <[email protected]> wrote:

> Hi Rob,
>
> You're getting there, don't worry :)
>
> On Fri, May 16, 2014 at 08:56:50AM -0700, Rob Stewart wrote:
> > [snip]
> > unicharset_extractor eng.FreeSans.exp0.box
> >
> > set_unicharset_properties -U unicharset -O unicharset.out --script_dir=../tesseract-ocr-read-only/training/langdata
> >
> > shapeclustering -F font_properties -U unicharset eng.FreeSans.exp0.tr
> > #shapeclustering -F font_properties -U unicharset.out eng.FreeSans.exp0.tr
> >
> > mftraining -F font_properties -U unicharset -O eng.FreeSans.exp0.tr
> > #mftraining -F font_properties -U unicharset.out -O eng.FreeSans.exp0.tr
> >
> > #cntraining eng.FreeSans.exp0.tr
>
> > Once I get down to shapeclustering, I can't tell from the documentation
> > which unicharset file to use: the first one produced, or the one produced
> > by the 'set_unicharset_properties' command.
>
> The one produced by set_unicharset_properties is always better to
> use, as it should have correct attributes for each character.
>
> Note that shapeclustering is generally not recommended for most
> scripts (I think it's just Devanagari scripts that it's used for at
> the moment). I tested with and without for my grc training, and
> results were far better without it.
>
> > Either way the mftraining usually fails. Sometimes a second attempt at
> > running shapeclustering and mftraining outside of this shell file works,
> > but almost every time I get the following error...
>
> You're calling mftraining slightly incorrectly. The -O argument is
> for the resulting unicharset, not the .tr file; tesseract is
> probably getting upset at you overwriting the .tr with a unicharset
> file while (or maybe even before) reading it. In my grc makefile, I
> call it like this:
>
>   mftraining -F font_properties -U grc.earlyunicharset -O grc.unicharset grc*tr
>
> (grc.earlyunicharset is the output from set_unicharset_properties).
>
> > Any help would be appreciated. Also, I think adding this kind of shell
> > script (or equivalent) to a 'fast start' for training could be useful.
>
> You may find the Makefile from my grc repository helpful. Get it
> with:
>
>   git clone http://ancientgreekocr.org/grc.git
>
> I decided to use a Makefile rather than a shell script so that I can
> test changes and only the appropriate parts are re-run, rather than
> everything.
>
> Nick
>
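
For reference, the corrections Nick describes above can be put together
as a rough sketch for Rob's eng/FreeSans files. It only prints the
commands; it does not run them. The function name training_commands and
the eng.earlyunicharset / eng.unicharset file names are assumptions,
modelled on the grc.earlyunicharset naming in Nick's message and not on
anything in the Tesseract documentation. shapeclustering is left out,
following Nick's note that results were better without it.

```shell
# Print the training commands for one language/font pair, one per line.
# Hypothetical helper: the name and argument order are illustrative only.
training_commands() {
    lang=$1      # e.g. eng
    font=$2      # e.g. FreeSans
    langdata=$3  # path to the langdata directory
    base="$lang.$font.exp0"
    printf '%s\n' \
        "unicharset_extractor $base.box" \
        "set_unicharset_properties -U unicharset -O $lang.earlyunicharset --script_dir=$langdata" \
        "mftraining -F font_properties -U $lang.earlyunicharset -O $lang.unicharset $base.tr" \
        "cntraining $base.tr"
}

# -U reads the unicharset written by set_unicharset_properties;
# -O names the new unicharset, so the .tr file is never overwritten.
training_commands eng FreeSans ../tesseract-ocr-read-only/training/langdata
```

Once the printed commands look right, they could be piped to "sh -e" from
the directory that holds the .box and .tr files, so that the run stops at
the first failing step. Paths containing spaces would need extra quoting.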

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CANKD7YxS%3DpyURUkBwfxTUcfLXuHREq4wgFXF%2BVr6XCPxigCTqw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
