Re: Segfault using my own traineddata with latest SVN

zdenko podobny Fri, 27 Apr 2012 09:58:01 -0700

On Fri, Apr 27, 2012 at 1:12 PM, Nick White <[email protected]> wrote:


> Hi,
>
> I'm encountering an odd segfault whenever I try to use a new
> traineddata file with the latest svn release (r724). The same
> process produces a perfectly usable traineddata file with 3.01. This
> is the case on the two different Linux boxes I have access to, an
> x86_64 Ubuntu 10.04.3 LTS box, and an x86_64 Debian Squeeze box.
>
> I can't get around the issue; I've tried using a variety of
> different box & source files, so I don't think that's the problem.
>
> All of the files I speak of are at
> http://www.dur.ac.uk/nick.white/tmp/tesseractissue/
>
> So, to create the .traineddata file, I run the maketraining.sh
> script. After copying the resulting .traineddata file to
> $PREFIX/share/tessdata/, though, trying to use it results in this:
>
> tesseract testsample.png testout -l grc
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> index >= 0 && index < size_used_:Error:Assert failed:in file
> ../ccutil/genericvector.h, line 512
> Segmentation fault
>
> There's a full backtrace at
> http://www.dur.ac.uk/nick.white/tmp/tesseractissue/gdb.txt
>
> Note that the traineddata files provided in SVN work completely
> fine.
>
> Any thoughts on this? Clues? Suggestions on where to look?
>
> Many thanks
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

I think the problem is that you did not include the (renamed) shapetable
in the traineddata.

3.02 training is not documented (yet), so here is the list of commands that
I use for 3.02 training:

/opt/bin/tesseract grc.homerextract.exp0.png grc.homerextract.exp0 nobatch box.train
/opt/bin/unicharset_extractor grc.homerextract.exp0.box
echo "homerextract 1 0 0 1 0" >font_properties
/opt/bin/shapeclustering -F font_properties -U unicharset grc.homerextract.exp0.tr
/opt/bin/mftraining -F font_properties -U unicharset -O grc.unicharset grc.homerextract.exp0.tr
/opt/bin/cntraining grc.homerextract.exp0.tr

cp normproto grc.normproto
cp inttemp grc.inttemp
cp pffmtable grc.pffmtable
cp shapetable grc.shapetable

/opt/bin/combine_tessdata grc.

Then you need to install (copy) grc.traineddata to your tessdata directory.
When I run:
/opt/bin/tesseract testsample.png output -l grc
it creates output (no segfault) on openSUSE 12.1.
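
Since a missing shapetable is easy to overlook, it may help to check that
every renamed component file is present before running combine_tessdata.
The following is only a minimal sketch: check_components is a hypothetical
helper name (not part of Tesseract), and the list of five components is an
assumption based on the files produced by the commands above.

```shell
# check_components PREFIX
# Hypothetical helper: reports which PREFIX.<component> files are missing
# from the current directory. The component list is an assumption taken
# from the training commands in this message, not from Tesseract itself.
check_components() {
    lang="$1"
    missing=0
    for part in unicharset inttemp pffmtable normproto shapetable; do
        if [ ! -f "$lang.$part" ]; then
            echo "missing: $lang.$part"
            missing=1
        fi
    done
    # Return 0 only when every component file was found.
    return "$missing"
}
```

It could be used to guard the final step, for example:
check_components grc && /opt/bin/combine_tessdata grc.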

-- 
Zdenko
