On Fri, Apr 27, 2012 at 1:12 PM, Nick White <[email protected]> wrote:
> Hi,
>
> I'm encountering an odd segfault whenever I try to use a new
> traineddata file with the latest svn release (r724). The same
> process produces a perfectly usable traineddata file with 3.01. This
> is the case on the two different Linux boxes I have access to, a
> x86_64 10.04.3 LTS box, and an x86_64 Debian Squeeze box.
>
> I can't get around the issue; I've tried using a variety of
> different box & source files, so I don't think that's the problem.
>
> All of the files I speak of are at
> http://www.dur.ac.uk/nick.white/tmp/tesseractissue/
>
> So, to create the .traineddata file, I run the maketraining.sh
> script. After copying the resulting .traineddata file to
> $PREFIX/share/tessdata/, though, trying to use it results in this:
>
> tesseract testsample.png testout -l grc
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> index >= 0 && index < size_used_:Error:Assert failed:in file ../ccutil/genericvector.h, line 512
> Segmentation fault
>
> There's a full backtrace at
> http://www.dur.ac.uk/nick.white/tmp/tesseractissue/gdb.txt
>
> Note that the traineddata files provided in SVN work completely
> fine.
>
> Any thoughts on this? Clues? Suggestions on where to look?
>
> Many thanks
>
> Nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

I think the problem is that you did not include the (renamed) shapetable in the traineddata.
3.02 training is not documented (yet), so here is the list of commands that I use for 3.02 training:

    /opt/bin/tesseract grc.homerextract.exp0.png grc.homerextract.exp0 nobatch box.train
    /opt/bin/unicharset_extractor grc.homerextract.exp0.box
    echo "homerextract 1 0 0 1 0" >font_properties
    /opt/bin/shapeclustering -F font_properties -U unicharset grc.homerextract.exp0.tr
    /opt/bin/mftraining -F font_properties -U unicharset -O grc.unicharset grc.homerextract.exp0.tr
    /opt/bin/cntraining grc.homerextract.exp0.tr
    cp normproto grc.normproto
    cp inttemp grc.inttemp
    cp pffmtable grc.pffmtable
    cp shapetable grc.shapetable
    /opt/bin/combine_tessdata grc.

Then you need to install (copy) grc.traineddata to your tessdata directory.

When I run:

    /opt/bin/tesseract testsample.png output -l grc

it creates output (no segfault) on openSUSE 12.1.

--
Zdenko
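
Since the segfault discussed here seems to come from a component missing from the traineddata, a small pre-flight check before running combine_tessdata may help. The following is only a sketch: the function name check_components is hypothetical, and the component list is simply the set of grc.* files produced by the commands in this thread, not an authoritative list of what 3.02 requires.

```shell
#!/bin/sh
# Report any renamed training output that is missing for a language prefix.
# check_components is a hypothetical helper; the component names are the
# ones produced by the training commands in this thread.
check_components() {
    lang=$1
    status=0
    for part in unicharset inttemp normproto pffmtable shapetable; do
        if [ ! -f "$lang.$part" ]; then
            echo "missing: $lang.$part"
            status=1
        fi
    done
    return $status
}

# Usage, from the training directory (combine only if nothing is missing):
# check_components grc && /opt/bin/combine_tessdata grc.
```

The function returns a non-zero status when any file is absent, so it can guard the combine step with &&.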

