Hi, I tried it with version 3.03 (on openSUSE 13.1) and there was no segfault (3.02 segfaults for me too).
Zdenko

On Fri, May 30, 2014 at 6:22 PM, Christopher Smeenk <[email protected]> wrote:

> I would like to use tesseract to read data from a scanned high school
> transcript. The form contains a bunch of fields (student name, gender,
> address) and corresponding values (characters, words or numbers).
>
> I understand the way to do this is using config files augmented with user
> data [see the man page
> <http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html>,
> patterns are explained in more detail in the file
> /path/to/tesseract-ocr/dict/trie.h].
>
> However, when I try to set my own eng.user-words or eng.user-patterns,
> tesseract returns a *Segmentation Fault*.
>
> First, here is a test image I am using to check the pattern matching:
> (attached file test-002.png)
>
> Here is some info about my install:
> cs@pleco:/data/OCR/tesseract/tests$ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 12.04.4 LTS
> Release: 12.04
> Codename: precise
>
> cs@pleco:/data/OCR/tesseract/tests$ tesseract -v
> tesseract 3.02.02
> leptonica-1.69
> libjpeg 6b : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4
>
> Here is a good run, showing the output:
> cs@pleco:/data/OCR/tesseract/tests$ tesseract testImages/test-002.png thetext -psm 3
> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
> cs@pleco:/data/OCR/tesseract/tests$ cat thetext.txt
> Na me: Roosevelt, Fra nklin
>
>
> Age: 102
>
>
> Name: Harper, Stephen
> Age: 58
>
>
> Name: Hawk, Tony
> Age: 34
>
>
> Nane: Shakespeare, Bill
> Age: 432
>
> Here are the config file and user pattern files:
> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat configs/bazaar_test
> load_system_dawg F
> load_freq_dawg F
> user_words_suffix test-words
> user_patterns_suffix test-patterns
>
> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat eng.test-patterns
> Name: \A\c*, \A\c*
> Age: \d*
>
> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat eng.test-words
> Name:
> Age:
> Roosevelt
> Franklin
> Harper
> Stephen
> Hawk
> Tony
> Shakespeare
>
> And here is the result when running tesseract with the config file:
> cs@pleco:/data/OCR/tesseract/tests$ tesseract testImages/test-002.png thetext -psm 3 bazaar_test
> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
> Segmentation fault
>
> What am I doing wrong? Thanks for reading!
>
> Chris
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bb5b289c-6453-437e-88e1-3506f8d8bf8f%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/bb5b289c-6453-437e-88e1-3506f8d8bf8f%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
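The thread points to dict/trie.h for the user-patterns syntax. As a rough aid for reasoning about which words a pattern can accept, here is a minimal Python sketch that translates that notation into a regular expression. It assumes the class letters \c (alphabetic), \d (digit), \n (alphanumeric), \p (punctuation), \a (lower case), \A (upper case) and the \* repetition marker as we understand trie.h to describe them; it approximates tesseract's unicharset properties with Python character classes and reads \* as "one or more of the preceding item". The helpers pattern_to_regex and matches are hypothetical, not part of tesseract, and the sketch says nothing about why 3.02 crashed.

```python
import re

# Hypothetical translation table. trie.h defines these classes in terms of
# UNICHARSET properties; the regexes here are rough approximations only.
CLASS_TO_REGEX = {
    "c": r"[^\W\d_]",  # alphabetic character
    "d": r"[0-9]",     # digit
    "n": r"[^\W_]",    # alphanumeric (letter or digit)
    "p": r"[^\w\s]",   # punctuation (approximation)
    "a": r"[a-z]",     # lower-case letter (ASCII only)
    "A": r"[A-Z]",     # upper-case letter (ASCII only)
}


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a trie.h-style user pattern into a compiled Python regex.

    Assumption: a backslash followed by a class letter is a character class,
    "\\*" repeats the preceding item (read here as one-or-more), and every
    other character, including a bare "*", is a literal.
    """
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in CLASS_TO_REGEX:
                parts.append(CLASS_TO_REGEX[nxt])
            elif nxt == "*":
                if not parts:
                    raise ValueError("\\* must follow a character or class")
                # Wrap the previous item so the repetition applies to all of it.
                parts[-1] = f"(?:{parts[-1]})+"
            else:
                # Any other escaped character is taken literally.
                parts.append(re.escape(nxt))
            i += 2
        else:
            # Plain characters (and a trailing lone backslash) are literals.
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, word: str) -> bool:
    """Return True if the whole word matches the translated pattern."""
    return pattern_to_regex(pattern).fullmatch(word) is not None


if __name__ == "__main__":
    print(matches(r"\A\c\*", "Roosevelt"))  # capitalised word; \* repeats \c
    print(matches(r"\d\*", "102"))          # one or more digits
    print(matches(r"\A\c*", "Roosevelt"))   # bare "*" is a literal here
```

Under this reading, the bare * in the quoted eng.test-patterns (\c*, \d*) would be a literal asterisk rather than a repetition marker, which trie.h writes as \*; check the header for your tesseract version before relying on that.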

