Thank you, Zdenko. I can confirm v3.03 works with no segfault on my system too.
I am still having trouble using the user-patterns and user-words files to control the output from tesseract v3.03. I will start another thread about this.

Chris

On Saturday, 31 May 2014 22:19:23 UTC+2, zdenop wrote:
>
> Hi,
>
> I tried it with version 3.03 (on openSUSE 13.1) and there was no segfault
> (3.02 segfaults for me too).
>
> Zdenko
>
>
> On Fri, May 30, 2014 at 6:22 PM, Christopher Smeenk <[email protected]> wrote:
>
>> I would like to use tesseract to read data from a scanned high school
>> transcript. The form contains a bunch of fields (student name, gender,
>> address) and corresponding values (characters, words or numbers).
>>
>> I understand the way to do this is using config files augmented with user
>> data [see the man page
>> <http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html>;
>> patterns are explained in more detail in the file
>> /path/to/tesseract-ocr/dict/trie.h].
>>
>> However, when I try to set my own eng.user-words or eng.user-patterns,
>> tesseract crashes with a *Segmentation Fault*.
>>
>> First, here is a test image I am using to check the pattern matching:
>> (attached file test-002.png)
>>
>> Here is some info about my install:
>> cs@pleco:/data/OCR/tesseract/tests$ lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Ubuntu
>> Description: Ubuntu 12.04.4 LTS
>> Release: 12.04
>> Codename: precise
>>
>> cs@pleco:/data/OCR/tesseract/tests$ tesseract -v
>> tesseract 3.02.02
>> leptonica-1.69
>> libjpeg 6b : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4
>>
>> Here is a good run, showing the output:
>> cs@pleco:/data/OCR/tesseract/tests$ tesseract testImages/test-002.png thetext -psm 3
>> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
>> cs@pleco:/data/OCR/tesseract/tests$ cat thetext.txt
>> Na me: Roosevelt, Fra nklin
>>
>>
>> Age: 102
>>
>>
>> Name: Harper, Stephen
>> Age: 58
>>
>>
>> Name: Hawk, Tony
>> Age: 34
>>
>>
>> Nane: Shakespeare, Bill
>> Age: 432
>>
>> Here are the config file and the user-patterns and user-words files:
>> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat configs/bazaar_test
>> load_system_dawg F
>> load_freq_dawg F
>> user_words_suffix test-words
>> user_patterns_suffix test-patterns
>>
>> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat eng.test-patterns
>> Name: \A\c*, \A\c*
>> Age: \d*
>>
>> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat eng.test-words
>> Name:
>> Age:
>> Roosevelt
>> Franklin
>> Harper
>> Stephen
>> Hawk
>> Tony
>> Shakespeare
>>
>> And here is the result when running tesseract with the config file:
>> cs@pleco:/data/OCR/tesseract/tests$ tesseract testImages/test-002.png thetext -psm 3 bazaar_test
>> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
>> Segmentation fault
>>
>> What am I doing wrong? Thanks for reading!
>>
>> Chris
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/bb5b289c-6453-437e-88e1-3506f8d8bf8f%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
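Since the user-patterns syntax is only described in the comments of dict/trie.h, it can help to check a pattern offline before handing it to tesseract. Below is a rough Python sketch, not part of tesseract, that translates a single pattern line into a regular expression. It assumes the escapes described in trie.h (\c letter, \d digit, \n letter or digit, \p punctuation, \a lowercase, \A uppercase, and \* to let the preceding item repeat), treats \* as "one or more", and approximates each class with ASCII ranges. The helper names CLASSES, pattern_to_regex and matches are hypothetical.

```python
import re
import string

# ASSUMPTION: ASCII-only stand-ins for the escape classes described in the
# comments of tesseract's dict/trie.h. Tesseract itself looks these properties
# up in its unicharset, so results for non-ASCII text will differ.
CLASSES = {
    "c": "[A-Za-z]",                                  # \c: letter
    "d": "[0-9]",                                     # \d: digit
    "n": "[A-Za-z0-9]",                               # \n: letter or digit
    "p": "[" + re.escape(string.punctuation) + "]",   # \p: punctuation
    "a": "[a-z]",                                     # \a: lowercase letter
    "A": "[A-Z]",                                     # \A: uppercase letter
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one user-pattern line into a Python regex (hypothetical helper)."""
    parts = []  # one regex fragment per pattern item
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in CLASSES:
                parts.append(CLASSES[nxt])
            elif nxt == "*":
                # ASSUMPTION: \* lets the preceding item repeat, read here as
                # "one or more"; an unescaped * stays a literal asterisk.
                if not parts:
                    raise ValueError("\\* must follow a character or class")
                parts[-1] = "(?:" + parts[-1] + ")+"
            else:
                # Any other escaped character is taken literally.
                parts.append(re.escape(nxt))
            i += 2
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, word: str) -> bool:
    """True if the whole word fits the pattern under the assumptions above."""
    return pattern_to_regex(pattern).fullmatch(word) is not None


if __name__ == "__main__":
    print(matches(r"\A\c\*,", "Roosevelt,"))  # True
    print(matches(r"\d\*", "432"))            # True
    print(matches(r"\A\c*,", "Roosevelt,"))   # False: the * here is literal
```

Under these assumptions each pattern is checked against one word at a time, and only the escaped form \* marks repetition; an unescaped * is matched as a literal asterisk, which is why the third call prints False.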

