Thank you, Zdenko. I can confirm that v3.03 works with no segfault on my 
system too.

I am still having trouble using the user-patterns and user-words files to 
control the output from tesseract v3.03. I will start another thread about 
this.

Chris

On Saturday, 31 May 2014 22:19:23 UTC+2, zdenop wrote:
>
> Hi,
>
> I tried it with version 3.03 (on openSUSE 13.1) and there was no segfault 
> (3.02 also segfaults for me).
>
> Zdenko
>
>
> On Fri, May 30, 2014 at 6:22 PM, Christopher Smeenk <[email protected]> wrote:
>
>> I would like to use tesseract to read data from a scanned high school 
>> transcript. The form contains a bunch of fields (student name, gender, 
>> address) and corresponding values (characters, words or numbers).
>>
>> I understand the way to do this is to use config files augmented with user 
>> data (see the man page 
>> <http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html>; 
>> patterns are explained in more detail in the file 
>> /path/to/tesseract-ocr/dict/trie.h).
>>
>> However, when I try to set my own eng.user-words or eng.user-patterns, 
>> tesseract returns a *Segmentation Fault*.
>>
>> First, here is a test image I am using to check the pattern matching: 
>> (attached file test-002.png)
>>
>> Here is some info about my install:
>> cs@pleco:/data/OCR/tesseract/tests$ lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Ubuntu
>> Description: Ubuntu 12.04.4 LTS
>> Release: 12.04
>> Codename: precise
>>
>>
>> cs@pleco:/data/OCR/tesseract/tests$ tesseract -v
>> tesseract 3.02.02
>>  leptonica-1.69
>>   libjpeg 6b : libpng 1.2.46 : libtiff 3.9.5 : zlib 1.2.3.4
>>
>>
>> Here is a good run, showing the output:
>> cs@pleco:/data/OCR/tesseract/tests$ tesseract testImages/test-002.png 
>> thetext -psm 3
>> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
>> cs@pleco:/data/OCR/tesseract/tests$ cat thetext.txt 
>> Na me: Roosevelt, Fra nklin
>>
>>
>> Age: 102
>>
>>
>> Name: Harper, Stephen
>> Age: 58
>>
>>
>> Name: Hawk, Tony
>> Age: 34
>>
>>
>> Nane: Shakespeare, Bill
>> Age: 432
>>
>>
>> Here are the config file and user pattern files:
>> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat configs/bazaar_test 
>> load_system_dawg F
>> load_freq_dawg F
>> user_words_suffix test-words
>> user_patterns_suffix test-patterns
>>
>>
>> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat eng.test-patterns 
>> Name: \A\c*, \A\c*
>> Age: \d*
>>
>>
>> cs@pleco:/usr/share/tesseract-ocr/tessdata$ cat eng.test-words 
>> Name:
>> Age:
>> Roosevelt
>> Franklin
>> Harper
>> Stephen
>> Hawk
>> Tony
>> Shakespeare
>>
>>
>> And here is the result when running tesseract with the config file:
>> cs@pleco:/data/OCR/tesseract/tests$ tesseract testImages/test-002.png 
>> thetext -psm 3 bazaar_test
>> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
>> Segmentation fault
>>
>>
>>
>> What am I doing wrong? Thanks for reading!
>>
>> Chris
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/bb5b289c-6453-437e-88e1-3506f8d8bf8f%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
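
Independently of the segfault, it can help to check which lines of the OCR output already fit the intended patterns. The following is a minimal Python sketch of such a check, not a description of how tesseract applies user patterns internally. It assumes ASCII approximations of the character classes documented in trie.h (\c, \d, \n, \p, \a, \A), and it treats both `\*` and a bare `*` (as written in the eng.test-patterns file quoted above) as "repeat the preceding item". The names `pattern_to_regex` and `unmatched_lines` are hypothetical helpers introduced only for illustration.

```python
import re
import string

# Assumption: ASCII stand-ins for the character classes described in trie.h.
CLASSES = {
    "c": "[A-Za-z]",                                  # alphabetic character
    "d": "[0-9]",                                     # digit
    "n": "[A-Za-z0-9]",                               # digit or letter
    "p": "[" + re.escape(string.punctuation) + "]",   # punctuation
    "a": "[a-z]",                                     # lowercase letter
    "A": "[A-Z]",                                     # uppercase letter
}


def pattern_to_regex(pattern):
    """Translate one user-pattern line into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        escaped = pattern[i] == "\\" and i + 1 < len(pattern)
        ch = pattern[i + 1] if escaped else pattern[i]
        i += 2 if escaped else 1
        if escaped and ch in CLASSES:
            parts.append(CLASSES[ch])
        elif ch == "*" and parts and parts[-1] != "*":
            parts.append("*")            # repeat the preceding item
        else:
            parts.append(re.escape(ch))  # everything else is a literal
    return re.compile("".join(parts))


def unmatched_lines(text, patterns):
    """Return the non-empty lines of text that fit none of the patterns."""
    regexes = [pattern_to_regex(p) for p in patterns]
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not any(r.fullmatch(line.strip()) for r in regexes)
    ]


# The two patterns from eng.test-patterns and a few lines of the OCR output.
PATTERNS = [r"Name: \A\c*, \A\c*", r"Age: \d*"]
SAMPLE = """Na me: Roosevelt, Fra nklin
Age: 102
Name: Harper, Stephen
Age: 58
Nane: Shakespeare, Bill
Age: 432
"""

if __name__ == "__main__":
    for line in unmatched_lines(SAMPLE, PATTERNS):
        print("does not fit any pattern:", line)
```

Lines reported by such a check are the ones where the user-words and user-patterns files would be expected to make a difference, which gives a concrete baseline for comparing runs with and without the config file.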
