After doing some more digging and running valgrind on the code, the last few
lines were:
==360== by 0x95B913A: tesseract::Tesseract::classify_word_and_language(int, PAGE_RES_IT*, tesseract::WordData*) (control.cpp:1314)
==360== by 0x95BC63B: tesseract::Tesseract::RecogAllWordsPassN(int, ETEXT_DESC*, PAGE_RES_IT*, GenericVector<tesseract::WordData>*) (control.cpp:265)
==360== Address 0xc is not stack'd, malloc'd or (recently) free'd
So it seems it is trying to read from an invalid memory address, and
control.cpp:1314 is this line:
most_recently_used_->RetryWithLanguage(
*word_data, recognizer, debug, &word_data->lang_words[sub],
&best_words);
So I would guess something is wrong with my language data?
I basically copied the commands from
https://github.com/tesseract-shadow/tesseract-ocr-compilation/blob/master/container-scripts/tessdata_download.sh
to download my data.
On Tuesday, April 10, 2018 at 7:29:31 AM UTC-5, Kalven Schraut wrote:
>
> I am attempting to use tesseract's API in my project. Everything works
> as expected on Ubuntu, but I get a segfault after moving everything over
> to an Alpine Docker container.
>
> The backtrace from the segfault:
> #0 0x00007ffff2c4a50a in ?? () from /usr/lib/libgomp.so.1
> #1 0x00007ffff2c45d02 in GOMP_parallel () from /usr/lib/libgomp.so.1
> #2 0x00007ffff492cfea in tesseract::FullyConnected::Forward
> (this=0x5555577dbb20, debug=<optimized out>, input=...,
> input_transpose=<optimized out>,
> scratch=0x5555577f9fc8, output=0x55555877b6a0) at
> fullyconnected.cpp:140
> #3 0x00007ffff49598ff in tesseract::Series::Forward (this=0x555557803f60,
> debug=<optimized out>, input=..., input_transpose=<optimized out>,
> scratch=0x5555577f9fc8, output=0x55555877b6a0) at series.cpp:123
> #4 0x00007ffff49598ff in tesseract::Series::Forward (this=0x555557803d60,
> debug=<optimized out>, input=..., input_transpose=<optimized out>,
> scratch=0x5555577f9fc8, output=0x7fffffffc380) at series.cpp:123
> #5 0x00007ffff493b8ce in tesseract::LSTMRecognizer::RecognizeLine
> (this=this@entry=0x5555577f9c80, image_data=..., invert=invert@entry=true,
> debug=debug@entry=false, re_invert=re_invert@entry=false,
> upside_down=upside_down@entry=false, scale_factor=0x7fffffffc35c,
> inputs=0x7fffffffc410,
> outputs=0x7fffffffc380) at lstmrecognizer.cpp:256
> #6 0x00007ffff493c4d0 in tesseract::LSTMRecognizer::RecognizeLine
> (this=0x5555577f9c80, image_data=..., invert=invert@entry=true, debug=false,
> worst_dict_cert=worst_dict_cert@entry=-3.5714285373687744,
> line_box=..., words=words@entry=0x7fffffffc600) at lstmrecognizer.cpp:190
> #7 0x00007ffff480978f in tesseract::Tesseract::LSTMRecognizeWord
> (this=this@entry=0x555557782420, block=..., row=row@entry=0x55555861c2e0,
> word=<optimized out>,
> words=words@entry=0x7fffffffc600) at linerec.cpp:241
> #8 0x00007ffff47ef729 in tesseract::Tesseract::classify_word_pass1
> (this=0x555557782420, word_data=..., in_word=0x5555585bd4e0,
> out_words=0x7fffffffc600)
> at control.cpp:1373
> #9 0x00007ffff47f09a5 in tesseract::Tesseract::RetryWithLanguage
> (this=0x555557782420, word_data=..., recognizer=<optimized out>,
> debug=debug@entry=false,
> in_word=0x5555585bd4e0, best_words=0x7fffffffc6e0) at control.cpp:898
> #10 0x00007ffff47f113b in tesseract::Tesseract::classify_word_and_language
> (this=this@entry=0x555557782420, pass_n=pass_n@entry=1,
> pr_it=pr_it@entry=0x7fffffffc850,
> word_data=word_data@entry=0x55555863ac08) at control.cpp:1314
> #11 0x00007ffff47f463c in tesseract::Tesseract::RecogAllWordsPassN
> (this=this@entry=0x555557782420, pass_n=pass_n@entry=1,
> monitor=monitor@entry=0x0,
> pr_it=pr_it@entry=0x7fffffffc850, words=words@entry=0x7fffffffc830) at
> control.cpp:265
> #12 0x00007ffff47f612d in tesseract::Tesseract::recog_all_words
> (this=0x555557782420, page_res=0x5555585d6160, monitor=monitor@entry=0x0,
> target_word_box=target_word_box@entry=0x0,
> word_config=word_config@entry=0x0, dopasses=dopasses@entry=0) at
> control.cpp:352
>
> I first noticed the issue with the tesseract-git package installed in
> Alpine, so I compiled the master branch of tesseract-ocr myself, and I am
> still getting the segfault.
>
> Also my compiled version of tesseract runs fine through the CLI.
>
> I am lost as to what else could be the problem and would appreciate any
> help/direction on how to solve this issue.
>