Hello,
I am following the "Training From Scratch" tutorial, using langdata_lstm and
tesstrain.sh.
When I run tesstrain.sh, it fails with a "Segmentation fault".
Error log:
=== Phase E: Generating lstmf files ===
Loaded 89754/89754 lines (1-89754) of document /tmp/chi_tra-2021-09-09.CGU/chi_tra.AR_PL_UKai_TW.exp0.lstmf
tesseract/src/training/tesstrain_utils.sh: line 73: 3787663 Segmentation fault (core dumped) "${cmd}" "$@" 2>&1
3787664 Done | tee -a "${LOG_FILE}"
ERROR: Program tesseract failed. Abort.
I have three questions about this error:
1. Was tessdata_best/lang.traineddata trained with langdata_lstm and
tesstrain.sh?
2. How can I reproduce tessdata_best/lang.traineddata?
3. If the training_text is too large, how can I avoid this error?
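For question 3, one way to check whether the crash depends on input size would be to rerun the training with a truncated copy of the training text. The following is only a minimal sketch: the function name sample_training_text, the file names, and the line count are hypothetical placeholders, not values from the tutorial or from tesstrain.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesstrain.sh): write the first
# max_lines lines of a training text to a smaller copy.
sample_training_text() {
    src="$1"        # full training text (placeholder path)
    dst="$2"        # reduced copy to train on instead
    max_lines="$3"  # number of lines to keep (arbitrary choice)
    head -n "$max_lines" "$src" > "$dst"
}

# Example call, with made-up file names and an arbitrary size:
#   sample_training_text chi_tra.training_text chi_tra.small.training_text 10000
```

If the run succeeds with the reduced file, that would suggest the failure is related to the size of the input rather than to its content.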
Thank you in advance!
Environment:
Ubuntu 20.04
tesseract 4.1.1
leptonica-1.79.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4