Can anyone suggest some debug settings I can activate to track down why I'm getting no output?

Thanks,
Danny
On Tuesday, July 30, 2024 at 8:23:38 PM UTC+8 Danny wrote:

> I have a problem where tesseract produces no output (a zero-byte output
> file) when presented with Chinese characters followed by either an ellipsis
> or three periods.
>
> [image: bad_sub_243.png]
>
> If I crop the image in Photoshop to remove the dots, the three Chinese
> characters are recognized perfectly. Feeding the image above, or feeding
> just the three dots, produces no output.
>
> I've just recompiled with the latest Git version (see below). I've also
> re-trained the chi_tra model several times and added many words with the
> three dots to the wordlist. The result is the same with both.
>
> Any suggestions?
>
> *Command*
> tesseract bad_sub_243.png output -l tqChiTra --loglevel TRACE -c
> edges_debug=1 -c ambigs_debug_level=10 -c classify_debug_level=10 -c
> dawg_debug_level=3 -c wordrec_debug_blamer=1 -c tessedit_dump_choices=1
> -c tessedit_debug_block_rejection=1 -c textord_noise_debug=1 -c
> applybox_debug=10
>
> *Messages*
> Warning: Parameter not found: language_model_ngram_on
> Warning: Parameter not found: segsearch_max_char_wh_ratio
> Warning: Parameter not found: language_model_ngram_space_delimited_language
> Warning: Parameter not found: language_model_use_sigmoidal_certainty
> Warning: Parameter not found: language_model_ngram_nonmatch_score
> Warning: Parameter not found: classify_integer_matcher_multiplier
> Warning: Parameter not found: assume_fixed_pitch_char_segment
> Warning: Parameter not found: allow_blob_division
> Warning: Parameter not found: segsearch_max_char_wh_ratio
> Warning: Parameter not found: language_model_ngram_space_delimited_language
> Warning: Parameter not found: language_model_use_sigmoidal_certainty
> Warning: Parameter not found: language_model_ngram_nonmatch_score
> Warning: Parameter not found: classify_integer_matcher_multiplier
> Warning: Parameter not found: assume_fixed_pitch_char_segment
> Warning: Parameter not found: allow_blob_division
> Estimating resolution as 675
> Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
> cleanup_blocks: # rows = 0 / 1
> cleanup_blocks: # blocks = 0 / 1
> Estimating resolution as 675
> Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
> cleanup_blocks: # rows = 0 / 1
> cleanup_blocks: # blocks = 0 / 1
>
> *Version*
> # tesseract --version
> tesseract 5.4.1-11-g46b9
> leptonica-1.76.0
> libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.3) : libpng 1.6.34 :
> libtiff 4.0.9 : zlib 1.2.11 : libwebp 1.0.0
> Found AVX
> Found SSE4.1
> Found OpenMP 201511
> Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.1
> Found libcurl/7.61.1 OpenSSL/1.1.1c zlib/1.2.11 brotli/1.0.6
> libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib
> nghttp2/1.33.0
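
While waiting for suggestions, here is a minimal sketch for pulling the per-row verdicts out of the messages above, so that rejected rows are easy to spot in a longer trace. It assumes the `Row ending at ...` lines always have exactly the layout shown in this log (they appear to come from the `textord_noise_debug=1` setting, but that is an assumption). The names `RowVerdict` and `parse_row_verdicts` are made up for illustration, and reading `dc` and `nc` as counts of dot-like and normal blobs is a guess, not something the log states.

```python
import re
from typing import List, NamedTuple


class RowVerdict(NamedTuple):
    """One 'Row ending at ...' message from the debug output (hypothetical type)."""
    x: int          # first coordinate printed in the message
    y: float        # second coordinate printed in the message
    ratio: float    # the R= value
    dc: int         # the dc= value (assumed: dot-like blob count)
    nc: int         # the nc= value (assumed: normal blob count)
    verdict: str    # trailing uppercase word, e.g. REJECTED


# Pattern copied from the layout of the lines in the log above.
# It is not anchored, so it also works on lines quoted with "> ".
ROW_RE = re.compile(
    r"Row ending at \((-?\d+),(-?[\d.]+)\): "
    r"R=([\d.]+), dc=(\d+), nc=(\d+), ([A-Z]+)"
)


def parse_row_verdicts(log_text: str) -> List[RowVerdict]:
    """Return every row message found in the captured tesseract output."""
    return [
        RowVerdict(int(x), float(y), float(r), int(dc), int(nc), verdict)
        for x, y, r, dc, nc, verdict in ROW_RE.findall(log_text)
    ]


# Sample taken verbatim from the messages in this thread.
sample = """Estimating resolution as 675
Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
cleanup_blocks: # rows = 0 / 1
cleanup_blocks: # blocks = 0 / 1
"""

for row in parse_row_verdicts(sample):
    print(row)
```

In the log above the only row is reported with `dc=3`, `nc=0` and `REJECTED`, and the next message is `cleanup_blocks: # rows = 0 / 1`, which is consistent with the zero-byte output file.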

