I have a problem where Tesseract produces no output (a zero-byte output file) when presented with Chinese characters followed by either an ellipsis or three periods.
[image: bad_sub_243.png]

If I crop the image in Photoshop to remove the dots, the three Chinese characters are recognized perfectly. Feeding in the image above, or just the three dots on their own, produces no output.

I've just recompiled with the latest Git version (see below). I've also retrained the chi_tra model several times and added many words containing the three dots to the wordlist. The result is the same in both cases. Any suggestions?

*Command*

    tesseract bad_sub_243.png output -l tqChiTra --loglevel TRACE -c edges_debug=1 -c ambigs_debug_level=10 -c classify_debug_level=10 -c dawg_debug_level=3 -c wordrec_debug_blamer=1 -c tessedit_dump_choices=1 -c tessedit_debug_block_rejection=1 -c textord_noise_debug=1 -c applybox_debug=10

*Messages*

    Warning: Parameter not found: language_model_ngram_on
    Warning: Parameter not found: segsearch_max_char_wh_ratio
    Warning: Parameter not found: language_model_ngram_space_delimited_language
    Warning: Parameter not found: language_model_use_sigmoidal_certainty
    Warning: Parameter not found: language_model_ngram_nonmatch_score
    Warning: Parameter not found: classify_integer_matcher_multiplier
    Warning: Parameter not found: assume_fixed_pitch_char_segment
    Warning: Parameter not found: allow_blob_division
    Warning: Parameter not found: segsearch_max_char_wh_ratio
    Warning: Parameter not found: language_model_ngram_space_delimited_language
    Warning: Parameter not found: language_model_use_sigmoidal_certainty
    Warning: Parameter not found: language_model_ngram_nonmatch_score
    Warning: Parameter not found: classify_integer_matcher_multiplier
    Warning: Parameter not found: assume_fixed_pitch_char_segment
    Warning: Parameter not found: allow_blob_division
    Estimating resolution as 675
    Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
    cleanup_blocks: # rows = 0 / 1
    cleanup_blocks: # blocks = 0 / 1
    Estimating resolution as 675
    Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
    cleanup_blocks: # rows = 0 / 1
    cleanup_blocks: # blocks = 0 / 1
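To see where the text disappears without reading the whole TRACE dump, a small script can pull out the row-noise lines. The sketch below is hypothetical (it is not part of Tesseract) and only assumes the two line formats visible in the log above: `Row ending at (x,y): R=..., dc=..., nc=..., VERDICT` and `cleanup_blocks: # rows = kept / total`. The meaning of `dc` and `nc` (presumably counts of dot-sized and normal-sized blobs on the row) is an assumption. In this log the only row is marked REJECTED and the summary reads `rows = 0 / 1`, which appears to mean the single text row was discarded before recognition.

```python
import re
from dataclasses import dataclass

# Row-noise line as printed with textord_noise_debug=1, e.g.
#   Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
# The verdict is matched as any upper-case word so no verdict names are assumed.
ROW_RE = re.compile(
    r"Row ending at \((?P<x>-?\d+),(?P<y>[-+\d.e]+)\): "
    r"R=(?P<r>[-+\d.e]+), dc=(?P<dc>\d+), nc=(?P<nc>\d+), (?P<verdict>[A-Z]+)"
)

# Summary line, e.g. "cleanup_blocks: # rows = 0 / 1" (kept / total).
CLEANUP_RE = re.compile(
    r"cleanup_blocks: # (?P<kind>rows|blocks) = (?P<kept>\d+) / (?P<total>\d+)"
)


@dataclass
class RowVerdict:
    right_x: int       # first number inside the parentheses
    baseline_y: float  # second number inside the parentheses
    ratio: float       # the R= field, copied as printed
    dc: int            # the dc= field (assumed: small "dot" blobs)
    nc: int            # the nc= field (assumed: normal-sized blobs)
    verdict: str       # e.g. "REJECTED"


def parse_trace(text: str):
    """Return (row verdicts, cleanup summaries) found anywhere in the text.

    finditer() is used instead of line splitting, so the parser also works
    on logs that were flattened onto a single line (as in an email).
    """
    rows = [
        RowVerdict(
            right_x=int(m["x"]),
            baseline_y=float(m["y"]),
            ratio=float(m["r"]),
            dc=int(m["dc"]),
            nc=int(m["nc"]),
            verdict=m["verdict"],
        )
        for m in ROW_RE.finditer(text)
    ]
    cleanup = [
        (m["kind"], int(m["kept"]), int(m["total"]))
        for m in CLEANUP_RE.finditer(text)
    ]
    return rows, cleanup
```

To use it, capture everything Tesseract prints (for example by appending `> trace.txt 2>&1` to the command above, which catches the debug output whichever stream it goes to) and then call `parse_trace(open("trace.txt").read())`.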
*Version*

    # tesseract --version
    tesseract 5.4.1-11-g46b9
     leptonica-1.76.0
      libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 1.0.0
     Found AVX
     Found SSE4.1
     Found OpenMP 201511
     Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.1
     Found libcurl/7.61.1 OpenSSL/1.1.1c zlib/1.2.11 brotli/1.0.6 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib nghttp2/1.33.0

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b799ca2a-1983-40be-9e5e-723531bb79e1n%40googlegroups.com.

