I have a problem where Tesseract produces no output (a zero-byte output file) 
when presented with Chinese characters followed by either an ellipsis or 
three periods.

[image: bad_sub_243.png]

If I crop the image in Photoshop to remove the dots, the three Chinese 
characters are recognized perfectly. Feeding the image above, or feeding 
just the three dots, produces no output.

I've just recompiled with the latest Git version (see below). I've also 
re-trained the chi_tra model several times and added many words with the 
three dots to the wordlist. Neither change made a difference.

Any suggestions?

*Command*
tesseract bad_sub_243.png output -l tqChiTra --loglevel TRACE \
  -c edges_debug=1 \
  -c ambigs_debug_level=10 \
  -c classify_debug_level=10 \
  -c dawg_debug_level=3 \
  -c wordrec_debug_blamer=1 \
  -c tessedit_dump_choices=1 \
  -c tessedit_debug_block_rejection=1 \
  -c textord_noise_debug=1 \
  -c applybox_debug=10
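
In case it helps anyone reproduce this, here is a minimal Python sketch that assembles the same command line from a dictionary of debug parameters, so that individual settings can be switched on and off between runs. The function name build_tesseract_command is hypothetical; the flags (-l, --loglevel, -c) and parameter names are the ones from the command above. It only builds and prints the argument list and does not run Tesseract.

```python
import shlex


def build_tesseract_command(image, output_base, lang, loglevel, params):
    """Return the tesseract argument list for the given debug parameters.

    Hypothetical helper: it only mirrors the command shown in the post.
    """
    cmd = ["tesseract", image, output_base, "-l", lang, "--loglevel", loglevel]
    for name, value in params.items():
        # Each config variable is passed as its own "-c name=value" pair.
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Debug parameters exactly as used in the command above.
DEBUG_PARAMS = {
    "edges_debug": 1,
    "ambigs_debug_level": 10,
    "classify_debug_level": 10,
    "dawg_debug_level": 3,
    "wordrec_debug_blamer": 1,
    "tessedit_dump_choices": 1,
    "tessedit_debug_block_rejection": 1,
    "textord_noise_debug": 1,
    "applybox_debug": 10,
}

if __name__ == "__main__":
    command = build_tesseract_command(
        "bad_sub_243.png", "output", "tqChiTra", "TRACE", DEBUG_PARAMS
    )
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(command))
```

The resulting list could be handed to subprocess.run as-is if you want to script several runs.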

*Messages*
Warning: Parameter not found: language_model_ngram_on
Warning: Parameter not found: segsearch_max_char_wh_ratio
Warning: Parameter not found: language_model_ngram_space_delimited_language
Warning: Parameter not found: language_model_use_sigmoidal_certainty
Warning: Parameter not found: language_model_ngram_nonmatch_score
Warning: Parameter not found: classify_integer_matcher_multiplier
Warning: Parameter not found: assume_fixed_pitch_char_segment
Warning: Parameter not found: allow_blob_division
Warning: Parameter not found: segsearch_max_char_wh_ratio
Warning: Parameter not found: language_model_ngram_space_delimited_language
Warning: Parameter not found: language_model_use_sigmoidal_certainty
Warning: Parameter not found: language_model_ngram_nonmatch_score
Warning: Parameter not found: classify_integer_matcher_multiplier
Warning: Parameter not found: assume_fixed_pitch_char_segment
Warning: Parameter not found: allow_blob_division
Estimating resolution as 675
Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
cleanup_blocks: # rows = 0 / 1
cleanup_blocks: # blocks = 0 / 1
Estimating resolution as 675
Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
cleanup_blocks: # rows = 0 / 1
cleanup_blocks: # blocks = 0 / 1
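
The only lines that seem relevant are the "Row ending at ... REJECTED" ones, which textord_noise_debug=1 appears to produce. Below is a rough Python sketch for pulling those lines out of a longer log. It assumes the line format is exactly as shown above. My guess, which I have not confirmed, is that dc and nc are counts of dot-sized and normal-sized blobs in the row, and that R is their ratio. The function name parse_row_verdicts is made up for this example.

```python
import re

# Matches lines such as:
#   Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED
ROW_RE = re.compile(
    r"Row ending at \((?P<x>-?\d+),(?P<y>-?\d+(?:\.\d+)?)\): "
    r"R=(?P<ratio>-?\d+(?:\.\d+)?), dc=(?P<dc>\d+), nc=(?P<nc>\d+), "
    r"(?P<verdict>\w+)"
)


def parse_row_verdicts(log_text):
    """Return one dict per 'Row ending at' line found in the log text."""
    rows = []
    for match in ROW_RE.finditer(log_text):
        rows.append(
            {
                "x": int(match.group("x")),
                "y": float(match.group("y")),
                "ratio": float(match.group("ratio")),
                # Assumed meaning: dc = dot-like blobs, nc = normal blobs.
                "dc": int(match.group("dc")),
                "nc": int(match.group("nc")),
                "verdict": match.group("verdict"),
            }
        )
    return rows


if __name__ == "__main__":
    sample = (
        "Estimating resolution as 675\n"
        "Row ending at (221,23.6372): R=9999, dc=3, nc=0, REJECTED\n"
        "cleanup_blocks: # rows = 0 / 1\n"
    )
    for row in parse_row_verdicts(sample):
        print(row)
```

If that reading is right, the whole row (Chinese characters included) is being thrown away as noise before recognition starts, which would match the "# rows = 0 / 1" line.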

*Version*
# tesseract --version
tesseract 5.4.1-11-g46b9
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 1.0.0
 Found AVX
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.1
 Found libcurl/7.61.1 OpenSSL/1.1.1c zlib/1.2.11 brotli/1.0.6 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib nghttp2/1.33.0
