Below is a bug report that I'm considering making. However, I'm not entirely positive that it's a bug, and I'd like someone who knows more about this to check it so I'm not wasting anyone's time.
The following is the bug report that I'll post if you guys think it's right.

### Environment

* **Tesseract Version**: tesseract 4.1.0
  leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.2) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.3
  Found AVX2
  Found AVX
  Found SSE
  Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.3
* **Commit Number**: From pacman Arch repository (NOT THE AUR)
* **Platform**: Linux NickArch 5.4.3-arch1-1 #1 SMP PREEMPT Fri, 13 Dec 2019 09:39:02 +0000 x86_64 GNU/Linux

### Current Behavior:

Sample Image link: https://imgur.com/a/TNH3tOx

Tesseract will interpret certain characters weirdly (e.g. F as the yen symbol, or E sometimes as '='). The following command correctly whitelists the characters that will appear on the pages, and almost completely eliminates that problem:

    $ tesseract 205c.tif 205c --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789=+&

However, since the images are formatted like a table, tesseract will not recognize the smaller spaces in the third column. To fix that issue, I can run the following command:

    $ tesseract 205c.tif 205c --psm 6 -c tosp_min_sane_kn_sp=0.0

This command completely fixes the spacing problem. However, it does not whitelist the characters, so there are many more recognition errors. So I need to use both -c settings together. I do this by using a config file:

config_file:

    tosp_min_sane_kn_sp 0.0
    tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789=+&

Then I run

    $ tesseract 205c.tif 205c --psm 6 config_file

Tesseract will always ignore one of these options no matter what I do. Maybe I'm doing it wrong, but I've followed what other config files and other command line options have shown. I've also tried running the command with more than one -c option. In both cases I cannot get both config variables to work together.
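
One thing worth ruling out when both options are passed on the command line: in a POSIX shell an unquoted `&` is a control operator that ends the command and runs it in the background, so an unquoted whitelist ending in `&` would cut the command line off at that point. The sketch below is an assumption about a possible cause, not a confirmed diagnosis. It quotes the whitelist and passes two `-c` options; the variable name `whitelist` is only illustrative, and the file names are the ones used in the report.

```shell
# Sketch only: quote the whitelist so the shell passes the trailing '&'
# to tesseract instead of treating it as the "run in background" operator.
whitelist='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789=+&'

# Build the argument list as positional parameters so each value stays one word.
set -- 205c.tif 205c --psm 6 \
    -c "tessedit_char_whitelist=${whitelist}" \
    -c "tosp_min_sane_kn_sp=0.0"

# Print exactly what tesseract would receive, one argument per line.
printf '%s\n' "$@"

# Run only if tesseract and the sample image are actually present.
if command -v tesseract >/dev/null 2>&1 && [ -f 205c.tif ]; then
    tesseract "$@"
fi
```

If the printed list has eight arguments, with the full whitelist (including `&`) as the sixth, then both variables reach tesseract and any remaining conflict lies inside tesseract rather than in the shell.
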
### Expected Behavior:

    $ tesseract --help-extra
    "-c VAR=VALUE Set value for config variables. Multiple -c arguments are allowed."

Since multiple -c arguments are allowed, I expect both config variables to take effect in the same run.

### Suggested Fix:

I'm not even sure if this is a bug, but it definitely seems like one to me. I don't think I have the expertise to look into why this isn't working. Maybe I'm wrong here.

