I am having trouble whitelisting and OCRing apostrophes (English single right quotes). Given something like the attached image, without specifying a whitelist, apostrophes are output:
$ tesseract --user-words ./.user.words /tmp/test-ocr.png stdout
Doctor‘s Mask

However, because of noise (not necessarily on that test image), I have tried using a whitelist with letters and numbers, plus a hyphen, a comma, and quotes (you can see my many attempts at apostrophes):

$ cat .config
tessedit_char_whitelist -",'\'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890\u0027\u2019

With that whitelist, the apostrophe doesn't come out:

$ tesseract --user-words ./.user.words /tmp/test-ocr.png stdout ./.config
Doctors Mask

Environment: Arch Linux, up to date as of today.

tesseract 3.05.00
 leptonica-1.74
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.1) : libpng 1.6.29 : libtiff 4.0.7 : zlib 1.2.11 : libwebp 0.5.2

Any suggestions would be appreciated.
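One thing that may be worth trying, offered only as an untested sketch: the unwhitelisted output above appears to contain U+2018 (‘, left single quotation mark), which is not in the whitelist. The \u0027\u2019 entries would only help if Tesseract decoded escape sequences in config values; this sketch assumes it does not (an assumption, not something confirmed here) and writes the quote characters as raw UTF-8 bytes instead. The variable name quotes is just for illustration.

```shell
#!/bin/sh
# Sketch: write .config with the quote characters as literal UTF-8 bytes
# instead of \uXXXX escapes. ASSUMPTION: Tesseract reads the config value
# verbatim and does not decode escape sequences such as \u2019.

# U+0027 apostrophe, U+2018 left single quote, U+2019 right single quote.
# printf expands the octal escapes into the UTF-8 byte sequences
# E2 80 98 (U+2018) and E2 80 99 (U+2019).
quotes=$(printf "'\342\200\230\342\200\231")

# Same whitelist as before, minus the backslash and the \u escapes.
printf 'tessedit_char_whitelist -",%sabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890\n' "$quotes" > .config

# Dump the bytes actually written, to confirm no literal "\u" text remains.
od -c .config
```

After that, the same tesseract command with ./.config can be rerun to see whether the apostrophe survives.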

