I am trying to OCR what should be a flawless image. I rendered it from a PDF that is entirely vector content, with no embedded raster images, so it has no noise, no skew, and clean characters at whatever DPI I choose.
The recognition from Tesseract is poor. The main problem is dropped characters: it seems to ignore perfectly good-looking characters at random. The screenshot shows the text results in the upper left, with the image in the background (only the upper left of the image is visible). The bounding boxes of the results are drawn in red on that image. Notice all the missing characters. On this particular image, all the characters to the right of the visible area are found and recognized properly.

The image consists of a table of information (rows of item #, size, description, and qty). The columns are not neatly aligned, although this example is pretty good. Some rows are separated by a ruled line. This example has a line for each row, and notice that Tesseract gives me a bounding box for some of the lines, but not all. I tried removing the lines, but that only changed which characters were dropped, with no rhyme or reason to it. Other images from the same set are very similar, but there Tesseract drops characters on the right, or whole lines are missing.

I have tried different DPIs from 75 to 300, but the results were just as disappointing. Can anyone suggest how this might be solved?

<https://lh3.googleusercontent.com/-YwT5YW2wYGo/VuBLmZ-_lSI/AAAAAAAAAZ8/FhfW1gGg_8g/s1600/BadOCR.png>
<https://lh3.googleusercontent.com/-ER5AgyxXtY4/VuBLtP6wWvI/AAAAAAAAAaA/1Lxb767Xiqs/s1600/foo700219.png>
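
For anyone who wants to reproduce the DPI comparison, here is a minimal sketch of such a sweep. It assumes Poppler's pdftoppm does the rasterizing and the tesseract command-line tool does the OCR; that tool chain, the output directory, and the helper names build_commands and sweep are all hypothetical and are not taken from the setup described above.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path, dpi, work_dir="ocr_out"):
    """Return (render_cmd, ocr_cmd, text_path) for one DPI setting.

    Assumption: Poppler's pdftoppm and the tesseract CLI are on PATH.
    """
    prefix = Path(work_dir) / f"page_{dpi}dpi"
    image_path = Path(str(prefix) + ".png")
    text_path = Path(str(prefix) + ".txt")

    # -r sets the resolution, -png selects PNG output, and -singlefile
    # renders only the first page and writes exactly "<prefix>.png".
    render_cmd = ["pdftoppm", "-r", str(dpi), "-png", "-singlefile",
                  str(pdf_path), str(prefix)]

    # tesseract <image> <outputbase> writes the text to "<outputbase>.txt".
    ocr_cmd = ["tesseract", str(image_path), str(prefix)]
    return render_cmd, ocr_cmd, text_path


def sweep(pdf_path, dpis=(75, 150, 300), work_dir="ocr_out"):
    """Render and OCR the first page at each DPI.

    Returns a dict mapping DPI to the number of non-whitespace characters
    recognized, which is a rough way to spot dropped characters.
    """
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    counts = {}
    for dpi in dpis:
        render_cmd, ocr_cmd, text_path = build_commands(pdf_path, dpi, work_dir)
        subprocess.run(render_cmd, check=True)
        subprocess.run(ocr_cmd, check=True)
        text = text_path.read_text(encoding="utf-8", errors="replace")
        counts[dpi] = sum(1 for ch in text if not ch.isspace())
    return counts
```

Calling sweep on the source PDF returns one character count per DPI. If the counts vary widely between resolutions on a clean vector render, that would suggest the loss happens in layout analysis rather than in character recognition, though this check alone does not prove it.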

