I am working on character recognition at work so I can copy information from tables in giant TIFF files and write a program that can automatically use the information from those tables. The tables are computer-generated, but the information is unavailable to me in any format besides TIFF. The font is wonderfully consistent and relatively few characters are used, so this should be a fairly easy task.
I have had mild success training Tesseract 3.05, but whenever I make the box file for training, Tesseract combines vertical lines across rows into one tall, skinny box. The errant box character value is always a tilde (~) and the pixels are disqualified from being used in the correct letters. I have attached a picture that should better explain my problem. Is there a way to prevent this? I created a completely new language (not .eng) for Tesseract with a box/tiff pair that did not include any of those bars, but when I recreate the box file with the new language the tall, incorrect boxes are still made. Any help would be appreciated. Thanks, Cameron
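To make the symptom concrete, here is a minimal Python sketch of how the errant entries could be picked out of a box file. It assumes the usual Tesseract 3.x box format, one glyph per line as `<glyph> <left> <bottom> <right> <top> <page>` with the origin at the bottom-left of the image. The function names and the `min_aspect` threshold of 8 are hypothetical, chosen only for illustration. Note that this only drops the tall `~` boxes; it does not redraw boxes around the letters whose pixels were absorbed, so those would still need to be fixed by hand or in a box editor.

```python
def parse_box_line(line):
    """Split one box-file line into (glyph, left, bottom, right, top, page)."""
    # Split from the right so the glyph field is kept intact.
    glyph, left, bottom, right, top, page = line.strip().rsplit(None, 5)
    return glyph, int(left), int(bottom), int(right), int(top), int(page)


def is_suspect_bar(glyph, left, bottom, right, top, min_aspect=8.0):
    """True for a '~' box that is much taller than it is wide."""
    width = max(right - left, 1)   # guard against zero-width boxes
    height = top - bottom          # top > bottom: origin is bottom-left
    return glyph == "~" and height / width >= min_aspect


def drop_suspect_bars(lines, min_aspect=8.0):
    """Return the box lines with tall, skinny '~' entries removed."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        glyph, left, bottom, right, top, _page = parse_box_line(line)
        if not is_suspect_bar(glyph, left, bottom, right, top, min_aspect):
            kept.append(line)
    return kept
```

Reading the lines of the generated box file into `drop_suspect_bars` and writing the result back out would give a box file without the tall bars, which could then be corrected manually before training.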