[tesseract-ocr] Box file generator combines vertical lines across rows of text

Cameron McSweeney Tue, 24 Apr 2018 08:29:38 -0700

I am working on character recognition at work. I need to extract 
information from tables in very large TIFF files so that a program can use 
it automatically. The tables are computer-generated, but the information is 
unavailable to me in any format besides TIFF. The font is wonderfully 
consistent and relatively few characters are used, so this should be a 
fairly easy task.


I have had mild success training Tesseract 3.05, but whenever I generate 
the box file for training, Tesseract combines vertical lines across rows 
into one tall, skinny box. The character value of the errant box is always 
a tilde (~), and the pixels inside it are excluded from the letters they 
actually belong to. I have attached a picture that should better explain 
my problem.

Is there a way to prevent this? I created a completely new language (not 
eng) for Tesseract from a box/TIFF pair that did not include any of those 
bars, but when I regenerate the box file with the new language, the tall, 
incorrect boxes still appear.
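
For reference, here is a minimal sketch of the kind of post-processing I 
could apply to the generated box file as a workaround. It assumes the 
Tesseract 3.x box format of one "<glyph> <left> <bottom> <right> <top> 
<page>" entry per line. The function names, the max_height threshold, and 
the max_aspect ratio are my own placeholders, not anything provided by 
Tesseract. It only drops the tall, skinny boxes from the file; it does not 
change how Tesseract segments the page.

```python
def parse_box_line(line):
    """Parse one box-file line; return None if it does not look like a box."""
    # Assumed layout: <glyph> <left> <bottom> <right> <top> <page>
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        return None
    try:
        left, bottom, right, top, page = (int(p) for p in parts[1:])
    except ValueError:
        return None
    return parts[0], left, bottom, right, top, page


def is_tall_bar(box, max_height, max_aspect=4.0):
    """True if the box is taller than max_height and much taller than wide."""
    _glyph, left, bottom, right, top, _page = box
    width = max(right - left, 1)  # guard against zero-width boxes
    height = top - bottom
    return height > max_height and height / width > max_aspect


def filter_boxes(lines, max_height, max_aspect=4.0):
    """Return the lines with tall, skinny boxes removed.

    Lines that cannot be parsed are kept unchanged.
    """
    kept = []
    for line in lines:
        box = parse_box_line(line)
        if box is not None and is_tall_bar(box, max_height, max_aspect):
            continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python filter_boxes.py in.box out.box 60
    src, dst, limit = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(src, encoding="utf-8") as f:
        cleaned = filter_boxes(f.readlines(), max_height=limit)
    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(cleaned)
```

The threshold would need to be picked from the image itself, for example 
somewhat more than the height of a single text row in pixels, so that 
ordinary characters are never dropped.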

Any help would be appreciated.

Thanks,
Cameron
