If you have any suggestions on how to split input images into individual text lines, I would appreciate it. I am able to use Python and OpenCV, but I don't have a lot of experience with either. I can read publications if necessary.
I'm using Tesseract 5.0.0-alpha from UB Mannheim on Windows 10 to process page images from a directory. The line spacing on these pages is very narrow, and in my project increasing the line spacing improves recognition accuracy. For that reason I believe that splitting each input image into separate lines of text would improve the results in my case.

=== Original ===
FLOYD. THOMAS J.—La.1,°07; (1°07). ao LOWNDES = (b’64)-~Ala.2,°90:

=== Spaced ===
FLOYD, THOMAS J.—La.1,"07; (1°07). HENDRICK. LOWNDES (b’64)-—~Ala.2,°90: (1°90).

In the original output, the name HENDRICK is missing, and the third line is missing as well.
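For reference, this is the kind of line splitting I have in mind: a rough sketch based on a horizontal projection profile (counting text pixels per pixel row and cutting where the count drops to zero). It assumes dark text on a light background and lines that are close to horizontal. The function names, `min_ink`, `min_height`, and `pad` are my own placeholders, not anything from Tesseract or OpenCV, and I have not tuned any of the values.

```python
def find_line_bands(row_ink, min_ink=1, min_height=3):
    """Return (top, bottom) row ranges in which consecutive rows contain ink.

    row_ink holds one number per pixel row: the amount of text in that row.
    bottom is exclusive, so image[top:bottom] is one band.
    """
    bands = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = y                       # entering a text line
        elif ink < min_ink and start is not None:
            if y - start >= min_height:     # drop specks thinner than min_height
                bands.append((start, y))
            start = None
    # A band that runs to the bottom edge of the image is closed here.
    if start is not None and len(row_ink) - start >= min_height:
        bands.append((start, len(row_ink)))
    return bands


def split_into_lines(image_path, pad=10):
    """Crop each detected text line and surround it with a white border."""
    import cv2  # imported here so find_line_bands can be used without OpenCV

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    # Otsu threshold, inverted: text pixels become 255, background becomes 0.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    row_ink = (binary > 0).sum(axis=1)      # number of text pixels in each row
    lines = []
    for top, bottom in find_line_bands(row_ink.tolist()):
        crop = gray[top:bottom, :]
        # The white border plays the role of the extra line spacing.
        lines.append(cv2.copyMakeBorder(crop, pad, pad, pad, pad,
                                        cv2.BORDER_CONSTANT, value=255))
    return lines
```

Each returned crop could then be saved and passed to Tesseract on its own, possibly with `--psm 7` (treat the image as a single text line). With spacing as narrow as mine, ascenders and descenders of neighbouring lines may touch, so I expect `min_ink` would have to be raised above 1 before the lines separate, and a skewed scan would need deskewing first.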

