See https://stackoverflow.com/questions/34981144/split-text-lines-in-scanned-document
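One common approach is the horizontal projection profile. You count the ink pixels in each row, and runs of rows with no ink mark the gaps between text lines. This may not be exactly what the linked answer does. The sketch below is a minimal illustration under some assumptions: NumPy is available, the page is already deskewed, and it has been binarized so that text is nonzero and paper is 0. The function and parameter names (find_line_bands, respace_lines, min_ink, min_height, gap) are my own, not from Tesseract or OpenCV. Since wider line spacing helped in your test, the second function rebuilds the page with blank rows inserted between the detected lines.

```python
import numpy as np


def find_line_bands(binary, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges that contain text.

    Assumes `binary` is a 2-D array where nonzero pixels are ink and 0 is
    background. `bottom` is exclusive, so binary[top:bottom] is one line.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = np.count_nonzero(binary, axis=1)
    # Rows with fewer than `min_ink` ink pixels are treated as gaps.
    has_ink = profile >= min_ink

    bands = []
    start = None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y  # entering a text line
        elif not ink and start is not None:
            # Keep the band only if it is at least `min_height` rows tall,
            # so isolated specks of noise are dropped.
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    # Handle a text line that runs into the bottom edge of the image.
    if start is not None and len(has_ink) - start >= min_height:
        bands.append((start, len(has_ink)))
    return bands


def respace_lines(gray, bands, gap=20, background=255):
    """Stack the line strips of `gray` with `gap` blank rows between them."""
    if not bands:
        return gray.copy()
    # A strip of background (white by default) used as extra line spacing.
    blank = np.full((gap, gray.shape[1]), background, dtype=gray.dtype)
    strips = []
    for i, (top, bottom) in enumerate(bands):
        if i > 0:
            strips.append(blank)
        strips.append(gray[top:bottom])
    return np.vstack(strips)
```

With OpenCV, one way to get the binary input is `gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)` followed by `_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)`. You can then write the re-spaced page with `cv2.imwrite` and run Tesseract on it. Alternatively, save each strip `gray[top:bottom]` separately and OCR it with `--psm 7`, which treats the image as a single text line. When lines are very tightly spaced, ascenders and descenders can touch, so the profile may never reach zero between lines. Raising `min_ink` makes near-empty rows count as gaps. If the scan is skewed, deskew it first, because a tilted line smears across many rows of the profile.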
On Sat, Nov 9, 2019 at 3:10 AM Aaron Stewart <[email protected]> wrote:

> If you have any suggestions on how to split input images into individual
> text lines, I would appreciate it. I am able to use Python and OpenCV, but
> I don't have a lot of experience with either. I can read publications if
> necessary.
>
> I'm using Tesseract 5.0.0-alpha from UB Mannheim (Windows 10), to process
> pages from a directory. The line spacing is very narrow. In my project,
> increasing line spacing improves the recognition accuracy.
>
> I believe that splitting the input image into separate lines of text would
> improve the results, in my case.
>
>
> === Original ===
> FLOYD. THOMAS J.—La.1,°07; (1°07).
> ao LOWNDES = (b’64)-~Ala.2,°90:
>
> === Spaced ===
> FLOYD, THOMAS J.—La.1,"07; (1°07).
> HENDRICK. LOWNDES (b’64)-—~Ala.2,°90:
> (1°90).
>
> In the original example, the name HENDRICK is missing and the third line
> is also missing.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/d8706d07-4a5e-4a62-899e-b79c31d9ceb6%40googlegroups.com

--
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

