See
https://stackoverflow.com/questions/34981144/split-text-lines-in-scanned-document
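
One common way to do this kind of splitting is a horizontal projection profile: binarize the page, count the ink pixels in each pixel row, and treat runs of rows that contain ink as text lines. The following is only a rough sketch under that assumption, not necessarily what the linked answer does. The function names, the thresholds (min_ink, min_height) and the padding value are made up for illustration and would need tuning for your scans.

```python
def find_line_bands(row_ink, min_ink=1, min_height=3):
    """Return (top, bottom) row ranges whose ink count is >= min_ink.

    row_ink is a sequence with one ink-pixel count per image row.
    Bands shorter than min_height rows are discarded as noise.
    """
    bands = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = y                      # a text band begins here
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))   # bottom is exclusive
            start = None
    # Close a band that runs to the last row of the image.
    if start is not None and len(row_ink) - start >= min_height:
        bands.append((start, len(row_ink)))
    return bands


def split_page(path, pad=10):
    """Load a page image and return one padded grayscale crop per text line.

    Assumes dark text on a light background and roughly horizontal lines.
    """
    import cv2  # imported here so find_line_bands works without OpenCV

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    # Otsu threshold, inverted so that text pixels become non-zero.
    _, binary = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU
    )
    row_ink = (binary > 0).sum(axis=1)
    lines = []
    for top, bottom in find_line_bands(row_ink):
        crop = gray[top:bottom, :]
        # White border around each line, since extra spacing helped.
        crop = cv2.copyMakeBorder(
            crop, pad, pad, pad, pad, cv2.BORDER_CONSTANT, value=255
        )
        lines.append(crop)
    return lines
```

Each crop could then be written out with cv2.imwrite and passed to Tesseract with a single-line page segmentation mode (--psm 7). With very narrow line spacing, descenders and ascenders of adjacent lines may touch, so there may be no completely empty rows between lines; raising min_ink, or deskewing the page first, may help in that case.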


On Sat, Nov 9, 2019 at 3:10 AM Aaron Stewart <[email protected]>
wrote:

> If you have any suggestions on how to split input images into individual
> text lines, I would appreciate it.  I am able to use Python and OpenCV, but
> I don't have a lot of experience with either.  I can read publications if
> necessary.
>
> I'm using Tesseract 5.0.0-alpha from UB Mannheim (Windows 10), to process
> pages from a directory.  The line spacing is very narrow.  In my project,
> increasing line spacing improves the recognition accuracy.
>
> I believe that splitting the input image into separate lines of text would
> improve the results, in my case.
>
>
> === Original ===
> FLOYD. THOMAS J.—La.1,°07; (1°07).
> ao LOWNDES = (b’64)-~Ala.2,°90:
>
> === Spaced ===
> FLOYD, THOMAS J.—La.1,"07; (1°07).
> HENDRICK. LOWNDES  (b’64)-—~Ala.2,°90:
> (1°90).
>
> In the original example, the name HENDRICK is missing and the third line
> is also missing.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/d8706d07-4a5e-4a62-899e-b79c31d9ceb6%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/d8706d07-4a5e-4a62-899e-b79c31d9ceb6%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 

____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
