Re: Page layout analysis - don't split columns.

Joe Aspara Sat, 26 May 2012 19:42:48 -0700

I have the same problem Brock reported. Does anyone have a solution for
forcing Tesseract to read one line at a time, ignoring the multi-column
layout? (I guess this was the standard behavior in the 1.xx and 2.xx
versions.)


On Saturday, 24 September 2011 02:04:23 UTC+2, Brock wrote:
>
> Hi, 
>
> I want to OCR a receipt scan, which has a left-aligned column of text 
> and a right-aligned column of prices. 
>
> Tesseract (most recent from SVN, with commented out dependency to let 
> it compile) is parsing it into columns. I end up with the 
> descriptions, and then below them, the prices. This makes joining the 
> data back together difficult or impossible. 
>
> I tried all the pagesegmodes (via config file), which produced different 
> output, but the results were either garbage or still had the columns 
> parsed separately. 
>
> Has anyone had and solved this problem? Any tips? 
>
> Thanks, Brock
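
For reference, the page segmentation mode Brock mentions can be set from a
config file through the tessedit_pageseg_mode variable. A minimal sketch,
assuming mode 6 (treat the page as a single uniform block of text) is the
one to try; the choice of mode is an assumption, and this thread does not
confirm that it keeps descriptions and prices on the same line:

    # Assumed value: 6 = single uniform block of text
    tessedit_pageseg_mode 6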

