I have the same problem that Brock reported. Does anyone have a way to force Tesseract to read one line at a time, ignoring the multi-column layout? (I believe this was the standard behavior in the 1.xx and 2.xx versions.)
On Saturday, 24 September 2011 02:04:23 UTC+2, Brock wrote:
>
> Hi,
>
> I want to OCR a receipt scan, which has a left-aligned column of text,
> and a right aligned column of prices.
>
> Tesseract (most recent from SVN, with commented out dependency to let
> it compile) is parsing it into columns. I end up with the
> descriptions, and then below them, the prices. This makes joining the
> data back together difficult or impossible.
>
> I tried all the pagesegmodes (via config file), which made different
> output, but they were either garbage, or still had the columns parsed
> separately.
>
> Has anyone had and solved this problem? Any tips?
>
> Thanks, Brock

