Re: Page layout analysis - don't split columns.

Brock Henry Sat, 26 May 2012 22:34:04 -0700

Joe, I got over my problem, though I don't remember how.

I think I updated to the latest svn version, and no longer had the problem.
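
If updating does not help, one workaround is to rejoin the columns after OCR instead of fighting the layout analysis. Below is a minimal sketch in Python, assuming you can get a bounding box for each word out of the OCR output. The (text, left, top, height) tuple layout and the name group_into_rows are made up for illustration; they are not something Tesseract provides.

```python
from typing import List, Tuple

# Hypothetical word record: (text, left, top, height), all in pixels.
Word = Tuple[str, int, int, int]


def group_into_rows(words: List[Word], tolerance: float = 0.5) -> List[str]:
    """Merge words with similar vertical centers into single text rows."""
    rows = []  # each entry is [row_center_y, [words in that row]]
    # Visit words top to bottom by vertical center.
    for word in sorted(words, key=lambda w: w[2] + w[3] / 2):
        _, _, top, height = word
        center = top + height / 2
        # Same row if the center is within a fraction of the word height
        # of the center of the word that started the current row.
        if rows and abs(center - rows[-1][0]) <= tolerance * height:
            rows[-1][1].append(word)
        else:
            rows.append([center, [word]])
    # Within each row, order words left to right and join them.
    return [
        " ".join(w[0] for w in sorted(row[1], key=lambda w: w[1]))
        for row in rows
    ]


sample = [
    ("Milk", 10, 100, 20), ("Bread", 10, 130, 20),
    ("2.50", 300, 102, 20), ("1.20", 300, 131, 20),
]
print(group_into_rows(sample))  # ['Milk 2.50', 'Bread 1.20']
```

This only looks at vertical position, so it will not cope well with a skewed scan; the tolerance value is a guess and would need tuning for a real receipt.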




On Sunday, 27 May 2012, Joe Aspara <[email protected]> wrote:
> I have the same problem reported by Brock. Does anyone have a solution to
> force tesseract to read one line at a time, ignoring the multi-column
> layout? (I guess this was the standard behavior in the 1.xx and 2.xx
> versions.)
>
> On Saturday, 24 September 2011 02:04:23 UTC+2, Brock wrote:
>>
>> Hi,
>>
>> I want to OCR a receipt scan, which has a left-aligned column of text
>> and a right-aligned column of prices.
>>
>> Tesseract (most recent from SVN, with a dependency commented out to let
>> it compile) is parsing it into columns. I end up with the
>> descriptions, and then below them, the prices. This makes joining the
>> data back together difficult or impossible.
>>
>> I tried all the pagesegmodes (via config file), which made different
>> output, but they were either garbage, or still had the columns parsed
>> separately.
>>
>> Has anyone had and solved this problem? Any tips?
>>
>> Thanks, Brock
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

