There are some earlier threads about tables in the list archives. Table
extraction is not very easy, but here is one relevant link:
https://groups.google.com/forum/?fromgroups#!topic/tesseract-ocr/YyKinyi6Sdw

On Tue, Jun 19, 2012 at 3:26 AM, Neo Song <[email protected]> wrote:
> Dear All,
>
>     I am currently working on a table text extraction project, and we need
> to identify the table before running any OCR.
>     I investigated the related source code (checked-out version: r729) and
> found that there is a table finder class inside Tesseract (tablefind.cpp).
> The problem is that for irregular tables (e.g. different rows have
> different numbers of columns), even when I have all the ruling lines, I
> cannot identify the concrete table cells.
>     I have called the function "FindLinesCreateBlockList()" and I can
> iterate over all the text blocks, horizontal lines and vertical lines in
> the target image. However, I can do nothing with these horizontal and
> vertical lines. What I need is something like a CELL_LIST, which contains
> every table cell in reading order based on the table ruling lines. I
> believe that the table finder may already contain such an algorithm (I read
> the code but it is too complicated), but that it is not exposed through the
> Base API interface. Is that true?
>     Can someone help me out with this? How can I obtain the table cells? An
> example of such an irregular table can be found in the attachment.
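
For illustration, the cell extraction asked about above could be sketched
roughly as follows. This is not the algorithm in tablefind.cpp and it uses
no Tesseract API. It assumes the ruling lines have already been exported as
plain tuples, (y, x0, x1) for horizontal lines and (x, y0, y1) for vertical
lines, that the table has an outer border, and that every cell is a
rectangle. The name find_cells and the tolerance parameter tol are made up
for this sketch. The idea: build the finest grid implied by all line
coordinates, merge neighbouring grid boxes that have no ruling line between
them, and sort the merged boxes top to bottom, then left to right.

```python
from itertools import product


def find_cells(h_lines, v_lines, tol=2):
    """Return cells as (left, top, right, bottom) in reading order.

    h_lines: iterable of (y, x0, x1); v_lines: iterable of (x, y0, y1).
    Both formats are assumptions of this sketch, not Tesseract types.
    """
    def snap(values):
        # Collapse coordinates that lie within tol of each other.
        out = []
        for v in sorted(values):
            if not out or v - out[-1] > tol:
                out.append(v)
        return out

    def covered(lines, pos, lo, hi):
        # True if some line at position pos spans the interval [lo, hi].
        return any(abs(p - pos) <= tol and a <= lo + tol and b >= hi - tol
                   for p, a, b in lines)

    xs = snap(x for x, _, _ in v_lines)
    ys = snap(y for y, _, _ in h_lines)
    rows, cols = len(ys) - 1, len(xs) - 1
    if rows < 1 or cols < 1:
        return []

    # Union-find over the elementary grid boxes.
    parent = {rc: rc for rc in product(range(rows), range(cols))}

    def find(rc):
        while parent[rc] != rc:
            parent[rc] = parent[parent[rc]]
            rc = parent[rc]
        return rc

    for r, c in list(parent):
        # Merge with the right neighbour if no vertical rule separates them.
        if c + 1 < cols and not covered(v_lines, xs[c + 1], ys[r], ys[r + 1]):
            parent[find((r, c))] = find((r, c + 1))
        # Merge with the lower neighbour if no horizontal rule separates them.
        if r + 1 < rows and not covered(h_lines, ys[r + 1], xs[c], xs[c + 1]):
            parent[find((r, c))] = find((r + 1, c))

    # The bounding box of each merged group is one table cell.
    boxes = {}
    for r, c in list(parent):
        root = find((r, c))
        box = (xs[c], ys[r], xs[c + 1], ys[r + 1])
        if root in boxes:
            old = boxes[root]
            box = (min(old[0], box[0]), min(old[1], box[1]),
                   max(old[2], box[2]), max(old[3], box[3]))
        boxes[root] = box

    return sorted(boxes.values(), key=lambda b: (b[1], b[0]))


# Two columns in the top row, one wide cell in the bottom row.
h = [(0, 0, 100), (20, 0, 100), (40, 0, 100)]
v = [(0, 0, 40), (100, 0, 40), (50, 0, 20)]
cells = find_cells(h, v)
```

Sorting by the top edge and then the left edge is only a simple stand-in
for reading order. Cells that span several rows, or scans that are skewed,
would need something more careful.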

