Dear Sventech,
I have read that post before. Writing my own custom table analysis
algorithm based on Dmitri Silaev's suggestion would be a lot of work for me.
However, since I found that there is already a well-designed table analysis
algorithm inside Tesseract (3.02, r729), how can I obtain the table cell
information from its internal variables after the page segmentation
analysis? I think it would yield a much better result than my own design.
On Tuesday, June 19, 2012 at 10:49:56 PM UTC+8, sventech wrote:
>
> You'll see some things on tables in the archives -- it is not very
> easy, but here is one link:
>
> https://groups.google.com/forum/?fromgroups#!topic/tesseract-ocr/YyKinyi6Sdw
>
> On Tue, Jun 19, 2012 at 3:26 AM, Neo Song <[email protected]> wrote:
> > Dear All,
> >
> > Currently I am working on a table text extraction project, and we need to
> > identify the table before any OCR processing.
> > I investigated the related source code (checked-out version: r729) and
> > found that there is a table finder class inside Tesseract (tablefind.cpp).
> > The problem is that for irregular tables (e.g. different rows have
> > different numbers of columns), even if I get all the ruling lines, I cannot
> > identify the concrete table cells.
> > I have called the function "FindLinesCreateBlockList()" and I can
> > iterate over all the text blocks, horizontal lines and vertical lines in
> > the target image. However, I can do nothing with these horizontal and
> > vertical lines; what I need is something like a CELL_LIST, which contains
> > every table cell in reading order based on the table ruling lines. I
> > believe that the table finder may already contain such an algorithm (I read
> > the code, but it is too complicated), but it is not exposed through the
> > Base API interface. Is that true?
> > Can someone help me out with this? How can I obtain the table cells? An
> > example of such an irregular table can be found in the attachment.
>
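For reference, here is a rough sketch of one way to turn ruling lines into a
cell list in reading order, independent of Tesseract's internals. It is not
the algorithm used in tablefind.cpp. The Ruling and Cell types and the
FindCells function are hypothetical names made up for this example. It
assumes the rulings are already axis-aligned, snapped to exact coordinates,
merged so that one segment covers each unbroken stretch of line, and that the
table has a complete outer border. The idea is to build the finest grid from
all ruling positions, then join neighbouring grid squares wherever no ruling
separates them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical types for this sketch (not Tesseract classes).
// Horizontal ruling: y == pos, spanning x in [lo, hi].
// Vertical ruling:   x == pos, spanning y in [lo, hi].
struct Ruling { int pos, lo, hi; };
struct Cell { int left, top, right, bottom; };

// Sorted, de-duplicated ruling positions: these form the grid lines.
static std::vector<int> UniquePositions(const std::vector<Ruling>& lines) {
  std::vector<int> v;
  for (const Ruling& r : lines) {
    assert(r.lo <= r.hi);  // rulings are assumed to be normalised
    v.push_back(r.pos);
  }
  std::sort(v.begin(), v.end());
  v.erase(std::unique(v.begin(), v.end()), v.end());
  return v;
}

// True if a single ruling at position pos covers the whole interval [lo, hi].
static bool Covered(const std::vector<Ruling>& lines, int pos, int lo, int hi) {
  for (const Ruling& r : lines)
    if (r.pos == pos && r.lo <= lo && r.hi >= hi) return true;
  return false;
}

// Union-find root lookup with path halving.
static int Find(std::vector<int>& parent, int i) {
  while (parent[i] != i) {
    parent[i] = parent[parent[i]];
    i = parent[i];
  }
  return i;
}

std::vector<Cell> FindCells(const std::vector<Ruling>& horiz,
                            const std::vector<Ruling>& vert) {
  const std::vector<int> xs = UniquePositions(vert);
  const std::vector<int> ys = UniquePositions(horiz);
  if (xs.size() < 2 || ys.size() < 2) return {};
  const int cols = static_cast<int>(xs.size()) - 1;
  const int rows = static_cast<int>(ys.size()) - 1;

  std::vector<int> parent(rows * cols);
  std::iota(parent.begin(), parent.end(), 0);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      const int id = r * cols + c;
      // Join with the right neighbour if no vertical ruling separates them.
      if (c + 1 < cols && !Covered(vert, xs[c + 1], ys[r], ys[r + 1]))
        parent[Find(parent, id)] = Find(parent, id + 1);
      // Join with the neighbour below if no horizontal ruling separates them.
      if (r + 1 < rows && !Covered(horiz, ys[r + 1], xs[c], xs[c + 1]))
        parent[Find(parent, id)] = Find(parent, id + cols);
    }
  }

  // Bounding box of each joined group (assumes groups are rectangular).
  std::vector<Cell> boxes(rows * cols);
  std::vector<bool> seen(rows * cols, false);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      const int root = Find(parent, r * cols + c);
      const Cell unit = {xs[c], ys[r], xs[c + 1], ys[r + 1]};
      if (!seen[root]) { boxes[root] = unit; seen[root] = true; continue; }
      Cell& b = boxes[root];
      b.left = std::min(b.left, unit.left);
      b.top = std::min(b.top, unit.top);
      b.right = std::max(b.right, unit.right);
      b.bottom = std::max(b.bottom, unit.bottom);
    }
  }

  // Reading order: top to bottom, then left to right.
  std::vector<Cell> cells;
  for (std::size_t i = 0; i < boxes.size(); ++i)
    if (seen[i]) cells.push_back(boxes[i]);
  std::sort(cells.begin(), cells.end(), [](const Cell& a, const Cell& b) {
    return a.top != b.top ? a.top < b.top : a.left < b.left;
  });
  return cells;
}
```

For a two-row table whose top row is a single wide cell and whose bottom row
is split in two, FindCells returns three cells: the wide one first, then the
bottom-left and bottom-right ones. Real scans would need a tolerance when
comparing coordinates, and a group that is not rectangular (for example an
L-shaped region) would be reported by its bounding box only.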