Dear Sventech,

    I have read that post before. Writing my own table analysis algorithm 
along the lines of Dmitri Silaev's suggestion would be a lot of work for me.
    However, since Tesseract (3.02, r729) already contains a well-designed 
table analysis algorithm, how can I obtain the table cell information from 
its internal variables after page segmentation analysis? I think that would 
give much better results than my own design.
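
    To make the request concrete, here is a minimal sketch of the kind of 
cell list I am after. It is not taken from tablefind.cpp and uses no Tesseract 
API: Line, Cell and FindCells are hypothetical names made up for illustration. 
It assumes idealized ruling lines (exactly axis-aligned, with coordinates that 
already meet at the junctions and a complete outer border); real detected 
lines would first need snapping with some tolerance. The idea is to build the 
finest grid implied by the lines, merge neighbouring grid squares that have no 
ruling line between them, and report each merged region in reading order.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <numeric>
#include <vector>

// Hypothetical types for illustration; these are NOT Tesseract classes.
// A ruling line is axis-aligned with x0 <= x1 and y0 <= y1.
struct Line { int x0, y0, x1, y1; };
struct Cell { int left, top, right, bottom; };

// Union-find root lookup with path halving.
static int Find(std::vector<int>& parent, int i) {
  assert(i >= 0 && i < static_cast<int>(parent.size()));
  while (parent[i] != i) {
    parent[i] = parent[parent[i]];
    i = parent[i];
  }
  return i;
}

// True if some vertical line at x covers the whole span [top, bottom].
static bool HasVertical(const std::vector<Line>& lines, int x, int top, int bottom) {
  for (const Line& l : lines)
    if (l.x0 == x && l.y0 <= top && l.y1 >= bottom) return true;
  return false;
}

// True if some horizontal line at y covers the whole span [left, right].
static bool HasHorizontal(const std::vector<Line>& lines, int y, int left, int right) {
  for (const Line& l : lines)
    if (l.y0 == y && l.x0 <= left && l.x1 >= right) return true;
  return false;
}

static void SortUnique(std::vector<int>& v) {
  std::sort(v.begin(), v.end());
  v.erase(std::unique(v.begin(), v.end()), v.end());
}

// Returns the table cells sorted top-to-bottom, then left-to-right.
std::vector<Cell> FindCells(const std::vector<Line>& horizontal,
                            const std::vector<Line>& vertical) {
  // The distinct line positions define the finest possible grid.
  std::vector<int> xs, ys;
  for (const Line& l : vertical) xs.push_back(l.x0);
  for (const Line& l : horizontal) ys.push_back(l.y0);
  SortUnique(xs);
  SortUnique(ys);
  if (xs.size() < 2 || ys.size() < 2) return {};
  const int cols = static_cast<int>(xs.size()) - 1;
  const int rows = static_cast<int>(ys.size()) - 1;

  // Merge adjacent grid squares that are not separated by a ruling line.
  std::vector<int> parent(rows * cols);
  std::iota(parent.begin(), parent.end(), 0);
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      const int here = Find(parent, r * cols + c);
      if (c + 1 < cols && !HasVertical(vertical, xs[c + 1], ys[r], ys[r + 1]))
        parent[here] = Find(parent, r * cols + c + 1);
      const int now = Find(parent, r * cols + c);
      if (r + 1 < rows && !HasHorizontal(horizontal, ys[r + 1], xs[c], xs[c + 1]))
        parent[now] = Find(parent, (r + 1) * cols + c);
    }
  }

  // Each merged component becomes one cell (its bounding box).
  std::map<int, Cell> boxes;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      const int root = Find(parent, r * cols + c);
      const Cell box{xs[c], ys[r], xs[c + 1], ys[r + 1]};
      auto it = boxes.find(root);
      if (it == boxes.end()) { boxes.emplace(root, box); continue; }
      Cell& m = it->second;
      m.left = std::min(m.left, box.left);
      m.top = std::min(m.top, box.top);
      m.right = std::max(m.right, box.right);
      m.bottom = std::max(m.bottom, box.bottom);
    }
  }

  std::vector<Cell> cells;
  for (const auto& kv : boxes) cells.push_back(kv.second);
  std::sort(cells.begin(), cells.end(), [](const Cell& a, const Cell& b) {
    return a.top != b.top ? a.top < b.top : a.left < b.left;
  });
  return cells;
}
```

    For a table whose top row is a single wide cell above two columns, I 
would expect three cells back: the wide one first, then the two below it from 
left to right. A merged region that is not rectangular would be reduced to its 
bounding box here, which is one of the cases where I hope the internal 
algorithm does better.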

On Tuesday, June 19, 2012 at 10:49:56 PM UTC+8, sventech wrote:
>
> You'll see some things on tables in the archives -- it is not very 
> easy, but here is one link: 
>
> https://groups.google.com/forum/?fromgroups#!topic/tesseract-ocr/YyKinyi6Sdw 
>
> On Tue, Jun 19, 2012 at 3:26 AM, Neo Song <[email protected]> wrote: 
> > Dear All, 
> > 
> >     Currently I am working on a table text extraction project, and we 
> > need to identify the table before any OCR process. 
> >     I investigated the related source code (checked-out version: r729) 
> > and found that there is a table finder class inside Tesseract 
> > (tablefind.cpp). The problem is that for irregular tables (e.g. different 
> > rows have different numbers of columns), even if I get all the ruling 
> > lines, I cannot identify the concrete table cells. 
> >     I have called the function "FindLinesCreateBlockList()" and I can 
> > iterate over all the text blocks, horizontal lines and vertical lines in 
> > the target image. However, I can do nothing with these horizontal and 
> > vertical lines; what I need is something like a CELL_LIST, which contains 
> > every table cell in reading order based on the table ruling lines. I 
> > believe that the table finder may already contain such an algorithm (I 
> > read the code, but it is far too complicated for me), but that it is not 
> > exposed through the Base API interface. Is that true? 
> >     Can someone help me out with this? How can I obtain the table cells? 
> > An example of such an irregular table can be found in the attachment. 
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
