Dear Neo,
Sorry, I don't really know much about table recognition, beyond having
seen several discussions of it. I don't believe the information you
want is exposed by the API, but you could modify the engine to
provide it. If you don't want to write much code, the hOCR HTML output
would give you some offset and dimension (bounding box) information.
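
To illustrate the hOCR route, here is a minimal Python sketch that pulls
word bounding boxes out of hOCR text using only the standard library. It
relies on the hOCR convention of storing geometry in the title attribute
as "bbox x0 y0 x1 y1". The class names checked (ocrx_word, ocr_word) are
an assumption and may differ between Tesseract versions, and the SAMPLE
markup is made up for the example rather than real engine output.

```python
import re
from html.parser import HTMLParser

# hOCR keeps geometry in the title attribute, e.g. title="bbox 36 92 580 122"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrBoxParser(HTMLParser):
    """Collect the text and bounding box of each word-level hOCR element."""

    def __init__(self, wanted=("ocrx_word", "ocr_word")):  # assumed class names
        super().__init__()
        self.wanted = set(wanted)
        self.boxes = []     # finished entries: {"bbox": (x0, y0, x1, y1), "text": str}
        self._open = None   # entry currently collecting text
        self._depth = 0     # nesting depth inside the open element

    def handle_starttag(self, tag, attrs):
        if self._open is not None:
            self._depth += 1  # nested markup such as <strong> inside a word
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if match and self.wanted.intersection(classes):
            self._open = {"bbox": tuple(map(int, match.groups())), "text": ""}
            self._depth = 1

    def handle_endtag(self, tag):
        if self._open is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._open["text"] = self._open["text"].strip()
            self.boxes.append(self._open)
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data


def word_boxes(hocr_text):
    """Return a list of {"bbox", "text"} dicts in document order."""
    parser = HocrBoxParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.boxes


# Hypothetical hOCR fragment, hand-written for illustration only.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 600 400'>
 <span class='ocr_line' title='bbox 10 10 300 40'>
  <span class='ocrx_word' title='bbox 10 10 90 40'>Name</span>
  <span class='ocrx_word' title='bbox 200 10 300 40'><strong>Price</strong></span>
 </span>
</div>
"""

if __name__ == "__main__":
    for box in word_boxes(SAMPLE):
        print(box["bbox"], box["text"])
```

This only recovers word positions. Grouping them into table cells (for
example, by comparing the boxes against the ruling lines you already
have) would still be up to you.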
--Sven

On Wed, Jun 20, 2012 at 9:13 PM, Neo Song <[email protected]> wrote:
> Dear Sventech,
>
>     I have read this post before. It would be a lot of work for me to
> implement my own table analysis algorithm based on Dmitri Silaev's
> suggestion.
>     However, since there is already a well-designed table analysis
> algorithm inside Tesseract (3.02 r729), how can I obtain the table cell
> information from its internal variables after page segmentation analysis?
> I think it would yield a much better result than my own design.
>
> On Tuesday, June 19, 2012 at 10:49:56 PM UTC+8, sventech wrote:
>>
>> You'll see some things on tables in the archives -- it is not very
>> easy, but here is one link:
>>
>> https://groups.google.com/forum/?fromgroups#!topic/tesseract-ocr/YyKinyi6Sdw
>>
>> On Tue, Jun 19, 2012 at 3:26 AM, Neo Song <[email protected]> wrote:
>> > Dear All,
>> >
>> >     I am currently working on a table text extraction project, and we
>> > need to identify the table before any OCR processing.
>> >     I investigated the related source code (checked-out version: r729)
>> > and found that there is a table finder class inside Tesseract
>> > (tablefind.cpp). The problem is that for irregular tables (e.g. where
>> > different rows have different numbers of columns), even if I get all
>> > the ruling lines, I cannot identify the individual table cells.
>> >     I have called the function "FindLinesCreateBlockList()" and I can
>> > iterate over all the text blocks, horizontal lines and vertical lines
>> > in the target image. However, I can do nothing with these horizontal
>> > and vertical lines. What I need is something like a CELL_LIST, which
>> > contains every table cell in reading order based on the table ruling
>> > lines. I believe that the table finder may already contain such an
>> > algorithm (I read the code but it is too complicated), but that it is
>> > not exposed through the Base API interface. Is that true?
>> >     Can someone help me with this? How can I obtain the table cells? An
>> > example of such an irregular table can be found in the attachment.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en



-- 
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.''
