[tesseract-ocr] Re: More about CLASS "TableFinder" in tablefind.h

ANBU J Tue, 08 Apr 2014 21:44:21 -0700

Thanks for the suggestion. How will it help me recreate the table? We 
can't rely on blank spaces to separate cells. I think we have to 
add tags between cells while removing the lines. It would be better if we 
could add <tr> and <td> tags, but any special character that identifies the cells 
would do. An image with the lines removed won't help us identify or recreate the 
data as a table. 
Sorry if I'm being stupid, but I'm really new to this.
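A minimal sketch of the tagging idea, written in C++ because the thread already works with TessBaseAPI in VC. It assumes you already have each recognized word with its pixel bounding box (for example from the `bbox` values in hOCR output) and the x/y positions of the table's ruling lines, recorded before the lines are removed. `Word`, `band_index`, `html_escape` and `to_html_table` are hypothetical helpers, not part of Tesseract's API. Each word goes into the cell that contains the centre of its box; words sharing a cell are joined in input order, so the input should already be in reading order.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical word record: recognized text plus its bounding box in pixels.
struct Word {
    std::string text;
    int left, top, right, bottom;
};

// Index i of the band [bounds[i], bounds[i + 1]) that contains v,
// or -1 if v lies outside every band. `bounds` must be sorted ascending.
int band_index(const std::vector<int>& bounds, int v) {
    assert(std::is_sorted(bounds.begin(), bounds.end()));
    auto it = std::upper_bound(bounds.begin(), bounds.end(), v);
    if (it == bounds.begin() || it == bounds.end()) return -1;
    return static_cast<int>(it - bounds.begin()) - 1;
}

// Escapes the characters that would otherwise break the HTML markup.
std::string html_escape(const std::string& s) {
    std::string out;
    for (char ch : s) {
        switch (ch) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default: out += ch;
        }
    }
    return out;
}

// Puts each word into the cell containing its box centre, then emits one
// <tr> per band between horizontal rules and one <td> per band between
// vertical rules. Empty cells become empty <td></td> elements.
std::string to_html_table(const std::vector<Word>& words,
                          const std::vector<int>& row_lines,
                          const std::vector<int>& col_lines) {
    std::map<std::pair<int, int>, std::string> cells;
    for (const Word& w : words) {
        int r = band_index(row_lines, (w.top + w.bottom) / 2);
        int c = band_index(col_lines, (w.left + w.right) / 2);
        if (r < 0 || c < 0) continue;  // centre lies outside the ruled grid
        std::string& cell = cells[{r, c}];
        if (!cell.empty()) cell += ' ';
        cell += html_escape(w.text);
    }
    std::string html = "<table>\n";
    for (int r = 0; r + 1 < static_cast<int>(row_lines.size()); ++r) {
        html += "<tr>";
        for (int c = 0; c + 1 < static_cast<int>(col_lines.size()); ++c) {
            auto it = cells.find({r, c});
            html += "<td>";
            if (it != cells.end()) html += it->second;
            html += "</td>";
        }
        html += "</tr>\n";
    }
    return html + "</table>\n";
}
```

For example, with horizontal rules at y = 0, 30, 60 and vertical rules at x = 0, 100, 160, a word whose box spans (12, 5) to (60, 20) has its centre at (36, 12) and lands in the first row, first column. Words whose centres fall outside the grid are dropped, which is a simplification; merged or spanning cells would need extra handling.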


On Wednesday, April 9, 2014 9:57:21 AM UTC+5:30, temp name wrote:
>
> Try using the "tessedit_dump_page_segment T" config parameter; it dumps the 
> image after removing horizontal and vertical lines. You can then use this image 
> for OCR.
>
> On Wednesday, April 9, 2014 9:47:41 AM UTC+5:30, ANBU J wrote:
>>
>> Thanks for the reply, Nick. I'm doing that. It is very hard to figure out 
>> the functionality of the methods without understanding the whole project. Since 
>> I have to find out what those header files do and how they relate, it is 
>> going to take a lot of time. I'd appreciate it if anyone could point out 
>> where the output (the text extracted from the table) is passed, so that I 
>> can add HTML table tags to the output and reproduce the table in HTML 
>> format.
>>
>> Anbu   
>>
>> On Tuesday, April 8, 2014 9:08:30 PM UTC+5:30, Nick White wrote:
>>>
>>> Documentation for the internals of Tesseract is unfortunately rather 
>>> minimal, indeed. I'd recommend you take a look at the TableFinder 
>>> class in the code to figure it out. And please do share anything you 
>>> learn here! 
>>>
>>> Nick 
>>>
>>> On Mon, Apr 07, 2014 at 02:45:51AM -0700, ANBU J wrote: 
>>> > It's sad that we couldn't find any documentation for the table 
>>> > manipulation methods in Tesseract. It looks like I have to implement 
>>> > an algorithm to handle tables myself. 
>>> > If you have done it already, please share the knowledge. 
>>> > 
>>> > On Tuesday, 25 June 2013 14:42:46 UTC+5:30, [email protected] wrote: 
>>> > 
>>> >     Hi! 
>>> > 
>>> >     I'm going to work on a program that can recognize a table's 
>>> >     structure and the text in it. 
>>> >     I tried to OCR the table image from the command line on Windows 7, 
>>> >     but the output text was very poor 
>>> >     (just like this: tesseract table.jpg out -l eng, or with "hocr"). 
>>> >     I also tried using TessBaseAPI in VC (just a simple application). 
>>> > 
>>> >     The table lines (especially the column lines) interfere with the 
>>> >     whole image. 
>>> > 
>>> >     Now I have found the class "TableFinder" in the Tesseract source 
>>> >     code, but I can't find anything else about it on the Internet 
>>> >     (Tesseract-OCR-3.02). 
>>> >     Are there any demos or tutorials? 
>>> > 
>>> >     I am new and sincerely hope to get some help. :) 
>>> > 
>>> >     Thanks! 
>>> > 
>>> > -- 
>>> > -- 
>>> > You received this message because you are subscribed to the Google 
>>> > Groups "tesseract-ocr" group. 
>>> > To post to this group, send email to [email protected] 
>>> > To unsubscribe from this group, send email to 
>>> > [email protected] 
>>> > For more options, visit this group at 
>>> > http://groups.google.com/group/tesseract-ocr?hl=en 
>>> > 
>>> > --- 
>>> > You received this message because you are subscribed to the Google Groups 
>>> > "tesseract-ocr" group. 
>>> > To unsubscribe from this group and stop receiving emails from it, send an email 
>>> > to [email protected]. 
>>> > For more options, visit https://groups.google.com/d/optout. 
>>>
>>
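Removing the ruling lines throws away exactly the information needed to tell cells apart. One way to keep it, sketched here as an assumption and not taken from TableFinder's internals, is to locate the lines with projection profiles before removing them: a row (or column) of the binarized image in which most pixels are dark is treated as part of a horizontal (or vertical) rule, and each run of such rows or columns is reduced to its centre coordinate. `BinaryImage`, `line_centres`, `horizontal_lines`, `vertical_lines` and the 0.8 fill threshold are hypothetical; the threshold will need tuning, and skewed scans should be deskewed first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal binary image: 1 = dark (ink) pixel, 0 = background.
struct BinaryImage {
    int width;
    int height;
    std::vector<std::uint8_t> px;  // row-major, size width * height

    int at(int x, int y) const {
        return px[static_cast<std::size_t>(y) * width + x];
    }
};

// Marks each profile entry as "line" if its ink count reaches min_fill * span,
// groups consecutive line entries, and returns the centre of each group
// (one coordinate per ruling line, even for rules several pixels thick).
std::vector<int> line_centres(const std::vector<int>& ink, int span, double min_fill) {
    std::vector<int> centres;
    const int n = static_cast<int>(ink.size());
    int start = -1;  // first index of the current run, or -1 if not in a run
    for (int i = 0; i <= n; ++i) {
        bool is_line = i < n && ink[i] >= min_fill * span;
        if (is_line && start < 0) start = i;
        if (!is_line && start >= 0) {
            centres.push_back((start + i - 1) / 2);
            start = -1;
        }
    }
    return centres;
}

// Row profile: dark pixels per row; mostly dark rows are horizontal rules.
std::vector<int> horizontal_lines(const BinaryImage& img, double min_fill = 0.8) {
    assert(img.px.size() == static_cast<std::size_t>(img.width) * img.height);
    std::vector<int> ink(img.height, 0);
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x) ink[y] += img.at(x, y);
    return line_centres(ink, img.width, min_fill);
}

// Column profile: dark pixels per column; mostly dark columns are vertical rules.
std::vector<int> vertical_lines(const BinaryImage& img, double min_fill = 0.8) {
    assert(img.px.size() == static_cast<std::size_t>(img.width) * img.height);
    std::vector<int> ink(img.width, 0);
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x) ink[x] += img.at(x, y);
    return line_centres(ink, img.height, min_fill);
}
```

The returned y and x coordinates can serve as row and column boundaries when assigning OCR'd words to cells. A plain projection profile only finds rules that span at least `min_fill` of the image width or height; tables with partial rules, or no rules at all, need a more robust method.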
