Thanks for the suggestion, but how will it help me recreate the table? We can't assume that blank spaces separate cells. I think we have to add tags between cells while removing the lines. It would be better if we could add <tr> and <td> tags, but any special character that identifies the cells would do. An image with the lines removed won't help us identify or recreate the data as a table. Sorry if I'm being stupid, but I'm really new to this.
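To make clearer what I mean by adding tags between cells, here is a rough sketch in Python. It is not Tesseract code. It assumes we already have each cell's text plus the left/top pixel position of its box from some earlier step. The `cells_to_html` function, the `(text, left, top)` tuple layout, and the `row_tol` tolerance are all made up for illustration.

```python
from html import escape


def cells_to_html(cells, row_tol=10):
    """Turn (text, left, top) tuples into an HTML table string.

    `cells` is a hypothetical input format, not something Tesseract emits
    directly. Cells whose `top` is within `row_tol` pixels of the first
    cell in a row are treated as belonging to that row.
    """
    rows = []
    # Walk the cells top to bottom so each row is seen in order.
    for text, left, top in sorted(cells, key=lambda c: (c[2], c[1])):
        if rows and abs(top - rows[-1]["top"]) <= row_tol:
            rows[-1]["cells"].append((left, text))
        else:
            rows.append({"top": top, "cells": [(left, text)]})

    lines = ["<table>"]
    for row in rows:
        # Order the cells in a row left to right, then wrap each in <td>.
        tds = "".join(f"<td>{escape(text)}</td>" for _, text in sorted(row["cells"]))
        lines.append(f"  <tr>{tds}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [("Name", 10, 10), ("Age", 120, 12), ("Anbu", 10, 50), ("25", 120, 49)]
    print(cells_to_html(sample))
```

The hard part is still getting reliable cell boxes out of the OCR step. This only shows what the output stage could look like once those boxes are known.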

On Wednesday, April 9, 2014 9:57:21 AM UTC+5:30, temp name wrote:
> Try using "tessedit_dump_page_segment T", config parameter, it dumps the
> image after removing horizontal and vertical lines. You can use this image
> further for OCR.
>
> On Wednesday, April 9, 2014 9:47:41 AM UTC+5:30, ANBU J wrote:
>> Thanks for the reply Nick. I'm doing it. It is very hard to figure out
>> the functionality of methods without understanding the whole project. Since
>> I have to find out what are those header files do and the relation, it is
>> going to take a lot of time. I'd appreciate if anyone can point me out
>> where the outputs (the extracted text from table) being passed. So that I
>> can add html table tags to the output to reproduce the table in html
>> format.
>>
>> Anbu
>>
>> On Tuesday, April 8, 2014 9:08:30 PM UTC+5:30, Nick White wrote:
>>> Documentation for the internals of Tesseract is unfortunately rather
>>> minimal, indeed. I'd recommend you take a look at the TableFinder
>>> class in the code to figure it out. And please do share anything you
>>> learn here!
>>>
>>> Nick
>>>
>>> On Mon, Apr 07, 2014 at 02:45:51AM -0700, ANBU J wrote:
>>> > It's sad that we couldn't find a documentation for the methods for table
>>> > manipulation in tesseract. Looks like I have to manually implement an
>>> > algorithm to handle tables.
>>> > if you have done it already, please share the knowledge.
>>> >
>>> > On Tuesday, 25 June 2013 14:42:46 UTC+5:30, [email protected] wrote:
>>> >
>>> >     Hi !
>>> >
>>> >     I'm going to work for a program which can recognize the table
>>> >     structure and text in this table.
>>> >     I tried to OCR the table image using command line on Windows 7,
>>> >     but the output text was so bad.
>>> >
>>> >     (just like this: tesseract table.jpg out -l eng, or with "hocr")
>>> >     I tried to using TessBaseAPI in VC too.(just a simple application)
>>> >
>>> >     The table lines(especially column) interfere in the whole image.
>>> >
>>> >     And now, I find the Class "TableFinder" in Tesseract source code,
>>> >     but I can't get anything else from Internet. (Tesseract-OCR-3.02)
>>> >     No demos, teachings here?
>>> >
>>> >     I am new, sincerely hope to get some help. :)
>>> >
>>> >     Thanks!
>>> >
>>> > --
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> > Groups "tesseract-ocr" group.
>>> > To post to this group, send email to [email protected]
>>> > To unsubscribe from this group, send email to
>>> > [email protected]
>>> > For more options, visit this group at
>>> > http://groups.google.com/group/tesseract-ocr?hl=en
>>> >
>>> > ---
>>> > You received this message because you are subscribed to the Google Groups
>>> > "tesseract-ocr" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send an
>>> > email to [email protected].
>>> > For more options, visit https://groups.google.com/d/optout.

