Hi Vicky,

Can you tell me more about this paper? It looks like it is not a free document, so I can't just read it to see if it would solve the problem I have.
My problem is that I have grey-scale image data (tif/jpg/etc) that contains text within a table format, i.e. cells on the page. The documents were originally faxed and then converted to PDF, so the image quality varies from poor to good. I don't want the table formatting; I'm looking for a way to remove it and get to just the image text, which I then want to convert to text using OCR, Tesseract or otherwise. My programming environment is Java, but I can shell out to other programs if I need to.

Would the approach in the paper solve this problem space? How practical is the software solution for a one-man effort?

Thanks,
-Dave

On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja <[email protected]> wrote:
> Hello,
>
> I used this paper (for pre-processing):
> Parameter-Free Geometric Document Layout Analysis, by Lee, Ryu 2001. IEEE
> Tran. Patt. Analysis and Machine Int. Nov 2001 Volume 23 Issue 11 Pages 1240
> - 1256
>
> Best Regards,
> Vicky
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Daphne
> Sent: Friday, March 11, 2011 01:15
> To: tesseract-ocr
> Subject: how to get the character in an image file which is in table format.
>
> Hello,
>
> I have a scanned image file which contains table. When I OCR it using
> tessnet it doesn't give the desired output.
> It is not reading the characters in the table. Instead it give some
> numbers.
>
> How to read the character in table format image
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

