I suspect this paper is a sledgehammer to crack a nut. The method is quite general and elaborate, and implementing and debugging it would likely take a great deal of time. Your images might need much simpler methods.
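To give an idea of what a "much simpler method" could look like, here is a rough Java sketch. It is not the algorithm from the paper, and it is only a guess at what might suit images like these. It erases long horizontal and vertical runs of dark pixels (likely table rules) and leaves short runs (likely character strokes) alone. The class name `TableLineRemover`, the method `removeLongRuns`, and the `threshold` and `minRun` parameters are all made up for illustration; sensible values depend on the scan resolution and would need tuning on real samples.

```java
import java.awt.image.BufferedImage;

class TableLineRemover {
    static final int WHITE = 0xFFFFFF;

    // A pixel counts as "ink" when its average channel value is below the threshold.
    static boolean isDark(BufferedImage img, int x, int y, int threshold) {
        int rgb = img.getRGB(x, y);
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r + g + b) / 3 < threshold;
    }

    // Returns a copy of src in which every horizontal or vertical run of dark
    // pixels at least minRun long is painted white. Assumption: table rules
    // form much longer straight runs than any character stroke.
    static BufferedImage removeLongRuns(BufferedImage src, int threshold, int minRun) {
        int w = src.getWidth(), h = src.getHeight();
        boolean[][] erase = new boolean[h][w];

        // Horizontal pass: scan each row; x == w acts as a sentinel that closes a run.
        for (int y = 0; y < h; y++) {
            int start = -1;
            for (int x = 0; x <= w; x++) {
                boolean dark = x < w && isDark(src, x, y, threshold);
                if (dark && start < 0) start = x;
                if (!dark && start >= 0) {
                    if (x - start >= minRun)
                        for (int i = start; i < x; i++) erase[y][i] = true;
                    start = -1;
                }
            }
        }

        // Vertical pass: the same scan down each column.
        for (int x = 0; x < w; x++) {
            int start = -1;
            for (int y = 0; y <= h; y++) {
                boolean dark = y < h && isDark(src, x, y, threshold);
                if (dark && start < 0) start = y;
                if (!dark && start >= 0) {
                    if (y - start >= minRun)
                        for (int i = start; i < y; i++) erase[i][x] = true;
                    start = -1;
                }
            }
        }

        // Copy the source, whitening the pixels marked for erasure.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out.setRGB(x, y, erase[y][x] ? WHITE : src.getRGB(x, y));
        return out;
    }
}
```

You would load the page with `javax.imageio.ImageIO.read`, call `removeLongRuns`, write the result back out and feed that to Tesseract. Be aware of the limits: skewed or broken fax lines will not form one long run, and characters touching a rule lose the pixels they share with it. That is exactly why sample images matter.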
I always say the same thing: send your sample images and the community will try to help.

Warm regards,
Dmitry Silaev

On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer <dhoff...@gmail.com> wrote:
> Hi Vicky,
>
> Can you tell me more about this paper? It looks like this is not a
> free document so I can't just read it to see if it would solve the
> problem I have.
>
> My problem is that I have grey-scale image data (tif/jpg/etc) that
> contains text within a table format, i.e. cells on the page. The
> documents were originally faxed then converted to PDF so the image
> quality varies from poor to good. I don't want the table formatting,
> I'm looking for a way to remove the formatting and get to just the
> image text, I want to convert that to text using OCR, Tesseract or
> otherwise.
>
> My programming environment is Java but can shell out to other programs
> if I need to.
>
> Would the approach in the paper solve this problem space? How
> practical is the software solution for a one man effort?
>
> Thanks,
> -Dave
>
> On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja <vicky.vi...@gmail.com>
> wrote:
>> Hello,
>>
>> I used this paper (for pre-processing):
>> Parameter-Free Geometric Document Layout Analysis, by Lee, Ryu 2001. IEEE
>> Tran. Patt. Analysis and Machine Int. Nov 2001 Volume 23 Issue 11 Pages 1240
>> - 1256
>>
>> Best Regards,
>> Vicky
>>
>> -----Original Message-----
>> From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com]
>> On Behalf Of Daphne
>> Sent: Friday, March 11, 2011 01:15
>> To: tesseract-ocr
>> Subject: how to get the character in an image file which is in table format.
>>
>> Hello,
>>
>> I have a scanned image file which contains table. When I OCR it using
>> tessnet it doesn't give the desired output.
>> It is not reading the characters in the table. Instead it gives some
>> numbers.
>>
>> How to read the character in table format image
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> To unsubscribe from this group, send email to
>> tesseract-ocr+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.