Hi Vicky,

Can you tell me more about this paper?  It looks like this is not a
free document so I can't just read it to see if it would solve the
problem I have.

My problem is that I have grey-scale image data (tif/jpg/etc.) that
contains text within a table format, i.e. cells on the page.  The
documents were originally faxed and then converted to PDF, so the image
quality varies from poor to good.  I don't want the table formatting.
I'm looking for a way to remove it and get to just the image text,
which I then want to convert to text using OCR, Tesseract or
otherwise.

My programming environment is Java, but I can shell out to other
programs if I need to.
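
To make the goal concrete, here is a rough sketch of the kind of line
removal I mean.  It is not the paper's method, just a naive run-length
filter in plain Java.  The class name, the darkness threshold, and the
minimum run length are all placeholders made up for illustration, and
it assumes a TYPE_BYTE_GRAY image with dark ink on a light background.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

// Hypothetical helper: paints long straight runs of dark pixels white,
// on the assumption that table rules are much longer than any stroke of text.
class TableLineRemover {

    // src       : TYPE_BYTE_GRAY image (band 0 holds the grey value, 0..255)
    // darkBelow : samples below this value count as ink (assumed threshold)
    // minRun    : runs at least this many pixels long are treated as table lines
    public static BufferedImage removeLines(BufferedImage src, int darkBelow, int minRun) {
        int w = src.getWidth();
        int h = src.getHeight();
        WritableRaster in = src.getRaster();
        boolean[][] erase = new boolean[h][w];

        // Horizontal pass: mark every dark run of length >= minRun in each row.
        for (int y = 0; y < h; y++) {
            int start = 0;
            for (int x = 0; x <= w; x++) {
                boolean dark = x < w && in.getSample(x, y, 0) < darkBelow;
                if (!dark) {
                    if (x - start >= minRun) {
                        for (int i = start; i < x; i++) erase[y][i] = true;
                    }
                    start = x + 1;
                }
            }
        }

        // Vertical pass: the same idea, applied to each column.
        for (int x = 0; x < w; x++) {
            int start = 0;
            for (int y = 0; y <= h; y++) {
                boolean dark = y < h && in.getSample(x, y, 0) < darkBelow;
                if (!dark) {
                    if (y - start >= minRun) {
                        for (int i = start; i < y; i++) erase[i][x] = true;
                    }
                    start = y + 1;
                }
            }
        }

        // Copy the source, replacing the marked pixels with white.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster o = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                o.setSample(x, y, 0, erase[y][x] ? 255 : in.getSample(x, y, 0));
            }
        }
        return out;
    }
}
```

I would expect something this simple to struggle on the poorer faxes,
where lines are broken or skewed, and it could clip characters that
touch a cell border, which is why I am asking about the paper.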

Would the approach in the paper solve this problem?  How practical
would it be to implement as a one-man effort?

Thanks,
-Dave



On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja <[email protected]> wrote:
> Hello,
>
> I used this paper (for pre-processing):
> Parameter-Free Geometric Document Layout Analysis, by Lee and Ryu. IEEE
> Trans. Pattern Analysis and Machine Intelligence, Nov 2001, Volume 23,
> Issue 11, Pages 1240-1256
>
> Best Regards,
> Vicky
>
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Daphne
> Sent: Friday, March 11, 2011 01:15
> To: tesseract-ocr
> Subject: how to get the character in an image file which is in table format.
>
> Hello,
>
> I have a scanned image file which contains a table. When I OCR it using
> tessnet, it doesn't give the desired output.
> It is not reading the characters in the table. Instead it gives some
> numbers.
>
> How can I read the characters in a table-format image?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
