Dave,

Yep, the quality is relatively poor, so don't expect high accuracy from Tess.

Do you need every table cell's contents? Or is getting the numbers
enough, so that in a next step you can restore the [predefined] item names?

Warm regards,
Dmitry Silaev





On Mon, Mar 14, 2011 at 4:19 PM, David Hoffer <dhoff...@gmail.com> wrote:
> Dmitry,
>
> That would be great, thanks for the offer. I'll attach two samples.
>
> These two are good examples of the range of quality.  What I need to
> do is extract cell data for processing.  I can generate these in any
> image format (TIFF, JPEG) if one is preferred.
>
> Best regards,
> -Dave
>
>
> On Mon, Mar 14, 2011 at 11:07 AM, Dmitry Silaev <daemons2...@gmail.com> wrote:
>> I suspect this paper is a sledgehammer to crack a nut. It's quite
>> general and elaborate, and it would likely take a great deal of time to
>> implement and debug. Your images might require much simpler
>> methods.
>>
>> I always say the same thing: send your sample images and the community
>> will try to help.
>>
>> Warm regards,
>> Dmitry Silaev
>>
>>
>>
>>
>>
>> On Mon, Mar 14, 2011 at 8:23 AM, David Hoffer <dhoff...@gmail.com> wrote:
>>> Hi Vicky,
>>>
>>> Can you tell me more about this paper?  It looks like this is not a
>>> free document so I can't just read it to see if it would solve the
>>> problem I have.
>>>
>>> My problem is that I have grey-scale image data (tif/jpg/etc) that
>>> contains text within a table format, i.e. cells on the page.  The
>>> documents were originally faxed then converted to PDF, so the image
>>> quality varies from poor to good.  I don't want the table formatting;
>>> I'm looking for a way to remove the formatting and get to just the
>>> image text, which I want to convert to text using OCR, Tesseract or
>>> otherwise.
>>>
>>> My programming environment is Java, but I can shell out to other programs
>>> if I need to.
>>>
>>> Would the approach in the paper solve this problem?  How
>>> practical is the software solution for a one-man effort?
>>>
>>> Thanks,
>>> -Dave
>>>
>>>
>>>
>>> On Sun, Mar 13, 2011 at 10:18 AM, Vicky Budhiraja <vicky.vi...@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I used this paper (for pre-processing):
>>>> Parameter-Free Geometric Document Layout Analysis, by Lee, Ryu 2001. IEEE
>>>> Trans. Pattern Analysis and Machine Intelligence, Nov 2001, Volume 23,
>>>> Issue 11, Pages 1240-1256
>>>>
>>>> Best Regards,
>>>> Vicky
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com] On Behalf Of Daphne
>>>> Sent: Friday, March 11, 2011 01:15
>>>> To: tesseract-ocr
>>>> Subject: how to get the character in an image file which is in table format.
>>>>
>>>> Hello,
>>>>
>>>> I have a scanned image file which contains a table. When I OCR it using
>>>> tessnet it doesn't give the desired output.
>>>> It does not read the characters in the table. Instead it gives some
>>>> numbers.
>>>>
>>>> How can I read the characters in a table-format image?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups
>>>> "tesseract-ocr" group.
>>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>>> To unsubscribe from this group, send email to
>>>> tesseract-ocr+unsubscr...@googlegroups.com.
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>>
>>>
>>
>
