Actually, I don't think there is a fully user-friendly solution. You
could try the first of the two methods I can currently see.

The first method is to create a special config file and pass it to
Tesseract on the command line. The config file needs to contain the
following values:

tessedit_pageseg_mode 1 or 3 (I recommend 3)
textord_tabfind_find_tables T
textord_tablefind_recognize_tables T

You can experiment with the last parameter, trying both T and F. I
can't guarantee that the method works at all; I only found some clues
by studying the code. I suspect the corresponding pieces of code may
not work perfectly, or that there are other parameters that influence
table recognition. Please try this yourself. It would be nice if you
shared your results with the community. Sample images are also
appreciated.
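
As a rough illustration, the config file described above could be
generated and passed to Tesseract as in the sketch below. The file
name "tables.cfg" and the helper functions are hypothetical. The sketch
assumes the usual "tesseract <image> <outputbase> <configfile>" command
form and that your Tesseract build resolves the config file from the
given path; some versions look under tessdata/configs instead.

```python
import subprocess
from pathlib import Path


def table_config_text(pageseg_mode=3, recognize_tables=True):
    """Return config text with the three parameters named in this message."""
    flag = "T" if recognize_tables else "F"
    lines = [
        f"tessedit_pageseg_mode {pageseg_mode}",
        "textord_tabfind_find_tables T",
        f"textord_tablefind_recognize_tables {flag}",
    ]
    return "\n".join(lines) + "\n"


def write_table_config(path, pageseg_mode=3, recognize_tables=True):
    """Write the config text to `path` (hypothetical file name, e.g. tables.cfg)."""
    Path(path).write_text(table_config_text(pageseg_mode, recognize_tables))


def build_command(image, output_base, config):
    """Build the command line: tesseract <image> <outputbase> <configfile>."""
    return ["tesseract", str(image), str(output_base), str(config)]


def run_ocr(image, output_base, config):
    """Run Tesseract; assumes the `tesseract` binary is on PATH."""
    subprocess.run(build_command(image, output_base, config), check=True)
```

To compare the T and F settings of the last parameter, call
write_table_config("tables.cfg", recognize_tables=False), then
run_ocr("scan.tif", "out", "tables.cfg"), and diff the two outputs.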

The second method is to pre-process your images: remove the lines and
borders, then pass the cleaned image to Tesseract. Many issues can
arise in this process, but I won't go into them now unless you are
interested.
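
To give an idea of what such pre-processing might look like, here is a
toy sketch, not a tested recipe. It assumes the image has already been
binarized into a nested list (1 = ink, 0 = background) and treats any
horizontal or vertical run of ink at least min_len pixels long as a
table line or border. The function name and the run-length heuristic
are my own illustration; a real pipeline would more likely use an
imaging library.

```python
def remove_long_runs(image, min_len):
    """Return a copy of a binary image with long ink runs erased.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    Any horizontal or vertical run of ink with length >= min_len is
    assumed to be a ruling line or border and is set to background.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    erase = [[False] * width for _ in range(height)]

    # Mark long horizontal runs.
    for y in range(height):
        x = 0
        while x < width:
            if image[y][x]:
                start = x
                while x < width and image[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs.
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x]:
                start = y
                while y < height and image[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    # Build the cleaned copy; the input image is left unchanged.
    return [
        [0 if erase[y][x] else image[y][x] for x in range(width)]
        for y in range(height)
    ]
```

Note that min_len has to be larger than the longest stroke of any
character, and that characters touching a line lose the pixels they
share with it, which is one of the issues mentioned above.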

Warm regards,
Dmitry Silaev





On Fri, Mar 11, 2011 at 7:21 AM, David Hoffer <[email protected]> wrote:
> I have the same problem; I posted a message a few days ago titled
> "Working with FAX images with lines/borders".  If you find a solution
> please let me know.
>
> Thanks,
> -Dave
>
> On Thu, Mar 10, 2011 at 10:44 PM, Daphne <[email protected]> wrote:
>> Hello,
>>
>> I have a scanned image file which contains a table. When I OCR it using
>> tessnet, it doesn't give the desired output.
>> It does not read the characters in the table. Instead it gives some
>> numbers.
>>
>> How can I read the characters in a table-format image?
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to 
>> [email protected].
>> For more options, visit this group at 
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>>
>
