Re: how to get the character in an image file which is in table format.

David Hoffer Fri, 11 Mar 2011 22:12:06 -0800

Dmitry,

Yeah, I was also thinking of preprocessing to remove all straight
lines/borders, but I haven't found a good approach yet.  I can clean
up the margins, headers, and footers, but I haven't found a good way
to remove table row lines.  If you or others have any suggestions, I
would love to hear them.
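
To make the row-line problem concrete, here is a minimal sketch of one
possible approach, not a tested solution for FAX images. It assumes the
page has already been binarized into rows of 0/1 pixels (1 = ink), that
the table lines are nearly straight and unskewed, and that min_run is
longer than any stroke in a character. The function name and the
threshold are hypothetical.

```python
def remove_long_runs(image, min_run):
    """Return a copy of a binary image (list of rows, 1 = ink) in which
    every horizontal or vertical ink run of at least min_run pixels is
    erased. Runs are measured on the original image, so a pixel where
    two lines cross is counted for both of them."""
    height = len(image)
    width = len(image[0]) if height else 0
    erase = [[False] * width for _ in range(height)]

    # Mark long horizontal runs, row by row.
    for y in range(height):
        x = 0
        while x < width:
            if image[y][x]:
                start = x
                while x < width and image[y][x]:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs, column by column.
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x]:
                start = y
                while y < height and image[y][x]:
                    y += 1
                if y - start >= min_run:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    # Build the cleaned copy; the input image is left untouched.
    return [[0 if erase[y][x] else image[y][x] for x in range(width)]
            for y in range(height)]
```

Characters that touch a border will lose the pixels they share with
it, and skewed or broken lines will produce runs shorter than min_run,
so a real scan would probably need deskewing first.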


I will also experiment with the config file.
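
As a starting point for that experiment, the sketch below renders the
three settings from the quoted message as config-file text and builds
a command line. It assumes the config file is passed as the last
argument, as the quoted message describes; the helper names and any
file names used with them are hypothetical, and nothing here confirms
that the settings actually improve table recognition.

```python
def table_config(pageseg_mode=3, recognize_tables=True):
    """Render the settings from the quoted message, one 'name value'
    pair per line."""
    if pageseg_mode not in (1, 3):
        raise ValueError("the quoted advice only covers modes 1 and 3")
    lines = [
        "tessedit_pageseg_mode %d" % pageseg_mode,
        "textord_tabfind_find_tables T",
        # The quoted message suggests trying both T and F here.
        "textord_tablefind_recognize_tables %s"
        % ("T" if recognize_tables else "F"),
    ]
    return "\n".join(lines) + "\n"


def tesseract_command(image_path, output_base, config_path):
    """Build the argument list; the config file goes last (assumption:
    the installed tesseract accepts it in that position)."""
    return ["tesseract", image_path, output_base, config_path]
```

Writing table_config() to a file and passing the result of
tesseract_command() to subprocess.call would run one experiment;
repeating it with recognize_tables=False covers the other value.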

Thanks much!
-Dave

On Sat, Mar 12, 2011 at 7:24 AM, Dmitry Silaev <[email protected]> wrote:
> Actually, I think there's no fully user-friendly solution. Maybe you
> can try the first of the two methods I can currently see.
>
> So the first method is to devise a special config file and include it
> in the command line for Tesseract. The following values need to be
> within this config file:
>
> tessedit_pageseg_mode 1 or 3 (I recommend 3)
> textord_tabfind_find_tables T
> textord_tablefind_recognize_tables T
>
> You can experiment with the last parameter, trying both T and F. I
> can't guarantee that the whole method works; I only found some clues
> by studying the code. I suspect the corresponding pieces of code may
> not work perfectly, or that there are more parameters that can
> influence table recognition. Please try this yourself. It would be
> nice if you shared your results with the community. Sample images are
> also appreciated.
>
> The second method is to pre-process your images. You need to remove
> lines and borders and pass the cleaned image to Tesseract. Many
> issues can arise in this process, but I won't go into them now
> unless you are interested.
>
> Warm regards,
> Dmitry Silaev
>
>
>
>
>
> On Fri, Mar 11, 2011 at 7:21 AM, David Hoffer <[email protected]> wrote:
>> I have the same problem; I posted a message a few days ago titled
>> "Working with FAX images with lines/borders".  If you find a solution,
>> please let me know.
>>
>> Thanks,
>> -Dave
>>
>> On Thu, Mar 10, 2011 at 10:44 PM, Daphne <[email protected]> wrote:
>>> Hello,
>>>
>>> I have a scanned image file which contains a table. When I OCR it
>>> using tessnet, it doesn't give the desired output.
>>> It is not reading the characters in the table. Instead it gives some
>>> numbers.
>>>
>>> How can I read the characters in a table-format image?
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups 
>>> "tesseract-ocr" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to 
>>> [email protected].
>>> For more options, visit this group at 
>>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>>
>>>
>>
>

