Hi, I'm not sure which psm mode you used. You can try psm 6 for
tables.

Something like this:
pytesseract.image_to_string(image, lang='eng', config='--psm 6')
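
If the column order still gets mixed up, one option is to ask for word boxes instead of a plain string and rebuild the rows yourself. Below is a minimal sketch, not a definitive recipe. It assumes the dictionary layout returned by pytesseract.image_to_data with output_type=pytesseract.Output.DICT. The helper names rows_from_data and ocr_table_rows and the min_conf threshold are my own, made up for illustration. psm 6 tells Tesseract to assume a single uniform block of text.

```python
from collections import defaultdict


def rows_from_data(data, min_conf=0):
    """Group word-level OCR output into text rows, ordered left to right.

    `data` is assumed to have the keys produced by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        # Skip empty boxes and words below the confidence threshold.
        if not word.strip() or float(data["conf"][i]) < min_conf:
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append((data["left"][i], word))
    # Order lines by block/paragraph/line index and words by x position.
    return [[w for _, w in sorted(lines[k])] for k in sorted(lines)]


def ocr_table_rows(path):
    """Hypothetical wrapper: OCR an image with psm 6 and return its rows."""
    # Imported lazily so rows_from_data works without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(path),
        lang="eng",
        config="--psm 6",
        output_type=pytesseract.Output.DICT,
    )
    return rows_from_data(data)
```

Calling ocr_table_rows('table_number.jpg') should give one list of words per detected text line. This only groups words by the line numbers Tesseract assigns; it does not detect cell borders, so splitting a row into real columns would still need the left/width values or a separate step.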

On Fri, May 31, 2019, 16:14 Sayali begampure <[email protected]>
wrote:

> Used psm for 2-column documents. It's showing results perfectly.
> Can you send a link or pointers on how to use it for table content
> extraction from scanned PDFs?
>
> Thanks
>
> On Friday, 31 May 2019 16:00:36 UTC+5:30, Amulya Kali wrote:
>>
>> Did you try changing psm?
>>
>> On Fri, May 31, 2019, 15:57 Manasi sarode <[email protected]> wrote:
>>
>>> That's fair enough.
>>>
>>> On Fri, May 31, 2019, 3:55 PM Sayali begampure <[email protected]>
>>> wrote:
>>>
>>>> We are trying to extract text content from normal PDFs and scanned
>>>> PDFs (images) using tesseract-ocr.
>>>>
>>>> We have observed the following issues for PDFs containing tables: the
>>>> table contents are not extracted properly.
>>>>
>>>>    1. Contents of a few cells (rows/columns) are missing. Sometimes
>>>>    the heading of the table is missing.
>>>>    2. If there are numbers inside the table, not all of them are
>>>>    extracted.
>>>>    3. Some letters are extracted wrongly, e.g. i is misinterpreted as
>>>>    l.
>>>>    4. The column sequence gets interchanged because the table is
>>>>    parsed horizontally.
>>>>    5. Some extra characters are extracted along with the expected
>>>>    ones.
>>>>
>>>> We tried image_to_string, image_to_data, and an OpenCV approach.
>>>>
>>>> Sample code used is:
>>>>
>>>> from PIL import Image
>>>> import pytesseract
>>>>
>>>> # Run OCR on the table image and print the extracted text.
>>>> text = pytesseract.image_to_string(Image.open('table_number.jpg'))
>>>> print(text)
>>>>
>>>>
>>>> It should extract rows and columns properly, which it is not doing
>>>> as of now. Kindly suggest a function or method to improve the results
>>>> for table content extraction using tesseract.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/6ddfa19c-8025-40f8-8f17-a393e5b5b2cc%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/6ddfa19c-8025-40f8-8f17-a393e5b5b2cc%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>

