Thanks. I will try this.
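
A minimal sketch of the psm suggestion quoted below, assuming pytesseract and a local Tesseract install. `psm_config` and `compare_psm_modes` are hypothetical helper names, not part of pytesseract. The idea is to run the same image through a few page segmentation modes (3 is the fully automatic default, 4 assumes a single column of variable-size text, 6 assumes a single uniform block, 11 is sparse text) and compare the outputs, since which mode suits a given table is not known in advance.

```python
from typing import Dict, Iterable


def psm_config(psm: int) -> str:
    """Build the Tesseract config string for one page segmentation mode (0-13)."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return f"--psm {psm}"


def compare_psm_modes(image_path: str, modes: Iterable[int] = (3, 4, 6, 11)) -> Dict[int, str]:
    """Run OCR once per mode so the outputs can be compared side by side."""
    # Imported lazily so psm_config() can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    return {
        psm: pytesseract.image_to_string(image, lang="eng", config=psm_config(psm))
        for psm in modes
    }
```

Calling compare_psm_modes('table_number.jpg') (the file name from the original question) should return one extracted string per mode, which can then be printed and compared by eye.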

On Friday, 31 May 2019 19:41:50 UTC+5:30, Amulya Kali wrote:
>
> Hi, I'm not sure which psm (page segmentation mode) you used. You can try 
> psm 6 for tables. 
>
> Something like this:
> pytesseract.image_to_string(image, lang='eng', config='--psm 6')
>
> On Fri, May 31, 2019, 16:14 Sayali begampure <[email protected]> wrote:
>
>> I used psm for 2-column documents. It's showing results perfectly.
>> Can you send a link or pointers on how to use it for table content 
>> extraction from scanned PDFs?
>>
>> Thanks
>>
>> On Friday, 31 May 2019 16:00:36 UTC+5:30, Amulya Kali wrote:
>>>
>>> Did you try changing psm?
>>>
>>> On Fri, May 31, 2019, 15:57 Manasi sarode <[email protected]> wrote:
>>>
>>>> That's fair enough.
>>>>
>>>> On Fri, May 31, 2019, 3:55 PM Sayali begampure <[email protected]> 
>>>> wrote:
>>>>
>>>>> We are trying to extract text content from normal PDFs and scanned PDFs 
>>>>> (images) using tesseract-ocr.
>>>>>
>>>>> We have observed the following issues with PDFs that contain tables: 
>>>>> the table contents are not extracted properly.
>>>>>
>>>>>    1. Contents of a few cells (rows/columns) are missing. Sometimes the 
>>>>>    heading of the table is missing.
>>>>>    2. If there are numbers inside the table, not all of them are 
>>>>>    extracted.
>>>>>    3. Some letters are extracted wrongly, e.g. i is misinterpreted as 
>>>>>    l.
>>>>>    4. The column sequence gets interchanged because the table is parsed 
>>>>>    horizontally.
>>>>>    5. Some extra characters are extracted along with the expected 
>>>>>    ones.
>>>>>
>>>>> We tried image_to_string, image_to_data, and an OpenCV approach.
>>>>>
>>>>> The sample code used is:
>>>>>
>>>>> from PIL import Image
>>>>>
>>>>> import pytesseract
>>>>> from pytesseract import image_to_string
>>>>> from pytesseract import image_to_boxes
>>>>>
>>>>> # Run OCR on the scanned table image and print the extracted text.
>>>>> image = pytesseract.image_to_string(Image.open('table_number.jpg'))
>>>>> print(image)
>>>>>
>>>>>
>>>>> It should extract rows and columns properly, which it does not do as of 
>>>>> now. Kindly suggest a function or method to improve the results for 
>>>>> table content extraction using tesseract.
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/6ddfa19c-8025-40f8-8f17-a393e5b5b2cc%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/6ddfa19c-8025-40f8-8f17-a393e5b5b2cc%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>
>
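
One possible way to approach the column-order problem from the original question, sketched under assumptions: pytesseract's image_to_data can return word boxes as a dict (output_type=pytesseract.Output.DICT), and words whose vertical centres lie close together can be treated as one table row and then sorted left to right. `group_words_into_rows`, `ocr_table_rows`, and the 10-pixel `y_tolerance` are hypothetical names and values chosen for illustration; the tolerance would need tuning per scan resolution.

```python
from typing import Dict, List, Sequence, Tuple


def group_words_into_rows(data: Dict[str, Sequence], y_tolerance: int = 10) -> List[List[str]]:
    """Cluster OCR words into rows by vertical position, then sort each row left to right."""
    words: List[Tuple[float, int, str]] = []
    for text, left, top, height, conf in zip(
        data["text"], data["left"], data["top"], data["height"], data["conf"]
    ):
        # Skip empty strings and non-word entries (Tesseract reports conf -1 for those).
        if not str(text).strip() or float(conf) < 0:
            continue
        words.append((top + height / 2.0, left, str(text).strip()))

    words.sort()  # top to bottom first, then left to right
    rows: List[List[Tuple[float, int, str]]] = []
    for word in words:
        # Join the current row if this word's centre is close to the row's first word;
        # otherwise start a new row.
        if rows and abs(word[0] - rows[-1][0][0]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[text for _, _, text in sorted(row, key=lambda w: w[1])] for row in rows]


def ocr_table_rows(image_path: str, psm: int = 6) -> List[List[str]]:
    """OCR an image and return its words grouped into rows (requires Tesseract)."""
    # Imported lazily so the grouping helper above works without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path),
        lang="eng",
        config=f"--psm {psm}",
        output_type=pytesseract.Output.DICT,
    )
    return group_words_into_rows(data)
```

This only reorders the words Tesseract already found. It does not recover missing cells or fix misread characters, and multi-word cells come back as separate words.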
