Thanks. I will try this.

On Friday, 31 May 2019 19:41:50 UTC+5:30, Amulya Kali wrote:
>
> Hi, I'm not sure which psm mode you have used. You can try psm 6 for
> tables.
>
> Something like this:
> pytesseract.image_to_string(image, lang='eng', config='--psm 6')
>
> On Fri, May 31, 2019, 16:14 Sayali begampure <[email protected]> wrote:
>
>> Used psm for 2-column documents. It's showing results perfectly.
>> Can you send a link or pointers on how to use it for table content
>> extraction from scanned PDFs?
>>
>> Thanks
>>
>> On Friday, 31 May 2019 16:00:36 UTC+5:30, Amulya Kali wrote:
>>>
>>> Did you try changing psm?
>>>
>>> On Fri, May 31, 2019, 15:57 Manasi sarode <[email protected]> wrote:
>>>
>>>> That's fair enough.
>>>>
>>>> On Fri, May 31, 2019, 3:55 PM Sayali begampure <[email protected]>
>>>> wrote:
>>>>
>>>>> We are trying to extract text content from normal PDFs and scanned
>>>>> PDFs (images) using tesseract-ocr.
>>>>>
>>>>> We have observed the following issues for PDFs with tables; the table
>>>>> contents are not getting extracted properly.
>>>>>
>>>>> 1. Contents of a few cells (rows/columns) are missing. Sometimes
>>>>> the heading of the table is missing.
>>>>> 2. If there are numbers inside the table, not all of the numbers
>>>>> are extracted.
>>>>> 3. Some letters are extracted wrongly, e.g. i is misinterpreted as
>>>>> l.
>>>>> 4. The column sequence gets interchanged because the text is parsed
>>>>> horizontally.
>>>>> 5. Some extra characters are extracted along with the normal
>>>>> ones.
>>>>>
>>>>> Tried image_to_string, image_to_data, and an OpenCV approach.
>>>>>
>>>>> Sample code used is:
>>>>>
>>>>> from PIL import Image
>>>>> import pytesseract
>>>>>
>>>>> # OCR the whole page with the default page segmentation mode
>>>>> text = pytesseract.image_to_string(Image.open('table_number.jpg'))
>>>>> print(text)
>>>>>
>>>>> It should extract rows and columns properly, which it is not doing
>>>>> as of now.
>>>>> Kindly suggest a function or method to improve the results for
>>>>> table content extraction using tesseract.
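For anyone following the psm 6 suggestion above, here is a minimal sketch of one possible way to keep rows and columns in order, assuming the word boxes from image_to_data are good enough to work with. The helper names group_words_into_rows and ocr_table_rows, and the row_tolerance value of 10 pixels, are made up for illustration and are not part of pytesseract; only image_to_data, Output.DICT and the --psm option come from the library. It groups single words rather than whole cells, so a cell with several words comes back as several entries.

```python
def group_words_into_rows(data, row_tolerance=10):
    """Group OCR word boxes into rows by vertical position.

    `data` is a dict of parallel lists with the keys 'text', 'left' and
    'top', the shape produced by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    `row_tolerance` (pixels) is an assumed value; tune it per document.
    """
    # Keep only non-empty words, as (top, left, text) tuples.
    words = [
        (top, left, text.strip())
        for text, left, top in zip(data["text"], data["left"], data["top"])
        if text.strip()
    ]
    words.sort()  # top-to-bottom, then left-to-right

    rows = []  # each entry: (top of first word in row, [(left, text), ...])
    for top, left, text in words:
        if rows and abs(top - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append((top, [(left, text)]))

    # Within each row, order the words by their left edge.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


def ocr_table_rows(image_path, psm=6):
    """Run Tesseract with the given psm and return words grouped into rows."""
    # Imported here so the grouping helper works without OCR installed.
    from PIL import Image
    import pytesseract

    data = pytesseract.image_to_data(
        Image.open(image_path),
        lang="eng",
        config=f"--psm {psm}",
        output_type=pytesseract.Output.DICT,
    )
    return group_words_into_rows(data)
```

Calling ocr_table_rows('table_number.jpg') would return a list of rows, each a list of words ordered left to right. Whether this helps with the missing cells or the i/l confusion depends on the scan quality; it only addresses the ordering of what Tesseract does recognise.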
-- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/24b890aa-3b48-4c0a-ba2d-bae6c9ab47d6%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.

