Table borders aren't part of the English language data, so I don't think Tesseract will know how to handle them. You'll need to deal with the borders yourself and tell it (or your own code) how they should be treated.
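One way to deal with the borders after the fact is to post-process the string that comes back from `pytesseract.image_to_string`. Below is a minimal sketch, not a tested recipe: the helper name `split_table_rows` and the set of characters treated as borders (`|`, `[`, `]`) are my own assumptions, picked from the output posted in this thread.

```python
import re

# Characters assumed to stand in for vertical table rules in the OCR output.
# This set is a guess based on the sample output in this thread; adjust it
# for your own images.
BORDER_CHARS = re.compile(r"[|\[\]]")


def split_table_rows(ocr_text):
    """Split raw OCR text into rows, each a list of non-empty cell strings."""
    rows = []
    for line in ocr_text.splitlines():
        # Cut the line at every assumed border character and trim whitespace.
        cells = [cell.strip() for cell in BORDER_CHARS.split(line)]
        cells = [cell for cell in cells if cell]
        # Lines with fewer than two cells (titles, blank lines) are skipped.
        if len(cells) > 1:
            rows.append(cells)
    return rows


# First three lines of the output posted in the original question.
sample = (
    "Time Table\n"
    "| Mon | Tue | Wed | Thu | Fri\n"
    "| Science | Maths [Science | Maths | arts\n"
)
rows = split_table_rows(sample)
for row in rows:
    print(row)
```

This only tidies up rows where the text itself was read correctly; it cannot recover cells that came out garbled. If the rules show up as other characters (a standalone capital I, for example), the pattern would need extending, at the risk of eating real text.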
I had a similar problem, except I was pulling from a much larger, more complex table. I was using | as the border, and it kept adding a capital I for everything.

On Saturday, November 17, 2018 at 1:39:50 AM UTC-8, Soumen Seth wrote:
> Hi Everyone,
>
> I am working on *python 2.7* and *pytesseract*. My Tesseract version:
>
> tesseract 4.0.0-beta.1
> leptonica-1.75.3
> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>
> I am trying to extract text from a table with Tesseract, but I am unable to extract the text properly.
>
> I tried to extract text from this table:
>
> [image: sample 2.png]
>
> and this is what I got:
>
> Time Table\n| Mon | Tue | Wed | Thu | Fri\n| Science | Maths [Science | Maths | arts\nours S02! [History | English | Social | Sports\n\n \n\n \n\n \n\n \n\nHe
>
> As you can see, this is far from satisfactory. *Can anyone please tell me how to do it?*