[tesseract-ocr] Re: Text Extraction from complex Table

DreadStarX Tue, 20 Nov 2018 08:38:30 -0800

Since English text itself has no borders, I don't think tesseract will know 
how to handle the table lines. You'll need to tell it how to deal with the 
borders around everything.


I had a similar problem, except I was pulling from a much larger, more 
complex table. I was using | as the border, and tesseract kept reading it 
as a capital I everywhere.
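
One way to keep border strokes from being read as characters is to erase 
them before OCR. This is not what I did at the time, just a minimal sketch 
of the idea. It assumes the page has already been binarized into nested 
lists of 0/1 (1 = ink). The name erase_rules and the min_run threshold are 
made up for illustration, and a real pipeline would load the image with an 
imaging library and hand the cleaned result to tesseract.

```python
def erase_rules(img, min_run=20):
    """Return a copy of a binary image with long straight ink runs cleared.

    img is a list of equal-length rows of 0/1 values (1 = ink). Any
    horizontal or vertical run of ink at least min_run pixels long is
    assumed to be a table rule rather than part of a character.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    erase = [[False] * width for _ in range(height)]

    # Mark long horizontal runs of ink.
    for y in range(height):
        x = 0
        while x < width:
            if img[y][x]:
                start = x
                while x < width and img[y][x]:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs of ink.
    for x in range(width):
        y = 0
        while y < height:
            if img[y][x]:
                start = y
                while y < height and img[y][x]:
                    y += 1
                if y - start >= min_run:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    # Build the cleaned copy; the input image is left untouched.
    return [[0 if erase[y][x] else img[y][x] for x in range(width)]
            for y in range(height)]
```

The threshold needs to be longer than the tallest or widest stroke in the 
text, otherwise parts of letters get erased along with the rules.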

On Saturday, November 17, 2018 at 1:39:50 AM UTC-8, Soumen Seth wrote:
>
> Hi Everyone,
>
> I am working with *python 2.7* and *pytesseract*. My tesseract version:
>
> tesseract 4.0.0-beta.1
>  leptonica-1.75.3
>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : 
> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>
> I am trying to extract text from a table with tesseract, but I am unable 
> to extract the text properly. 
>
> I tried to extract texts from this table:
>
> [image: sample 2.png]
> and this is what I got:
>
> Time Table\n| Mon | Tue | Wed | Thu | Fri\n| Science | Maths [Science | Maths 
> | arts\nours S02! [History | English | Social | Sports\n\n \n\n \n\n \n\n 
> \n\nHe
>
>
> As you can see, this is far from satisfactory. *Can anyone please tell me how 
> to do it?*
>
>
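
If the borders stay in the image, the output quoted above can still be 
split into rows and cells afterwards. A minimal sketch, assuming that a 
stray [ is a misread border (as it appears to be in "Maths [Science") and 
that each table row ends up on its own line. The function name split_cells 
and the separators parameter are made up for illustration.

```python
import re


def split_cells(ocr_text, separators="|["):
    """Split raw OCR text into rows of cell strings.

    Every character in separators is treated as a cell border.
    Empty cells and blank lines are dropped.
    """
    pattern = "[" + re.escape(separators) + "]"
    rows = []
    for line in ocr_text.splitlines():
        cells = [cell.strip() for cell in re.split(pattern, line)]
        cells = [cell for cell in cells if cell]
        if cells:
            rows.append(cells)
    return rows


sample = ("Time Table\n| Mon | Tue | Wed | Thu | Fri\n"
          "| Science | Maths [Science | Maths | arts\n\n \n")
for row in split_cells(sample):
    print(row)
```

This only cleans up the separators. Cells that tesseract misread or dropped 
entirely will still be wrong or missing.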
