[tesseract-ocr] Re: How to get "tables" ocr-ed

Quan Nguyen Sun, 10 Aug 2014 07:27:06 -0700

Tables are a known limitation of the Tesseract OCR engine.

If you can remove the table borders from the image before recognition, you
should get better results from Tesseract.
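
To illustrate the border-removal idea, here is a minimal, dependency-free
sketch. It assumes the page has already been binarized into a list of rows
of 0 (background) and 1 (ink), and it simply blanks any horizontal or
vertical run of ink that is at least `min_run` pixels long. The function
name, the `min_run` parameter, and its default are my own choices for
illustration, not part of Tesseract. With an image library such as OpenCV,
a morphological opening with long thin kernels is a common way to do the
same thing.

```python
def remove_ruling_lines(img, min_run=20):
    """Blank out horizontal and vertical ink runs of at least min_run pixels.

    img is a list of rows, each a list of 0 (background) or 1 (ink).
    Returns a new image; the input is not modified.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    to_clear = set()  # (y, x) positions that belong to a long run

    # Scan each row for long horizontal runs of ink.
    for y in range(height):
        x = 0
        while x < width:
            if img[y][x]:
                start = x
                while x < width and img[y][x]:
                    x += 1
                if x - start >= min_run:
                    to_clear.update((y, i) for i in range(start, x))
            else:
                x += 1

    # Scan each column for long vertical runs of ink.
    for x in range(width):
        y = 0
        while y < height:
            if img[y][x]:
                start = y
                while y < height and img[y][x]:
                    y += 1
                if y - start >= min_run:
                    to_clear.update((i, x) for i in range(start, y))
            else:
                y += 1

    return [
        [0 if (y, x) in to_clear else img[y][x] for x in range(width)]
        for y in range(height)
    ]
```

`min_run` has to be longer than the longest stroke in the text, otherwise
parts of tall or wide characters are erased along with the borders. Text
that touches a border may also be damaged where the two overlap.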


On Sunday, August 10, 2014 9:17:07 AM UTC-5, V S Rawat wrote:
>
> We often get images or PDFs that contain tables.
>
> The text is in several columns, which should be treated separately and
> put on the same line with a separator such as a tab, or with quotes, to
> get CSV format.
>
> However, my method of running Tesseract through vietocr.Net doesn't help there.
>
> It does recognize the separate areas and OCRs them separately, but it puts
> one column below the other: all rows of the first column at the top,
> then all rows of the second column, then all rows of the next column, and so on.
>
> That is not very helpful, because it takes a lot of effort to put all the
> text of one row together.
>
> Is there any method of making Tesseract identify tables and OCR them in
> a more helpful way?
>
> Or should this problem be addressed to the developers of the vietocr.Net frontend?
>
> Thanks. 
> -- 
> Rawat 
>
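
On the row-ordering problem described above, one possible post-processing
approach is to regroup the recognized pieces by their vertical position.
The sketch below is only an illustration: it assumes you can obtain a
bounding box for each cell's text (for example from Tesseract's hOCR
output), that each box corresponds to exactly one table cell, and that
cells on the same row have nearly the same top coordinate. The function
name, the `(text, left, top)` tuple layout, and `row_tolerance` are
hypothetical choices, not anything Tesseract or vietocr.Net provides.

```python
def words_to_rows(words, row_tolerance=10):
    """Group (text, left, top) boxes into tab-separated table rows.

    Boxes whose top coordinate is within row_tolerance pixels of the first
    box in the current row are treated as being on the same row. Within a
    row, cells are ordered left to right.
    """
    rows = []  # each entry: [reference_top, [(left, text), ...]]
    for text, left, top in sorted(words, key=lambda w: w[2]):
        if rows and abs(top - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([top, [(left, text)]])
    return ["\t".join(t for _, t in sorted(cells)) for _, cells in rows]


# Boxes listed column by column, the order described in the message above.
boxes = [
    ("Name", 10, 10), ("Alice", 10, 42), ("Bob", 12, 71),
    ("Age", 200, 12), ("30", 201, 40), ("25", 199, 73),
]
for line in words_to_rows(boxes):
    print(line)
```

With these sample boxes the output is three lines: `Name<TAB>Age`,
`Alice<TAB>30`, and `Bob<TAB>25`. Cells that span several text lines, or
pages that are noticeably skewed, would need a more careful grouping rule.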
