[tesseract-ocr] How to get "tables" ocr-ed

V S Rawat Sun, 10 Aug 2014 07:17:32 -0700

We often get images or PDFs that contain tables.

The text is in several columns, which should be treated separately, and the cells of each row should be put on the same line with a separator such as tabs and quotes to get CSV format.


However, my method of running tesseract through vietocr.Net doesn't help there.

It does recognize the separate areas and OCRs them separately, but it puts one column below the other: all rows of the first column at the top, then all rows of the second column, then all rows of the next column, and so on.

This is not very helpful, because it takes a lot of effort to put all the text of one row together.
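
The manual regrouping described above could be sketched roughly as follows. This is only an illustration in Python, not something tesseract or vietocr.Net provides: the function name columns_to_csv is hypothetical, and it assumes that the columns come out one after another separated by blank lines and that every column has one line per row, which real OCR output may not satisfy.

```python
import csv
import io
from itertools import zip_longest


def columns_to_csv(ocr_text: str) -> str:
    """Turn column-by-column OCR text into quoted CSV rows.

    Assumption: columns are separated by one or more blank lines,
    and each column has one line per table row.
    """
    # Collect the non-empty lines of each column into its own list.
    columns = []
    current = []
    for line in ocr_text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            columns.append(current)
            current = []
    if current:
        columns.append(current)

    # Stitch the n-th line of every column together into the n-th row.
    # Short columns are padded with empty cells.
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in zip_longest(*columns, fillvalue=""):
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "Name\nAlice\nBob\n\nAge\n30\n25\n"
    print(columns_to_csv(sample), end="")
```

Passing delimiter="\t" to csv.writer would give tab-separated output instead. If a cell is empty or wraps onto two lines in the scan, the rows drift out of alignment, which is why a table-aware method would be preferable.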

Is there any method of making tesseract identify tables and do the OCR in some helpful way?


Or should this problem be addressed to the developers of the vietocr.Net frontend?

Thanks.
--
Rawat



