We often get images or PDFs that contain tables.
The text is laid out in several columns, which should be recognized
separately, and the cells of each row should then be put on the same line
with a separator such as a tab, plus quotes, to get CSV format.
However, my setup (tesseract through the vietocr.Net frontend) doesn't help
here.
It does recognize the separate areas and OCRs them separately, but it puts
one column below the other: all rows of the first column at the top,
then all rows of the second column, then all rows of the next column, and
so on.
That is not very helpful, because it takes a lot of effort to put the text
of each row back together.
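
To illustrate the rearrangement that currently has to be done by hand, here
is a minimal sketch in Python. It assumes the OCR text has one cell per line
and that the columns are separated by blank lines; the actual output may not
be that regular. The function name columns_to_csv and the sample data are
made up for illustration and are not part of tesseract or vietocr.Net.

```python
import csv
import io
from itertools import zip_longest


def columns_to_csv(ocr_text: str) -> str:
    """Turn column-after-column OCR text into tab-separated, quoted rows.

    Assumption (not guaranteed by any OCR tool): each cell is on its own
    line, and columns are separated by one or more blank lines.
    """
    columns = []
    current = []
    for line in ocr_text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the column collected so far.
            columns.append(current)
            current = []
    if current:
        columns.append(current)

    out = io.StringIO()
    writer = csv.writer(
        out, delimiter="\t", quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    # Take the n-th cell of every column to form the n-th row;
    # shorter columns are padded with empty cells.
    for row in zip_longest(*columns, fillvalue=""):
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample: two columns, one below the other.
    sample = "Name\nAnn\nBob\n\nAge\n31\n45\n"
    print(columns_to_csv(sample), end="")
```

This only helps when every column has the same number of cells and no cell
spans several lines, which is exactly what cannot be relied on with real
scans.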
Is there any way to make tesseract identify tables and OCR them in
a more helpful way, so that the cells of each row stay together?
Or should this problem be raised with the developers of the vietocr.Net
frontend?
Thanks.
--
Rawat