Tables are a known limitation of the Tesseract OCR engine. If you can remove the table borders before running OCR, you should get better results from Tesseract.
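A minimal sketch of one way to strip borders, assuming the page has already been binarized into a NumPy array (1 = ink, 0 = background) and cropped roughly to the table. It uses a projection profile: any row or column that is mostly ink is treated as a ruling line and blanked out. The function name `remove_ruling_lines` and the 0.8 threshold are hypothetical and not part of Tesseract.

```python
import numpy as np


def remove_ruling_lines(binary, min_fraction=0.8):
    """Blank out long horizontal and vertical lines in a binary image.

    binary: 2-D array where 1 (or True) marks ink and 0 marks background.
    min_fraction: a row/column is treated as a border if at least this
    fraction of its pixels are ink (hypothetical default; tune per scan).
    """
    img = np.asarray(binary, dtype=bool).copy()
    h, w = img.shape
    # Compute both profiles on the unmodified image, so erasing rows
    # does not change which columns are detected (and vice versa).
    border_rows = img.sum(axis=1) >= min_fraction * w
    border_cols = img.sum(axis=0) >= min_fraction * h
    img[border_rows, :] = False
    img[:, border_cols] = False
    return img.astype(np.uint8)


# Tiny example: a top border, a left border and one "text" pixel.
page = np.zeros((5, 6), dtype=np.uint8)
page[0, :] = 1   # horizontal border
page[:, 0] = 1   # vertical border
page[2, 3] = 1   # text pixel that should survive
cleaned = remove_ruling_lines(page)
```

Lines that are skewed, broken, or span only part of the image will survive this simple test, and any text pixels lying on an erased row or column are lost. A morphological opening with a long, thin kernel (for example OpenCV's `cv2.morphologyEx` with `cv2.MORPH_OPEN`) is a common, more robust alternative for isolating the lines before subtracting them.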
On Sunday, August 10, 2014 9:17:07 AM UTC-5, V S Rawat wrote:
>
> We often get text in which images or PDFs have tables.
>
> The text is in several columns, which should be treated separately and
> put on the same line with a separator such as a tab and quotes, to get
> CSV format.
>
> However, my method of running Tesseract through vietocr.Net doesn't help
> there.
>
> It does recognize separate areas and OCRs them separately, but it puts
> one column below the other: all rows of the first column at the top,
> then all rows of the second column, then all rows of the next column,
> and so on.
>
> This is not very helpful, because it takes a lot of effort to put all
> the text of one row together.
>
> Is there any way to make Tesseract identify tables and do the OCR in
> a more helpful way?
>
> Or should this problem be addressed to the vietocr.Net frontend
> developers?
>
> Thanks.
> --
> Rawat

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/3f9cbc6d-9aff-4530-8c96-b77f7454ded7%40googlegroups.com.
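For the row-reassembly problem in the question, one workaround is to ask Tesseract for per-word bounding boxes (for example its hOCR or TSV output) instead of plain text, and rebuild the rows yourself. The sketch below assumes the words are already available as `(text, left, top, width, height)` tuples in pixels; the function name `words_to_csv` and the `row_tol`/`col_gap` thresholds are hypothetical. Words whose vertical centres are close are grouped into one row, and a large horizontal gap starts a new cell.

```python
import csv
import io


def words_to_csv(words, row_tol=10, col_gap=40):
    """Group OCR word boxes into table rows and cells, returned as CSV text.

    words: iterable of (text, left, top, width, height) tuples.
    row_tol: max vertical distance (px) between word centres on one row.
    col_gap: min horizontal gap (px) between words that starts a new cell.
    Both thresholds are hypothetical defaults.
    """
    # Sort by vertical centre so words on the same row end up adjacent.
    items = sorted(
        (top + height / 2, left, width, text)
        for text, left, top, width, height in words
        if text.strip()
    )
    rows = []  # each entry: [first word's centre y, [(left, width, text), ...]]
    for cy, left, width, text in items:
        if rows and abs(cy - rows[-1][0]) <= row_tol:
            rows[-1][1].append((left, width, text))
        else:
            rows.append([cy, [(left, width, text)]])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for _, row_words in rows:
        row_words.sort()  # left-to-right order within the row
        cells = []
        prev_right = None
        for left, width, text in row_words:
            if prev_right is None or left - prev_right > col_gap:
                cells.append(text)        # big gap: start a new cell
            else:
                cells[-1] += " " + text   # small gap: same cell
            prev_right = left + width
        writer.writerow(cells)
    return out.getvalue()


words = [
    ("Name", 10, 10, 50, 12),
    ("Qty", 200, 11, 30, 12),
    ("Red", 10, 40, 30, 12),
    ("apple", 45, 41, 50, 12),
    ("3", 200, 40, 10, 12),
]
csv_text = words_to_csv(words)  # "Name,Qty\nRed apple,3\n"
```

Passing `delimiter="\t"` to `csv.writer` gives tab-separated output instead. Fixed pixel thresholds will not suit every scan, and multi-line cells or merged cells need extra handling.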

