Thanks for the reply. The TSV output also lists the data column by column: it covers all of column 1, then column 2, and finally column 3, one below the other. I am not able to figure out how to construct a table from the TSV.
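For reference, one way the table might be rebuilt from the TSV is to ignore the order in which the words are listed and regroup them by their position on the page. The following is only a rough sketch, not a tested solution: it assumes the usual Tesseract TSV header (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), treats level 5 entries as words, and uses a hypothetical `y_tolerance` in pixels to decide when two words sit on the same row. The function name `tsv_to_rows` and the sample coordinates are made up for illustration.

```python
import csv
import io


def tsv_to_rows(tsv_text, y_tolerance=10):
    """Regroup word-level TSV entries into rows by vertical position.

    y_tolerance is an assumed pixel threshold, not a Tesseract setting.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for rec in reader:
        text = (rec.get("text") or "").strip()
        # Level 5 marks a word; other levels describe pages, blocks, lines.
        if rec.get("level") != "5" or not text:
            continue
        y_center = int(rec["top"]) + int(rec["height"]) / 2
        words.append((y_center, int(rec["left"]), text))

    # Walk the words from top to bottom and start a new row whenever the
    # vertical centre moves further than y_tolerance from the row's first word.
    words.sort()
    rows = []
    for y_center, left, text in words:
        if rows and abs(y_center - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append((y_center, [(left, text)]))

    # Inside each row, order the cells from left to right.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Made-up sample in column-first order, as described above.
HEADER = ["level", "page_num", "block_num", "par_num", "line_num",
          "word_num", "left", "top", "width", "height", "conf", "text"]
SAMPLE_WORDS = [
    [5, 1, 1, 1, 1, 1, 50, 100, 20, 30, 95, "X"],
    [5, 1, 1, 1, 2, 1, 50, 150, 20, 30, 95, "Y"],
    [5, 1, 1, 1, 3, 1, 50, 200, 20, 30, 94, "Z"],
    [5, 1, 2, 1, 1, 1, 300, 102, 20, 30, 96, "1"],
    [5, 1, 2, 1, 2, 1, 300, 151, 20, 30, 96, "2"],
    [5, 1, 2, 1, 3, 1, 300, 199, 20, 30, 96, "3"],
]
SAMPLE = "\n".join(
    "\t".join(str(value) for value in line) for line in [HEADER] + SAMPLE_WORDS
)

for row in tsv_to_rows(SAMPLE):
    print("\t".join(row))
```

In this sketch every word becomes its own cell, so a cell containing several words would still need to be merged, for example by looking at the gaps between the `left` values.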
On Wednesday, July 26, 2017 at 11:26:18 PM UTC+5:30, shree wrote:
>
> Try 'tsv' instead of 'hocr'
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Jul 26, 2017 at 10:30 PM, Prav <[email protected]> wrote:
>
>> Hi,
>>
>> I am trying to extract tabular data. For this I am converting the image
>> into hocr.
>> Now this hocr is not coming properly. It first puts the data for one
>> column and then for the other. I do not get data which is put row wise and
>> column wise so that the extraction comes as a proper table.
>>
>> I have tried with -psm 5 and with -psm 6 but in both cases the hocr looks
>> identical.
>>
>> I am using tesseract 3.05
>>
>> even preserve_interword_space set to 1 is not working.
>>
>> Any help would be useful
>>
>> For eg
>> we have the following in the image
>>
>> Column 1    Column 2
>> X           1
>> Y           2
>> Z           3
>>
>> hocr is giving
>>
>> X
>> Y
>> Z
>> 1
>> 2
>> 3
>>
>> I would like the output to be
>>
>> X    1
>> Y    2
>> Z    3
>>
>> Will be grateful for any help and/or ideas
>>
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/d2b68f4a-8f1b-473b-bd27-818d9d1a28be%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>

