Try the 'tsv' output instead of 'hocr'.

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
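For reference, here is a minimal sketch of what you can do with the TSV output. It assumes Python 3 and a file produced by something like `tesseract image.png out tsv` (which should write `out.tsv`). Every word-level row (level 5) in the TSV carries its bounding box, so the sketch groups words whose vertical centres lie within a pixel tolerance into one row and sorts each row left to right. That regroups X/1, Y/2, Z/3 even when Tesseract reports the two columns as separate blocks. The function name `tsv_to_rows`, the `y_tolerance` default of 10 pixels and the sample coordinates are illustrative assumptions, not part of Tesseract.

```python
import csv
import io

# In Tesseract's TSV output, level 5 rows are individual words.
WORD_LEVEL = 5


def tsv_to_rows(tsv_text, y_tolerance=10):
    """Regroup word boxes from Tesseract TSV output into table rows.

    Words are sorted by vertical centre; a word joins the current row if its
    centre is within y_tolerance pixels of that row's first word, otherwise it
    starts a new row. Each row is then ordered left to right.
    (y_tolerance=10 is an assumed default; tune it to the image resolution.)
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for rec in reader:
        text = (rec.get("text") or "").strip()
        if int(rec["level"]) != WORD_LEVEL or not text:
            continue  # skip page/block/paragraph/line rows and empty words
        centre_y = int(rec["top"]) + int(rec["height"]) / 2
        words.append((centre_y, int(rec["left"]), text))

    rows = []  # each entry: [anchor_y, [(left, text), ...]]
    for centre_y, left, text in sorted(words):
        if rows and abs(centre_y - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([centre_y, [(left, text)]])

    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Hypothetical TSV for the example in the question: the first column
# (X, Y, Z) is reported as block 1 and the second column (1, 2, 3) as block 2.
HEADER = ("level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
          "left\ttop\twidth\theight\tconf\ttext")
SAMPLE_TSV = "\n".join([
    HEADER,
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t",
    "5\t1\t1\t1\t1\t1\t10\t50\t20\t20\t91\tX",
    "5\t1\t1\t1\t2\t1\t10\t100\t20\t20\t90\tY",
    "5\t1\t1\t1\t3\t1\t10\t150\t20\t20\t92\tZ",
    "5\t1\t2\t1\t1\t1\t200\t52\t20\t20\t89\t1",
    "5\t1\t2\t1\t2\t1\t200\t101\t20\t20\t93\t2",
    "5\t1\t2\t1\t3\t1\t200\t149\t20\t20\t90\t3",
])

if __name__ == "__main__":
    for row in tsv_to_rows(SAMPLE_TSV):
        print(" ".join(row))
    # prints:
    # X 1
    # Y 2
    # Z 3
```

With a real image you would pass the contents of `out.tsv` instead of the sample string. Skewed scans or cells that wrap onto several text lines may need a larger tolerance, or deskewing before OCR.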
On Wed, Jul 26, 2017 at 10:30 PM, Prav <[email protected]> wrote:
> Hi,
>
> I am trying to extract tabular data. For this I am converting the image
> into hOCR.
> The hOCR is not coming out properly: it first puts the data for one
> column and then the data for the other. I do not get the data row by row,
> so the extracted text does not come out as a proper table.
>
> I have tried with -psm 5 and with -psm 6, but in both cases the hOCR looks
> identical.
>
> I am using Tesseract 3.05.
>
> Even setting preserve_interword_space to 1 does not help.
>
> Any help would be useful.
>
> For example, we have the following in the image:
>
> Column 1   Column 2
> X          1
> Y          2
> Z          3
>
> hOCR is giving:
>
> X
> Y
> Z
> 1
> 2
> 3
>
> I would like the output to be:
>
> X 1
> Y 2
> Z 3
>
> I will be grateful for any help and/or ideas.
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d2b68f4a-8f1b-473b-bd27-818d9d1a28be%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

