Hi, I am trying to extract tabular data from an image. To do this I convert the image to hOCR, but the hOCR does not come out properly: it lists all the data for one column first and then all the data for the next column. I cannot get the data ordered row by row and column by column, so the extraction does not produce a proper table.
I have tried -psm 5 and -psm 6, but the hOCR looks identical in both cases. I am using Tesseract 3.05, and even setting preserve_interword_space to 1 does not help. Any help would be useful.

For example, the image contains:

    Column 1   Column 2
    X          1
    Y          2
    Z          3

The hOCR gives:

    X Y Z 1 2 3

I would like the output to be:

    X 1
    Y 2
    Z 3

I will be grateful for any help and/or ideas.

Thanks
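One possible workaround, which is not a Tesseract feature but post-processing of its output, is to ignore the reading order of the hOCR and rebuild the rows from the word coordinates: each `ocrx_word` span in Tesseract's hOCR carries a `title` attribute of the form `bbox x0 y0 x1 y1; x_wconf NN`. The sketch below uses only the Python standard library. It assumes the table rows are roughly horizontal and that a fixed pixel `tolerance` (a hypothetical value; tune it to your scan resolution) is enough to tell one row from the next. The names `WordCollector` and `words_to_rows` and the `SAMPLE` markup are illustrative, not part of any library.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordCollector(HTMLParser):
    """Collect (x0, y0, x1, y1, text) for every ocrx_word span in an hOCR page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span we are currently inside
        self._depth = 0     # nested <span> tags inside that word span
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._bbox is not None:
            if tag == "span":
                self._depth += 1
            return
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._depth = 0
                self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None or tag != "span":
            return
        if self._depth:
            self._depth -= 1
            return
        text = "".join(self._text).strip()
        if text:
            self.words.append((*self._bbox, text))
        self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def words_to_rows(hocr, tolerance=10):
    """Group words into rows by vertical centre, then order each row left to right."""
    parser = WordCollector()
    parser.feed(hocr)
    # Sort by vertical centre so words on the same visual line become adjacent.
    words = sorted(parser.words, key=lambda w: (w[1] + w[3]) / 2)
    rows = []  # each entry: [centre of the row's first word, list of words]
    for word in words:
        centre = (word[1] + word[3]) / 2
        if rows and abs(centre - rows[-1][0]) <= tolerance:
            rows[-1][1].append(word)
        else:
            rows.append([centre, [word]])
    return [[w[4] for w in sorted(ws, key=lambda w: w[0])] for _, ws in rows]


# Hypothetical hOCR in the column-first order described above.
SAMPLE = """
<div class='ocr_carea'>
 <span class='ocr_line' title='bbox 10 50 30 150'>
  <span class='ocrx_word' title='bbox 10 50 30 70; x_wconf 90'>X</span>
  <span class='ocrx_word' title='bbox 10 90 30 110; x_wconf 90'>Y</span>
  <span class='ocrx_word' title='bbox 10 130 30 150; x_wconf 90'>Z</span>
 </span>
</div>
<div class='ocr_carea'>
 <span class='ocr_line' title='bbox 200 52 220 151'>
  <span class='ocrx_word' title='bbox 200 52 220 72; x_wconf 90'>1</span>
  <span class='ocrx_word' title='bbox 200 91 220 111; x_wconf 90'>2</span>
  <span class='ocrx_word' title='bbox 200 131 220 151; x_wconf 90'>3</span>
 </span>
</div>
"""

if __name__ == "__main__":
    print(words_to_rows(SAMPLE))  # [['X', '1'], ['Y', '2'], ['Z', '3']]
```

Grouping by the vertical centre of each bounding box, rather than by the order in the file, is what turns the column-first output back into rows. Skewed scans or multi-line cells would need something more robust than a single fixed tolerance.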

