Try the 'tsv' output format instead of 'hocr'.
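
To illustrate why TSV may help here: each word in the TSV output carries its own bounding box (left, top, width, height), so rows can be rebuilt by grouping words whose top coordinates are close, whatever block order Tesseract chose. Below is a minimal sketch, not a definitive implementation. It assumes the TSV was produced with something like `tesseract image.png out tsv` (the file names are placeholders) and that the header has the usual columns ending in `text`. The sample data, its coordinates, the function name `rows_from_tsv`, and the `y_tolerance` parameter are all hypothetical.

```python
import csv
import io

# Hypothetical TSV sample; the coordinates are invented for illustration.
# Words are listed column by column, as in the problem described below.
SAMPLE_TSV = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext",
    "4\t1\t1\t1\t1\t0\t40\t100\t20\t30\t-1\t",   # line-level record, no text
    "5\t1\t1\t1\t1\t1\t40\t100\t20\t30\t96\tX",
    "5\t1\t1\t1\t2\t1\t40\t150\t20\t30\t95\tY",
    "5\t1\t1\t1\t3\t1\t40\t200\t20\t30\t94\tZ",
    "5\t1\t2\t1\t1\t1\t300\t102\t15\t30\t93\t1",
    "5\t1\t2\t1\t2\t1\t300\t151\t15\t30\t92\t2",
    "5\t1\t2\t1\t3\t1\t300\t199\t15\t30\t91\t3",
])


def rows_from_tsv(tsv_text, y_tolerance=10):
    """Group word-level TSV records into table rows by vertical position.

    y_tolerance (pixels) is an assumed threshold: a word joins the current
    row if its top is within this distance of the row's first word.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for rec in reader:
        text = (rec.get("text") or "").strip()
        # Level 5 records are individual words; skip everything else.
        if rec["level"] != "5" or not text:
            continue
        words.append((int(rec["top"]), int(rec["left"]), text))

    words.sort()  # top to bottom, then left to right
    rows = []     # each entry: (top of first word in row, [(left, text), ...])
    for top, left, text in words:
        if rows and abs(top - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append((top, [(left, text)]))

    # Within each row, order the cells from left to right.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


if __name__ == "__main__":
    for row in rows_from_tsv(SAMPLE_TSV):
        print("\t".join(row))
```

On real scans the tolerance would need tuning to the line height, and multi-word cells would need an extra step that merges neighbouring words by their horizontal gap.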

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Wed, Jul 26, 2017 at 10:30 PM, Prav <[email protected]> wrote:

> Hi,
>
> I am trying to extract tabular data. To do this, I am converting the image
> to hOCR.
> The hOCR output is not ordered correctly: it first lists all the data for one
> column and then all the data for the other. I do not get the data arranged by
> row and column, so it cannot be extracted as a proper table.
>
> I have tried both -psm 5 and -psm 6, but in both cases the hOCR output looks
> identical.
>
> I am using Tesseract 3.05.
>
> Even setting preserve_interword_spaces to 1 does not help.
>
> Any help would be useful
>
> For example, we have the following in the image:
>
> Column 1              Column 2
> X                           1
> Y                           2
> Z                           3
>
> The hOCR output gives:
>
> X
> Y
> Z
> 1
> 2
> 3
>
> I would like the output to be
>
> X     1
> Y     2
> Z     3
>
> I would be grateful for any help or ideas.
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/d2b68f4a-8f1b-473b-bd27-818d9d1a28be%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

