Thanks for the reply.

The TSV output also gives the data column by column: it covers column 1,
then column 2, and finally column 3, one below the other.
I am not able to figure out how to construct a table from the TSV.
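
A minimal sketch of one possible approach, assuming the TSV carries
Tesseract's usual header (level, left, top, height, text, ...): treat words
whose vertical centers lie within a few pixels of each other as one row, then
order each row by its left coordinate. The function name tsv_to_rows, the
10-pixel tolerance, and the sample coordinates are illustrative assumptions,
not values from a real run.

```python
import csv
import io


def tsv_to_rows(tsv_text, row_tolerance=10):
    """Group word entries of a Tesseract TSV into rows by vertical position."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for rec in reader:
        text = (rec.get("text") or "").strip()
        # Level 5 marks word entries; lower levels describe pages, blocks, lines.
        if rec["level"] != "5" or not text:
            continue
        left, top, height = int(rec["left"]), int(rec["top"]), int(rec["height"])
        words.append((top + height / 2, left, text))

    words.sort()  # top to bottom, then left to right
    rows = []  # each entry: [center_y of first word, [(left, text), ...]]
    for center, left, text in words:
        if rows and abs(center - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([center, [(left, text)]])
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Hypothetical sample in column-by-column order, as described above.
HEADER = "\t".join(
    ["level", "page_num", "block_num", "par_num", "line_num", "word_num",
     "left", "top", "width", "height", "conf", "text"]
)
SAMPLE = "\n".join([
    HEADER,
    "5\t1\t1\t1\t1\t1\t10\t50\t12\t12\t95\tX",
    "5\t1\t1\t1\t2\t1\t10\t80\t12\t12\t95\tY",
    "5\t1\t1\t1\t3\t1\t10\t110\t12\t12\t95\tZ",
    "5\t1\t2\t1\t1\t1\t200\t51\t8\t12\t95\t1",
    "5\t1\t2\t1\t2\t1\t200\t81\t8\t12\t95\t2",
    "5\t1\t2\t1\t3\t1\t200\t110\t8\t12\t95\t3",
])

for row in tsv_to_rows(SAMPLE):
    print("\t".join(row))
```

Each word becomes its own cell here, so a cell containing several words would
still need merging, for example by comparing the horizontal gaps between
neighbouring words. The tolerance would likely need tuning to the line height
of the actual image.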

On Wednesday, July 26, 2017 at 11:26:18 PM UTC+5:30, shree wrote:
>
> Try 'tsv' instead of 'hocr'.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Wed, Jul 26, 2017 at 10:30 PM, Prav <[email protected]> wrote:
>
>> Hi,
>>
>> I am trying to extract tabular data. For this I am converting the image 
>> to hOCR. 
>> The hOCR output does not come out properly: it first lists the data for 
>> one column and then the data for the other. I do not get the data laid out 
>> by row and column, so the extraction does not form a proper table.
>>
>> I have tried both -psm 5 and -psm 6, but the hOCR looks identical in 
>> both cases.
>>
>> I am using Tesseract 3.05.
>>
>> Even setting preserve_interword_spaces to 1 does not help.
>>
>> Any help would be useful
>>
>> For example, we have the following in the image:
>>
>> Column 1              Column 2
>> X                           1
>> Y                           2
>> Z                           3
>>
>> The hOCR output gives:
>>
>> X
>> Y
>> Z
>> 1
>> 2
>> 3
>>
>> I would like the output to be:
>>
>> X     1
>> Y     2
>> Z     3
>>
>> I will be grateful for any help or ideas.
>>
>> Thanks
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/d2b68f4a-8f1b-473b-bd27-818d9d1a28be%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
