We are using the hocr and pdf outputs as well, so those config files are loaded too.
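
For anyone reproducing this, here is a minimal Python sketch of how such an invocation can be assembled. It assumes the `tesseract` binary is on the PATH, and the file names are hypothetical. `-psm` is the spelling used in this thread, while newer Tesseract releases expect `--psm` (check `tesseract --help` for your version). The trailing `hocr` and `pdf` arguments are config file names, which is how these output formats are requested.

```python
import subprocess
from typing import List, Sequence


def build_tesseract_args(image: str, output_base: str, psm: int,
                         configs: Sequence[str] = ("hocr", "pdf"),
                         psm_flag: str = "-psm") -> List[str]:
    """Build a tesseract argv list: image, output base, page segmentation
    mode, then the config files that select the output formats.

    psm_flag defaults to the single-dash form used in this thread;
    newer releases use "--psm" (assumption: verify for your version).
    """
    return ["tesseract", image, output_base, psm_flag, str(psm), *configs]


def run_tesseract(args: List[str]) -> None:
    # Raises subprocess.CalledProcessError if tesseract exits non-zero.
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command, it does not run it.
    print(build_tesseract_args("sample.tif", "sample_out", 6))
```

Because naming `hocr` or `pdf` loads the matching config files, a `tessedit_pageseg_mode` line inside them can take precedence over the value passed with `-psm`, which is the behavior described in this thread.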

On Thursday, April 6, 2017 at 8:06:27 PM UTC-5, shree wrote:
>
> Normally, for text output, the other config files should not have any impact.
>
> - excuse the brevity, sent from mobile
>
> On 07-Apr-2017 2:18 AM, "Mike Hall" <[email protected]> wrote:
>
>> Yes, we are using the -psm 6 command-line argument, and it was not
>> working.
>>
>> But I figured out the issue.
>>
>> Tesseract has a set of config files. Several of them (hocr, pdf, tsv,
>> unlv) contain the setting *tessedit_pageseg_mode*, and this setting was
>> set to 1 in all the config files. Once I removed the
>> *tessedit_pageseg_mode* parameter from the config files, our command-line
>> argument of -psm 6 worked.
>>
>> Alternatively, I experimented with the config files. When I changed the
>> *tessedit_pageseg_mode* setting to 6 in all the config files and ran
>> Tesseract with the -psm 6 command-line argument, it also worked.
>>
>> Thanks
>>
>> On Thursday, April 6, 2017 at 1:12:18 PM UTC-5, shree wrote:
>>
>>> Have you tried --psm 6?
>>>
>>> On 06-Apr-2017 11:06 PM, "Mike Hall" <[email protected]> wrote:
>>>
>>>> We have a C# .Net app that is using Tesseract to do Optical Character 
>>>> Recognition (OCR) on .tiff files.  I've attached a sample tiff file.
>>>>
>>>> We are then outputting the data to a text file. However, Tesseract is
>>>> reading the data in a vertical fashion. In my example image, it is
>>>> reading the tiff as two columns of data, and the data is being output
>>>> from Tesseract like this:
>>>>  
>>>> TYPE:
>>>> DATE:
>>>> Address:
>>>> City:
>>>> State:
>>>> Owner:
>>>> Owner Type:
>>>> Acreage:
>>>> Mortgage: 
>>>> 12345 
>>>> 2017-04-06 
>>>> 100 Main St.
>>>> Some City 
>>>> Some State 
>>>> John Doe 
>>>> Primary 
>>>> 10.25 
>>>> Yes
>>>>
>>>> What we want is Tesseract to read the tiff file horizontally and have 
>>>> the output look like this:
>>>>
>>>> TYPE:
>>>> 12345 
>>>> DATE:
>>>> 2017-04-06 
>>>> Address:
>>>> 100 Main St. 
>>>> City:
>>>> Some City 
>>>> State:
>>>> Some State 
>>>> Owner:
>>>> John Doe 
>>>> Owner Type:
>>>> Primary 
>>>> Acreage:
>>>> 10.25
>>>> Mortgage:
>>>> Yes
>>>>
>>>> We've tried the various page segmentation options for Tesseract, but
>>>> they all produce the same result.
>>>> Has anyone run into this same issue? Does anybody have any ideas?
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/790b41ef-f97f-4695-b7c8-1c68bdd1cd38%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/790b41ef-f97f-4695-b7c8-1c68bdd1cd38%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>
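
The two workarounds described above (removing *tessedit_pageseg_mode* from the config files, or setting it to 6) can be applied with a small script. This is a minimal Python sketch that operates on a config file's text; it assumes the usual one-setting-per-line `name value` format, and the names `patch_psm` and `patch_file` are hypothetical. On a typical install the config files live in a `configs` directory under `tessdata`, but the exact path depends on the installation, so back up the files before overwriting them.

```python
import re
from typing import Optional

# Matches a whole "tessedit_pageseg_mode <value>" line in a config file.
_PSM_LINE = re.compile(r"^\s*tessedit_pageseg_mode\s+\S+\s*$")


def patch_psm(config_text: str, mode: Optional[int] = None) -> str:
    """Drop (mode=None) or rewrite (mode=N) every tessedit_pageseg_mode line."""
    out = []
    for line in config_text.splitlines():
        if _PSM_LINE.match(line):
            if mode is None:
                continue  # workaround 1: remove the setting entirely
            line = f"tessedit_pageseg_mode {mode}"  # workaround 2: force the mode
        out.append(line)
    # Preserve a trailing newline if the original text had one.
    return "\n".join(out) + ("\n" if config_text.endswith("\n") else "")


def patch_file(path: str, mode: Optional[int] = None) -> None:
    """Rewrite one config file in place (hypothetical helper; back it up first)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(patch_psm(text, mode))


if __name__ == "__main__":
    # Made-up two-line sample config, not copied from a real install.
    sample = "tessedit_create_hocr 1\ntessedit_pageseg_mode 1\n"
    print(patch_psm(sample, 6), end="")
```

Removing the line lets the command-line value apply; rewriting it to 6 keeps the two in agreement, matching the two results reported in the thread.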
