Yes, we were using the -psm 6 command line argument, and it was not working.

But I figured out the issue.  

Tesseract has a set of config files. Several of these config files 
(hocr, pdf, tsv, unlv) contain the setting *tessedit_pageseg_mode*, and it 
was set to 1 in each of them, which apparently took precedence over the 
command line. Once I removed the *tessedit_pageseg_mode* parameter from 
those config files, our command line argument of -psm 6 worked.
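
For anyone who would rather script that cleanup than edit each file by hand, 
here is a minimal Python sketch. It assumes the config files are plain text 
with one whitespace-separated "name value" pair per line. The function name 
strip_param and the sample text are only illustrative; they are not part of 
Tesseract.

```python
def strip_param(config_text, name="tessedit_pageseg_mode"):
    """Return config_text without any line that sets the parameter `name`.

    Assumption: each setting is a whitespace-separated `name value` pair
    on its own line.
    """
    kept = []
    for line in config_text.splitlines():
        fields = line.split()
        # Drop only lines whose first field is exactly the parameter name.
        if fields and fields[0] == name:
            continue
        kept.append(line)
    result = "\n".join(kept)
    # Preserve a trailing newline if the input had one and anything remains.
    if kept and config_text.endswith("\n"):
        result += "\n"
    return result


# Illustrative contents only, loosely modelled on a config such as hocr.
sample = "tessedit_create_hocr 1\ntessedit_pageseg_mode 1\n"
print(strip_param(sample), end="")
```

Back up the original config files first, since this drops the line entirely.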

Alternatively, I did experiment with the config files.  When I changed the 
*tessedit_pageseg_mode* setting to 6 in all the config files and ran 
Tesseract with the -psm 6 command line argument, it also worked.
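
The same change can be scripted. This is a rough Python sketch under the 
same assumption as before (one whitespace-separated "name value" pair per 
line); set_param is a hypothetical helper name, and it only rewrites a line 
that already exists rather than adding a new one.

```python
def set_param(config_text, value, name="tessedit_pageseg_mode"):
    """Return config_text with every existing `name` line set to `value`.

    Assumption: each setting is a whitespace-separated `name value` pair
    on its own line. Lines for other parameters are left untouched.
    """
    out = []
    for line in config_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            # Rewrite the matching line with the new value.
            out.append(f"{name} {value}")
        else:
            out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    if out and config_text.endswith("\n"):
        result += "\n"
    return result


# Illustrative contents only.
sample = "tessedit_create_hocr 1\ntessedit_pageseg_mode 1\n"
print(set_param(sample, 6), end="")
```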

Thanks

On Thursday, April 6, 2017 at 1:12:18 PM UTC-5, shree wrote:

> Have you tried --psm 6?
>
> - excuse the brevity, sent from mobile
>
> On 06-Apr-2017 11:06 PM, "Mike Hall" <[email protected]> wrote:
>
>> We have a C# .Net app that is using Tesseract to do Optical Character 
>> Recognition (OCR) on .tiff files.  I've attached a sample tiff file.
>>
>> We are then outputting the data to a text file.  However, Tesseract is 
>> reading the data in a vertical fashion.  In my example image, it is reading 
>> the tiff as two columns of data, and the data is being output from 
>> Tesseract like this:
>>  
>> TYPE:
>> DATE:
>> Address:
>> City:
>> State:
>> Owner:
>> Owner Type:
>> Acreage:
>> Mortgage: 
>> 12345 
>> 2017-04-06 
>> 100 Main St.
>> Some City 
>> Some State 
>> John Doe 
>> Primary 
>> 10.25 
>> Yes
>>
>> What we want is Tesseract to read the tiff file horizontally and have the 
>> output look like this:
>>
>> TYPE:
>> 12345 
>> DATE:
>> 2017-04-06 
>> Address:
>> 100 Main St. 
>> City:
>> Some City 
>> State:
>> Some State 
>> Owner:
>> John Doe 
>> Owner Type:
>> Primary 
>> Acreage:
>> 10.25
>> Mortgage:
>> Yes
>>
>> We've tried the various Page Segmentation options for Tesseract, but they 
>> all produce the same result.
>> Has anyone run into this same issue? Anybody have any ideas?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/790b41ef-f97f-4695-b7c8-1c68bdd1cd38%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

