Yes, we are using the -psm 6 command line argument, and it was not working.
But I figured out the issue. Tesseract has a set of config files. Several of these config files (hocr, pdf, tsv, unlv) contain the setting *tessedit_pageseg_mode*, and it was set to 1 in all of them. Once I removed the *tessedit_pageseg_mode* parameter from the config files, our command line argument of -psm 6 worked.

Alternatively, I did experiment with the config files. When I changed the *tessedit_pageseg_mode* setting to 6 in all the config files and ran Tesseract with the -psm 6 command line argument, it also worked.

Thanks

On Thursday, April 6, 2017 at 1:12:18 PM UTC-5, shree wrote:
> Have u tried --psm 6
>
> - excuse the brevity, sent from mobile
>
> On 06-Apr-2017 11:06 PM, "Mike Hall" <[email protected]> wrote:
>
>> We have a C# .Net app that is using Tesseract to do Optical Character
>> Recognition (OCR) on .tiff files. I've attached a sample tiff file.
>>
>> We are then outputting the data to a text file. However, Tesseract is
>> reading the data in a vertical fashion. In my example image, it is reading
>> the tiff as two columns of data, and the data is being output
>> from Tesseract like this:
>>
>> TYPE:
>> DATE:
>> Address:
>> City:
>> State:
>> Owner:
>> Owner Type:
>> Acreage:
>> Mortgage:
>> 12345
>> 2017-04-06
>> 100 Main St.
>> Some City
>> Some State
>> John Doe
>> Primary
>> 10.25
>> Yes
>>
>> What we want is for Tesseract to read the tiff file horizontally and have the
>> output look like this:
>>
>> TYPE:
>> 12345
>> DATE:
>> 2017-04-06
>> Address:
>> 100 Main St.
>> City:
>> Some City
>> State:
>> Some State
>> Owner:
>> John Doe
>> Owner Type:
>> Primary
>> Acreage:
>> 10.25
>> Mortgage:
>> Yes
>>
>> We've tried the various Page Segmentation options for Tesseract, but they
>> all produce the same result.
>> Has anyone run into this same issue? Anybody have any ideas?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/e56e8714-716a-4664-90c0-bb0f4217c46a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
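
For anyone who wants to check their own install for the same config-file override described at the top of this reply, here is a rough Python sketch that lists which config files set *tessedit_pageseg_mode*. It is only an illustration, not something from our app: the function name `find_psm_overrides` is made up, the directory you pass in is whatever holds your Tesseract config files (often a `configs` folder under tessdata, but that is an assumption about your layout), and it assumes each setting sits on its own line as `name value`.

```python
from pathlib import Path

PARAM = "tessedit_pageseg_mode"


def find_psm_overrides(config_dir):
    """Return {config file name: value} for each file that sets PARAM.

    Assumes one "name value" pair per line, separated by whitespace.
    Lines starting with "#" are treated as comments (an assumption).
    """
    overrides = {}
    for path in sorted(Path(config_dir).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            if line.lstrip().startswith("#"):
                continue
            parts = line.split()
            if len(parts) >= 2 and parts[0] == PARAM:
                # If a file sets the parameter more than once, keep the last value.
                overrides[path.name] = parts[1]
    return overrides


# Example use (the path is hypothetical; point it at your own config folder):
#   for name, value in find_psm_overrides("tessdata/configs").items():
#       print(f"{name}: {PARAM} = {value}")
```

Any file it reports is a candidate for the behavior we saw, where the value in the config file won over the -psm argument until we removed or changed it.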

