We are using hocr and pdf outputs as well.

On Thursday, April 6, 2017 at 8:06:27 PM UTC-5, shree wrote:
>
> Normally, for text output, the other config files should not have an impact.
>
> - excuse the brevity, sent from mobile
>
> On 07-Apr-2017 2:18 AM, "Mike Hall" <[email protected]> wrote:
>
>> Yes, we are using the -psm 6 command line argument. And it was not
>> working.
>>
>> But I figured out the issue.
>>
>> Tesseract has a set of config files. Inside several of these config files
>> (hocr, pdf, tsv, unlv) is the setting *tessedit_pageseg_mode*. This
>> setting was set to 1 in all the config files. Once I removed the
>> *tessedit_pageseg_mode* parameter from the config files, our command
>> line argument of -psm 6 worked.
>>
>> Alternatively, I did experiment with the config files. When I changed
>> the *tessedit_pageseg_mode* setting to 6 in all the config files and ran
>> Tesseract with the -psm 6 command line argument, it also worked.
>>
>> Thanks
>>
>> On Thursday, April 6, 2017 at 1:12:18 PM UTC-5, shree wrote:
>>
>>> Have you tried --psm 6?
>>>
>>> - excuse the brevity, sent from mobile
>>>
>>> On 06-Apr-2017 11:06 PM, "Mike Hall" <[email protected]> wrote:
>>>
>>>> We have a C# .Net app that is using Tesseract to do Optical Character
>>>> Recognition (OCR) on .tiff files. I've attached a sample tiff file.
>>>>
>>>> We are then outputting the data to a text file. However, Tesseract is
>>>> reading the data in a vertical fashion. In my example image, it is
>>>> reading the tiff as two columns of data, and the data is being output
>>>> from Tesseract like this:
>>>>
>>>> TYPE:
>>>> DATE:
>>>> Address:
>>>> City:
>>>> State:
>>>> Owner:
>>>> Owner Type:
>>>> Acreage:
>>>> Mortgage:
>>>> 12345
>>>> 2017-04-06
>>>> 100 Main St.
>>>> Some City
>>>> Some State
>>>> John Doe
>>>> Primary
>>>> 10.25
>>>> Yes
>>>>
>>>> What we want is for Tesseract to read the tiff file horizontally and
>>>> have the output look like this:
>>>>
>>>> TYPE:
>>>> 12345
>>>> DATE:
>>>> 2017-04-06
>>>> Address:
>>>> 100 Main St.
>>>> City:
>>>> Some City
>>>> State:
>>>> Some State
>>>> Owner:
>>>> John Doe
>>>> Owner Type:
>>>> Primary
>>>> Acreage:
>>>> 10.25
>>>> Mortgage:
>>>> Yes
>>>>
>>>> We've tried the various Page Segmentation options for Tesseract, but
>>>> they all produce the same result.
>>>> Has anyone run into this same issue? Anybody have any ideas?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/790b41ef-f97f-4695-b7c8-1c68bdd1cd38%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/790b41ef-f97f-4695-b7c8-1c68bdd1cd38%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>> For more options, visit https://groups.google.com/d/optout.
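
For anyone hitting the same problem, the check described in the thread (finding which config files set *tessedit_pageseg_mode*) could be automated roughly as follows. This is only a minimal sketch, not anything taken from Tesseract itself: the helper names `parse_config` and `find_setting` are made up for illustration, and it assumes that config files are plain text with one "name value" pair per line and that lines starting with # are comments. The location of the configs directory depends on your installation.

```python
from pathlib import Path


def parse_config(text):
    """Parse config text with one "name value" pair per line into a dict.

    Assumption: blank lines and lines starting with '#' carry no settings.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[name] = value
    return settings


def find_setting(config_dir, name="tessedit_pageseg_mode"):
    """Return {config file name: value} for every file in config_dir that sets `name`."""
    hits = {}
    for path in sorted(Path(config_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        settings = parse_config(text)
        if name in settings:
            hits[path.name] = settings[name]
    return hits
```

Calling `find_setting()` with the path to your configs directory returns a mapping such as hocr to 1 for each file that sets the parameter. Those are the files to edit (remove the line, or change the value to 6) if the -psm 6 command line argument appears to be ignored.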

