We have a C# .NET app that uses Tesseract to perform Optical Character Recognition (OCR) on TIFF files. I've attached a sample TIFF file.
We then write the recognized text to a text file. However, Tesseract reads the data vertically. In my example image, it treats the TIFF as two columns of data, and the output from Tesseract looks like this:

TYPE:
DATE:
Address:
City:
State:
Owner:
Owner Type:
Acreage:
Mortgage:
12345
2017-04-06
100 Main St.
Some City
Some State
John Doe
Primary
10.25
Yes

What we want is for Tesseract to read the TIFF file horizontally, so that the output looks like this:

TYPE: 12345
DATE: 2017-04-06
Address: 100 Main St.
City: Some City
State: Some State
Owner: John Doe
Owner Type: Primary
Acreage: 10.25
Mortgage: Yes

We've tried the various Page Segmentation options for Tesseract, but they all produce the same result. Has anyone run into this same issue? Does anybody have any ideas?
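To make the desired pairing concrete, here is a minimal sketch of the result we are after, written as a post-processing step on the text output rather than as a Tesseract setting. It is in Python for brevity, not taken from our C# app, and it rests on assumptions we have not verified against real output: each item arrives on its own line, every label ends with a colon, no value ends with a colon, and labels and values appear in the same order. The function name pair_label_value_lines is hypothetical.

```python
from typing import List


def pair_label_value_lines(ocr_text: str) -> List[str]:
    """Re-pair column-wise OCR output into "Label: value" rows.

    Assumes (hypothetically) one item per line, labels ending in ":",
    values not ending in ":", and both columns in the same order.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]

    # Split the items into the label column and the value column.
    labels = [line for line in lines if line.endswith(":")]
    values = [line for line in lines if not line.endswith(":")]

    # Refuse to guess if the two columns do not line up.
    if len(labels) != len(values):
        raise ValueError(
            f"found {len(labels)} labels but {len(values)} values"
        )

    return [f"{label} {value}" for label, value in zip(labels, values)]


if __name__ == "__main__":
    sample = "TYPE:\nDATE:\nAddress:\n12345\n2017-04-06\n100 Main St.\n"
    for row in pair_label_value_lines(sample):
        print(row)
```

This only works around the reading order after the fact and would break if a field were empty or a value happened to end with a colon, so we would still prefer a way to get Tesseract itself to read the rows horizontally.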

