When processing PDF files to obtain their text content (converting them to TIF with ImageMagick and running Tesseract 4.1.0 on the output), I observe that in many cases the input is read "vertically": words and numbers that are close to each other in the input (e.g. on the same line) are torn apart in the .txt output.
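Roughly, the pipeline looks like the following Python sketch. It is illustrative only: the 300 DPI default, the output file names, the ImageMagick flattening flags and the optional `psm` argument are assumptions, not necessarily the exact settings I ran. The `psm` argument is included only because Tesseract's page segmentation mode (`--psm`) is the setting that controls how it groups text into blocks and lines. Whether any value fixes the "vertical" reading is exactly what I am asking.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(pdf_path, dpi=300, psm=None):
    """Return the ImageMagick and Tesseract command lines for one PDF.

    Assumptions for illustration: dpi is passed both to ImageMagick's
    -density (rasterization resolution) and to Tesseract's --dpi so the two
    agree; psm, if given, sets Tesseract's page segmentation mode, otherwise
    Tesseract's default mode is used.
    """
    pdf = Path(pdf_path)
    tif = pdf.with_suffix(".tif")
    out_base = pdf.with_suffix("")  # tesseract appends ".txt" to this base name
    convert_cmd = [
        "convert",
        "-density", str(dpi),   # placed before the input so it affects rasterization
        str(pdf),
        "-background", "white",
        "-alpha", "remove",     # flatten any transparency onto a white background
        "-depth", "8",
        str(tif),
    ]
    tesseract_cmd = ["tesseract", str(tif), str(out_base), "--dpi", str(dpi)]
    if psm is not None:
        tesseract_cmd += ["--psm", str(psm)]
    return convert_cmd, tesseract_cmd


def run_pipeline(pdf_path, dpi=300, psm=None):
    """Run both steps in order, stopping if either command fails."""
    for cmd in build_commands(pdf_path, dpi, psm):
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("scan.pdf", dpi=300)` would write `scan.tif` and then `scan.txt` next to the input file.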
Is there any way to prevent this? And are there any recommended settings (DPI etc.) when converting PDF to text?

