Hello, I'm using Tesseract 3 as a simple command-line tool to run OCR. It's doing a fairly good job, but I have one unmet need: I need the paragraphs in the output to be separated by blank lines. It would be great if Tesseract could do this for me, but I'd even be happy if it could preserve indentation whitespace in the text so I could do the splitting with my own software.
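To make the fallback concrete, here is a rough sketch of the kind of post-processing I have in mind. It assumes the OCR text keeps each paragraph's leading indentation, which is exactly the part I can't get from Tesseract today. The function name `split_paragraphs`, the `min_indent` threshold, and the 4-column tab width are all made up for illustration and are not anything Tesseract provides.

```python
def split_paragraphs(text, min_indent=2):
    """Insert a blank line before each indented line (hypothetical helper).

    Assumes the input text still carries paragraph indentation as leading
    spaces or tabs; min_indent is an arbitrary threshold chosen for this sketch.
    """
    out = []
    for line in text.splitlines():
        expanded = line.expandtabs(4)          # treat a tab as 4 columns (assumption)
        stripped = expanded.lstrip(" ")
        indent = len(expanded) - len(stripped)
        # An indented, non-empty line is taken to start a new paragraph,
        # unless it is the first line or a blank line already precedes it.
        if stripped and indent >= min_indent and out and out[-1] != "":
            out.append("")
        out.append(stripped.rstrip())
    return "\n".join(out)


sample = (
    "    First paragraph starts here\n"
    "and continues.\n"
    "    Second paragraph\n"
    "ends here."
)
print(split_paragraphs(sample))
```

This only works if the indentation survives in the OCR output, so it is a fallback rather than a solution.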
Is there any way to achieve this effect?

On a somewhat related note, is there any way to control Tesseract's command-line behavior at all? I see that it accepts a config file as a command-line option, but I'm having no luck finding documentation on which options are available or what they mean. The provided examples don't actually seem to work, and even searching the code hasn't turned up anything resembling a list of valid options.

Any help or pointers in the right direction would be greatly appreciated!

thanks,
Demian

