Hi,

In version 3 of tesseract-ocr there's a new page layout analysis module. I'd like to learn how Tesseract itself uses it and how it can be used by users.
Does it provide additional user functionality, or is it only used internally? That is, can I query it somehow to output all recognized text areas (position and dimensions) without their actual text content?

Does it have any influence on the mark-up of the text output, e.g. additional line breaks between blocks of text when a new paragraph starts?

I've played with the different pagesegmode values (0-3), but I get exactly the same output for each of them. Do these settings have anything to do with the layout analysis?

If recognizing text areas is what the module does but there is no way to output just their position and dimensions, it would be great to see this added as a new feature. In a program like gImageReader you have to mark the text areas manually, while OCRFeeder tries to do it automatically. If tesseract-ocr's analysis is more accurate, one could in turn use it as input for OCRFeeder.

Yours,
Age Bosma

