Hello,

I'm working on a project that uses Tesseract on iOS, and I want to use the BLOCK_LIST data. Unfortunately, I don't know how to access the objects within the BLOCK_LIST; I can't even find the file where this class is defined. Could you give me a hint on how to iterate over the blocks and get the bounding-box coordinates of the rows in each block?
Thanks and kind regards,
Max

On Monday, 20 June 2011 11:56:33 UTC+2, Patrick Questembert wrote:
>
> You can definitely get just layout analysis before text recognition -
> look at the FindLinesCreateBlockList() API and the BLOCK_LIST data
> structure. You can then iterate through that structure to look at
> blocks and rows within these blocks. Keep in mind that a sentence in
> the image could be broken out into separate boxes altogether if you
> have anything more complex than a simple page, so you'll have to do
> the stitching yourself of rows in entirely different boxes, based on
> their coordinates. There are even cases where you might get
> "Patrick" returned as one row containing "Ptrik" and one row containing
> "ic" - rare but happens too, especially when the text line has a slope
> (even if very moderate).
>
> Patrick
>
> On Jun 19, 4:07 pm, Prodoc <[email protected]> wrote:
> > Hi,
> >
> > In version 3 of tesseract-ocr there's a new page layout analysis
> > module. I'm interested to learn in what way it is used and how it can
> > be used.
> >
> > Does it provide additional user functionality or is it only used
> > internally? I.e. can I query it somehow to output all recognized text
> > areas (position and dimensions) without its actual text content?
> > Does it have any influence on the mark-up of the text output? I.e.
> > e.g. additional line breaks between text in case of a new paragraph.
> > I've played with the different pagesegmode values (0-3) but it gives
> > me the exact same output for each of them. Do these settings have
> > anything to do with the layout analysis?
> >
> > If recognizing text areas is what it does but you can't output just
> > the position and dimensions of them, it would be great to see this as
> > a new feature. In a program like gImageReader you have to do this
> > manually, OCRFeeder tries to do it automatically. If tesseract-ocr's
> > analysis is more accurate, one could use that as an input for
> > OCRFeeder again.
> >
> > Yours,
> >
> > Age Bosma
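
For the stitching step Patrick mentions (joining rows that ended up in different blocks, based on their coordinates), here is a minimal sketch. It does not call Tesseract at all: it assumes the row bounding boxes have already been copied out of the BLOCK_LIST into a plain struct. `Box`, `VerticalOverlap`, `SameLine`, `StitchRows` and the "overlap at least half the shorter box" threshold are all made up for illustration, not part of the Tesseract API.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical plain bounding box (not a Tesseract type).
// Only requirement: x_min <= x_max and y_min <= y_max.
struct Box {
  int x_min, y_min, x_max, y_max;
};

// Length of the vertical overlap between two boxes (0 if they do not overlap).
static int VerticalOverlap(const Box& a, const Box& b) {
  return std::max(0, std::min(a.y_max, b.y_max) - std::max(a.y_min, b.y_min));
}

// Assumed heuristic: two fragments belong to the same text line when they
// overlap vertically by at least half the height of the shorter fragment.
static bool SameLine(const Box& a, const Box& b) {
  const int shorter = std::min(a.y_max - a.y_min, b.y_max - b.y_min);
  return shorter > 0 && 2 * VerticalOverlap(a, b) >= shorter;
}

// Groups row fragments into text lines. Fragments are visited left to right
// and each one is compared with the right-most fragment already in a line,
// so a moderately sloped line can still be followed across the page.
std::vector<std::vector<Box>> StitchRows(std::vector<Box> rows) {
  std::sort(rows.begin(), rows.end(), [](const Box& a, const Box& b) {
    return a.x_min != b.x_min ? a.x_min < b.x_min : a.y_min < b.y_min;
  });

  std::vector<std::vector<Box>> lines;
  for (const Box& row : rows) {
    bool placed = false;
    for (std::vector<Box>& line : lines) {
      if (SameLine(line.back(), row)) {
        line.push_back(row);
        placed = true;
        break;
      }
    }
    if (!placed) {
      lines.push_back(std::vector<Box>(1, row));  // start a new text line
    }
  }
  return lines;
}
```

Each inner vector returned by `StitchRows` holds the fragments of one text line in left-to-right order. The threshold is a guess and would probably need tuning for strongly sloped text.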

