Unfortunately, Tess's segmentation can do more than scramble the order of the text output: it can also identify the extents of some text fragments incorrectly (for example, one "segmented" row may partially span two "real" rows), and that in turn can lead to *completely* irrelevant recognition results.
However, you can run as many tests as possible on your images to "prove" that this is probably not the case, and hope that segmentation errors won't be "destructive" and will only introduce this kind of "disorder". Then you can certainly use your (x,y)-sort method and be happy ))

Warm regards,
Dmitry Silaev

On Thu, Feb 24, 2011 at 1:50 PM, Jose <[email protected]> wrote:
> Dmitry, the recognition works; the only problem is the way it is parsing it...
> :S I think segmenting the images would be too painful! I only want to change
> the order in which it is displayed, or get the bounding boxes so I can know the
> x and y of each recognized word and organise the results better myself! Don't
> you think it's a good approach?
> Thank you very much for your help
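The (x,y)-sort approach discussed in this thread can be sketched in Python. This is a minimal illustration, not Tesseract's own ordering logic: it assumes you have already obtained each word's text and pixel bounding box (for example, by parsing Tesseract's hOCR output), and the `WordBox` class, the `sort_reading_order` function and the `overlap_ratio` threshold are hypothetical names chosen for this example. Rather than a plain sort on (y, x), which breaks as soon as two words on the same line have slightly different top coordinates, it first groups words into rows by vertical overlap and then sorts each row left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    """Hypothetical container: one recognized word and its box in pixels (origin top-left)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    @property
    def bottom(self) -> int:
        return self.top + self.height

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def sort_reading_order(words: List[WordBox], overlap_ratio: float = 0.5) -> List[List[WordBox]]:
    """Group words into rows by vertical overlap, then sort each row left to right.

    Words are visited top to bottom by vertical center. A word joins the current
    row when its vertical overlap with the row's extent is at least
    `overlap_ratio` times the smaller of the two heights; otherwise it starts a new row.
    """
    rows: List[List[WordBox]] = []
    row_top = row_bottom = 0
    for w in sorted(words, key=lambda b: b.center_y):
        if rows:
            overlap = min(row_bottom, w.bottom) - max(row_top, w.top)
            smaller = min(row_bottom - row_top, w.height)
            if smaller > 0 and overlap >= overlap_ratio * smaller:
                rows[-1].append(w)
                # Grow the row's vertical extent to cover the new word.
                row_top = min(row_top, w.top)
                row_bottom = max(row_bottom, w.bottom)
                continue
        # No current row, or too little overlap: start a new row.
        rows.append([w])
        row_top, row_bottom = w.top, w.bottom
    # Within each row, order words left to right.
    return [sorted(row, key=lambda b: b.left) for row in rows]


if __name__ == "__main__":
    words = [
        WordBox("world", 120, 8, 60, 20),   # two pixels higher than "Hello"
        WordBox("Second", 10, 50, 70, 20),
        WordBox("Hello", 10, 10, 80, 20),
        WordBox("line", 90, 52, 40, 20),
    ]
    for row in sort_reading_order(words):
        print(" ".join(w.text for w in row))
    # Hello world
    # Second line
```

With a naive (top, left) sort the first line would come out as "world Hello", because "world" sits two pixels higher. Note that this only fixes ordering: it cannot repair a box that segmentation has already stretched across two real rows, which is the failure described above.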

