Hi all,

A while ago I wrote myself a hack for Tesseract that inserts a blank line before every new paragraph. It works in baseapi.cpp:TessBaseAPI::GetUTF8Text() by comparing the x-position of the first word in every line against the left side of the current block. This code worked well enough when I wrote it for svn release 319, but after updating to a newer source release (525) I find that it no longer works. The code gets the left side of the current block via:
    page_res_it.block()->block->bounding_box().left()

That used to be set to the x-position of the current block of text, but now I find that the bounding box just encompasses the entire image. So, my question is: is the bounding box of the current block no longer updated automatically? Do I have to enable something in the configs to get the bounding box computed properly again? I never tried my hack with any of the source releases between 319 and 525, so I don't know when the behaviour changed.

Cheers,
Rob Komar
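
For readers who want to see the heuristic in isolation, here is a minimal sketch of the indent check described above, written without any Tesseract types. It is not the actual patch: the Line struct, the InsertParagraphBreaks function, and the indent_threshold parameter are hypothetical names, and the block's left edge is passed in as a plain integer rather than read from bounding_box().left().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one recognized text line (not a Tesseract type):
// the x-position of its first word, in pixels, and its text.
struct Line {
  int first_word_left;
  std::string text;
};

// Joins the lines of one block into a single string. A blank line is inserted
// before any line, other than the first, whose first word starts more than
// indent_threshold pixels to the right of block_left.
std::string InsertParagraphBreaks(const std::vector<Line>& lines,
                                  int block_left, int indent_threshold) {
  std::string out;
  for (std::size_t i = 0; i < lines.size(); ++i) {
    const bool indented =
        lines[i].first_word_left - block_left > indent_threshold;
    if (indented && i > 0) {
      out += "\n";  // blank line marks the start of a new paragraph
    }
    out += lines[i].text;
    out += "\n";
  }
  return out;
}
```

In this sketch, if block_left is the left edge of the whole image rather than of the block, every line of a block that sits away from the image edge looks indented, so a blank line appears before each one. That would be consistent with the symptom described above, though it is only an illustration of the failure mode.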

