On Tue, Mar 27, 2012 at 9:19 PM, Falke <[email protected]> wrote:
> Anyone?
>
> In case the length of my posts scared off a few readers, here's a more
> condensed version:
>
> Having box coordinates for every recognized character in the final
> result would allow one to extend the recognition process, either by
> redoing the recognition (with alternative image pre-processing) or by
> doing a number of creative things such as automated image extraction,
> or maybe even some other clever layout recognition. You could use box
> info to determine line spacing (and/or font size), indentations, and
> many other things. (Though, I realize, the latest version has
> indentation recognition.)
>
> Where do I even begin to look in the code, to see if I can get a
> printout of the coordinates?
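
// Assumes apiP is a TessBaseAPI* that is already initialized and has an
// image set. GetConnectedComponents is a member function, so call it on
// the instance. It returns boxes of connected components (blobs), which
// are not necessarily whole characters.
BOXA *boxesP = apiP->GetConnectedComponents(NULL);
then perhaps:
// Draw the boxes in red on a 32-bit copy of the thresholded image.
PIX *thresholdedPixP = apiP->GetThresholdedImage();
PIX *markedP = pixConvertTo32(thresholdedPixP);
pixDestroy(&thresholdedPixP);
pixRenderBoxaBlend(markedP, boxesP, 1 /* line width */,
                   255, 0, 0, 0.80 /* alpha */, 0 /* removedups */);
// do something here with markedP, like display or write it
pixDestroy(&markedP);
boxaDestroy(&boxesP);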
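
As for the line spacing and indentation ideas, here is a minimal sketch of what you could do with the boxes once you have them. CharBox, TextLine, GroupIntoLines and MedianLineSpacing are hypothetical names made up for illustration, not part of Tesseract or Leptonica; you would fill the CharBox values yourself, for example from boxaGetBoxGeometry() on each entry of the BOXA. It assumes lines are roughly horizontal and do not overlap vertically, so skewed pages would need deskewing first.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical plain box type (top-left corner plus size, in pixels).
struct CharBox { int x, y, w, h; };

// Hypothetical summary of one text line: vertical extent and left edge.
struct TextLine { int top, bottom, left; };

// Group boxes into lines: after sorting by top edge, a box that starts
// above the bottom of the current line is merged into that line.
std::vector<TextLine> GroupIntoLines(std::vector<CharBox> boxes) {
  std::sort(boxes.begin(), boxes.end(),
            [](const CharBox& a, const CharBox& b) { return a.y < b.y; });
  std::vector<TextLine> lines;
  for (const CharBox& b : boxes) {
    if (!lines.empty() && b.y < lines.back().bottom) {
      TextLine& cur = lines.back();
      cur.bottom = std::max(cur.bottom, b.y + b.h);
      cur.left = std::min(cur.left, b.x);
    } else {
      lines.push_back({b.y, b.y + b.h, b.x});
    }
  }
  return lines;
}

// Median distance between the tops of consecutive lines.
// Returns 0 when there are fewer than two lines.
int MedianLineSpacing(const std::vector<TextLine>& lines) {
  std::vector<int> gaps;
  for (std::size_t i = 1; i < lines.size(); ++i) {
    gaps.push_back(lines[i].top - lines[i - 1].top);
  }
  if (gaps.empty()) return 0;
  std::sort(gaps.begin(), gaps.end());
  return gaps[gaps.size() / 2];
}
```

Comparing each line's left value against the smallest left on the page gives a rough indentation estimate, and the median is used for spacing so that one unusually large gap (say, between paragraphs) does not skew the result.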