On Tue, Mar 27, 2012 at 9:19 PM, Falke <[email protected]> wrote:
> Anyone?
>
> In case the length of my posts scared off a few readers, here's a more
> condensed version:
>
> Having box coordinates for every recognized character in the final
> result would let one extend the recognition process, either by
> redoing the recognition (with alternative image pre-processing) or by
> doing a number of creative things such as automated image extraction,
> or maybe even some other clever layout recognition.  You could use box
> info to determine line spacing (and/or font size), indentations, and
> many other things.  (Though, I realize, the latest version has
> indentation recognition.)
>
> Where do I even begin to look in the code to see if I can get a
> printout of the coordinates?

  // apiP: an initialized TessBaseAPI* with an image already set
  BOXA *boxesP = apiP->GetConnectedComponents(NULL);

then perhaps:

  PIX *thresholdedPixP = apiP->GetThresholdedImage();
  PIX *markedP = pixConvertTo32(thresholdedPixP);
  pixDestroy(&thresholdedPixP);

  pixRenderBoxaBlend (markedP, boxesP, 1 /* line width */,
                      255, 0, 0, 0.80 /* alpha */, 0);

  //do something here with markedP like display or write it

  pixDestroy(&markedP);
  boxaDestroy(&boxesP);
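
If you want to go on and use the coordinates, for example for the
line-spacing idea in the quoted post, here is a rough sketch of one way
it might be done. It assumes you have already copied each box out of the
BOXA (Leptonica's boxaGetCount() and boxaGetBoxGeometry() return the
count and the x, y, w, h of each box) into a plain struct. The CharBox
struct and the EstimateLineSpacing() function are names made up for this
example, not part of Tesseract or Leptonica, and the grouping rule
(boxes whose vertical extents overlap belong to the same line) is only a
simple heuristic.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical plain struct (not a Tesseract/Leptonica type); fill it
// from the BOXA, e.g. with boxaGetBoxGeometry().
struct CharBox { int x, y, w, h; };

// Groups boxes into text lines by vertical overlap and returns the
// median distance between the tops of consecutive lines.
// Returns 0 if fewer than two lines are found.
int EstimateLineSpacing(std::vector<CharBox> boxes) {
  if (boxes.empty()) return 0;

  // Process boxes from top to bottom.
  std::sort(boxes.begin(), boxes.end(),
            [](const CharBox& a, const CharBox& b) { return a.y < b.y; });

  std::vector<int> lineTops;
  int top = boxes[0].y;
  int bottom = boxes[0].y + boxes[0].h;
  for (std::size_t i = 1; i < boxes.size(); ++i) {
    if (boxes[i].y < bottom) {
      // Overlaps the current line vertically: extend that line.
      bottom = std::max(bottom, boxes[i].y + boxes[i].h);
    } else {
      // Starts below the current line: close it and begin a new one.
      lineTops.push_back(top);
      top = boxes[i].y;
      bottom = boxes[i].y + boxes[i].h;
    }
  }
  lineTops.push_back(top);
  if (lineTops.size() < 2) return 0;

  // Median of the top-to-top gaps, so one odd gap does not skew it.
  std::vector<int> gaps;
  for (std::size_t i = 1; i < lineTops.size(); ++i)
    gaps.push_back(lineTops[i] - lineTops[i - 1]);
  std::sort(gaps.begin(), gaps.end());
  return gaps[gaps.size() / 2];
}
```

Skewed scans or tall components that bridge two lines (rules, images)
will confuse the overlap test, so you would probably want to filter
those boxes out first.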

