On Thu, Mar 13, 2014 at 7:16 PM, Steve Aston <[email protected]> wrote:

> I'd like to begin implementing it in a library but I need to know if
> there's a method in tesseract to go through and slice out all the
> characters without recognizing them. I basically want to go through,
> identify a character and have the character cut out and saved as a separate
> image file that I can manipulate. I'm a *terrible* programmer so I
> haven't been able to crack this yet, can anyone help me out?
>

I am not sure what you mean by "terrible programmer", and you did not
mention a programming language ;-). So I will assume you can use C++.

If I understand correctly, you need the character (symbol) coordinates. You
can get them with GetComponentImages. Have a look at the example [1], but use
RIL_SYMBOL instead of RIL_TEXTLINE. Then you can iterate over the boxes and,
instead of running OCR (api->GetUTF8Text()), save each one with something
like this:

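    char filename[32];  // room for "export_00000.png" plus the terminator
    // Restrict further processing to the current symbol's bounding box.
    api->SetRectangle(box->x, box->y, box->w, box->h);
    // Fetch the binarized image without calling GetUTF8Text().
    Pix *symbol = api->GetThresholdedImage();
    snprintf(filename, sizeof(filename), "export_%05d.png", i);
    pixWrite(filename, symbol, IFF_PNG);
    pixDestroy(&symbol);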


I hope this helps you.

[1] https://code.google.com/p/tesseract-ocr/wiki/APIExample#example

Zdenko

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
