On Thu, Mar 13, 2014 at 7:16 PM, Steve Aston <[email protected]> wrote:
> I'd like to begin implementing it in a library but I need to know if
> there's a method in tesseract to go through and slice out all the
> characters without recognizing them. I basically want to go through,
> identify a character and have the character cut out and saved as a separate
> image file that I can manipulate. I'm a *terrible* programmer so I
> haven't been able to crack this yet, can anyone help me out?
>
I am not sure what you mean by "terrible programmer", and you did not
mention a programming language ;-). So I will assume you can use C++.
If I understand correctly, you need the character (symbol) coordinates. You
can get them with GetComponentImages. Have a look at the example[1], but use
RIL_SYMBOL instead of RIL_TEXTLINE. Then you can iterate over the boxes and,
instead of running OCR (api->GetUTF8Text()), save each symbol with something
like this:
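    // box is the i-th box from GetComponentImages (see the loop in example[1])
    char filename[32];
    api->SetRectangle(box->x, box->y, box->w, box->h);
    Pix *symbol = api->GetThresholdedImage();  // thresholded image of the rectangle
    snprintf(filename, sizeof(filename), "export_%05d.png", i);
    pixWrite(filename, symbol, IFF_PNG);
    pixDestroy(&symbol);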
I hope this helps you.
[1] https://code.google.com/p/tesseract-ocr/wiki/APIExample#example
Zdenko
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en