You need to get the box coordinates (BoundingBox) of each symbol [1]. Try to
follow the hOCR code in Tesseract [2]. hOCR works at the word/line level,
but the logic is the same for symbols (and could be simplified).

Alternatively, search the issues and the forum for "character confidence".
There should be an example of how to split the recognized text into
characters (symbols); in that loop, call the BoundingBox function instead of
the confidence function.

With the box coordinates you can then copy the relevant part of the image
and build the array you need.
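The copy step itself is plain array slicing. The sketch below avoids any library so it stands alone: it assumes a hypothetical `GrayImage` type (8-bit grayscale, row-major) and hypothetical `PixelBox` boxes using the same convention as `BoundingBox()` (left/top inclusive, right/bottom exclusive), clamps each box to the image, and returns one cropped image per box. With Leptonica you would more likely call `pixClipRectangle()` on a box made with `boxCreate(left, top, right - left, bottom - top)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type: 8-bit grayscale pixels stored row by row.
struct GrayImage {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> pixels;  // size == width * height
};

// Hypothetical box type; right and bottom are exclusive.
struct PixelBox {
  int left, top, right, bottom;
};

// Copies the region covered by `box` out of `src`, clamped to the image.
// Returns an empty image if the clamped box has no area.
GrayImage CropImage(const GrayImage& src, const PixelBox& box) {
  assert(src.pixels.size() == static_cast<std::size_t>(src.width) * src.height);
  const int l = std::max(box.left, 0);
  const int t = std::max(box.top, 0);
  const int r = std::min(box.right, src.width);
  const int b = std::min(box.bottom, src.height);
  GrayImage out;
  if (r <= l || b <= t) return out;
  out.width = r - l;
  out.height = b - t;
  out.pixels.reserve(static_cast<std::size_t>(out.width) * out.height);
  for (int y = t; y < b; ++y) {
    const std::uint8_t* row =
        src.pixels.data() + static_cast<std::size_t>(y) * src.width;
    out.pixels.insert(out.pixels.end(), row + l, row + r);  // one row slice
  }
  return out;
}

// Builds the "array of character images": one crop per box, in box order.
std::vector<GrayImage> CropSymbols(const GrayImage& src,
                                   const std::vector<PixelBox>& boxes) {
  std::vector<GrayImage> symbols;
  symbols.reserve(boxes.size());
  for (const PixelBox& box : boxes) symbols.push_back(CropImage(src, box));
  return symbols;
}
```

Clamping instead of rejecting out-of-range boxes keeps the output array aligned one-to-one with the input boxes, so index `i` in the result always belongs to box `i`.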

[1]
http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.cpp?r=777#1035
[2]
http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/tesseractmain.cpp?r=780

-- 
Zdenko

On Tue, Nov 13, 2012 at 4:36 PM, Walid Khedr <[email protected]> wrote:

> Hi,
> I am developing an OCR algorithm for the Arabic language. I tried the
> following code, which performs the OCR operation. What I need is the
> intermediate stage, i.e. the segmentation process (the characters as
> separate images), to test my algorithm. Thank you for your reply.
>
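> #include <cstdio>
> #include <cstdlib>
> #include <tesseract/baseapi.h>
> #include <leptonica/allheaders.h>
>
> int main() {
>         // Create the Tesseract API object.
>         tesseract::TessBaseAPI *myOCR = new tesseract::TessBaseAPI();
>
>         // Print library versions (Leptonica's string must be freed).
>         printf("Tesseract-ocr version: %s\n", myOCR->Version());
>         char *leptVersion = getLeptonicaVersion();
>         printf("Leptonica version: %s\n", leptVersion);
>         free(leptVersion);
>
>         // Initialize with the default tessdata path and English data.
>         if (myOCR->Init(NULL, "eng")) {
>           fprintf(stderr, "Could not initialize tesseract.\n");
>           delete myOCR;
>           return 1;
>         }
>
>         // Load the image with Leptonica and hand it to Tesseract.
>         Pix *pix = pixRead("phototest.tif");
>         if (pix == NULL) {
>           fprintf(stderr, "Could not read phototest.tif.\n");
>           myOCR->End();
>           delete myOCR;
>           return 1;
>         }
>         myOCR->SetImage(pix);
>
>         // Run recognition and print the UTF-8 result.
>         char *outText = myOCR->GetUTF8Text();
>         printf("OCR output:\n\n");
>         if (outText != NULL)
>           printf("%s", outText);  // never use OCR output as a format string
>
>         // Release resources: the text was allocated with new[].
>         myOCR->Clear();
>         myOCR->End();
>         delete[] outText;
>         delete myOCR;
>         pixDestroy(&pix);
>         return 0;
> }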
>
> On Tuesday, November 13, 2012 4:58:16 PM UTC+2, sventech wrote:
>
>> You should follow the instructions in the FAQ, etc. and post what you've
>> tried with example images. We are very happy to help, but we are not
>> programming teachers. Ask technical questions and we'll probably be able to
>> give you answers.
>> --Sven
>>
>>
>> On Mon, Nov 12, 2012 at 11:17 PM, Walid Khedr <[email protected]> wrote:
>>
>>> Hi,
>>> I'm new in tesseract. I just want to use it for Character Segmentation.
>>> The input is an image of a text string and the output will be an array of
>>> *images *for each character. Could someone post step-by-step for this
>>> segmentation.
>>>
>>> Thank You
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>>
>>> To unsubscribe from this group, send email to
>>> tesseract-oc...@googlegroups.com
>>>
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>
>>
>>
>> --
>> "All that is gold does not glitter,
>>   not all those who wander are lost;
>> the old that is strong does not wither,
>>   deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>>   a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>>   the crownless again shall be king."
>>