It's hard to tell without seeing your code. Please send it if you
can.
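
Since GetHOCRText() already gives correct coordinates in your test, one possible workaround is to read the word rectangles back out of the hOCR string. The following is only a minimal sketch, not Tesseract code: WordBox and ParseHocrWordBoxes are hypothetical names, and it assumes each word is a tag whose class contains "ocr_word" or "ocrx_word" and whose title attribute holds "bbox left top right bottom". Check both assumptions against the HTML your build actually produces.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Word rectangle in image pixel coordinates, as written in an hOCR "bbox".
// (Hypothetical helper type, not part of the Tesseract API.)
struct WordBox {
    int left, top, right, bottom;
};

// Scans the hOCR text tag by tag and collects the bbox of every tag whose
// class names a word. The class names checked here are an assumption about
// the hOCR output and may need adjusting for your Tesseract version.
std::vector<WordBox> ParseHocrWordBoxes(const std::string& hocr) {
    std::vector<WordBox> boxes;
    std::size_t pos = 0;
    while ((pos = hocr.find('<', pos)) != std::string::npos) {
        const std::size_t end = hocr.find('>', pos);
        if (end == std::string::npos) break;  // unterminated tag: stop
        const std::string tag = hocr.substr(pos, end - pos);
        pos = end + 1;

        // Skip tags that are not word-level elements.
        if (tag.find("ocr_word") == std::string::npos &&
            tag.find("ocrx_word") == std::string::npos) {
            continue;
        }
        // Skip word tags that carry no bounding box.
        const std::size_t bbox = tag.find("bbox ");
        if (bbox == std::string::npos) continue;

        // Read the four integers that follow "bbox ".
        std::istringstream in(tag.substr(bbox + 5));
        WordBox box;
        if (in >> box.left >> box.top >> box.right >> box.bottom) {
            boxes.push_back(box);
        }
    }
    return boxes;
}
```

In a real run the input string would be whatever GetHOCRText() returns; its exact signature differs between Tesseract versions, so check your baseapi.h. Each resulting WordBox can then be drawn as a rectangle over the original image.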

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Wed, Sep 7, 2011 at 3:31 AM, haoest <[email protected]> wrote:
> Hi Dmitri,
>
> Thanks for the guidance.
>
> I looked up GetHOCRText() and compared it with GetWords(Pixa pixa).
> They do very similar things, as they both get the coordinates through
> word->bounding_box(); however, my tests show that GetHOCRText() produces
> an HTML file with correct coordinates for the words, but GetWords
> still gives me a Boxa object with 3 million words (gibberish). I don't
> quite know what I did wrong.
>
>
> On Sep 6, 3:37 am, Dmitri Silaev <[email protected]> wrote:
>> Examine control paths for 'tessedit_create_hocr' variable and see how
>> rectangle coordinates are being obtained.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Tue, Sep 6, 2011 at 5:04 AM, haoest <[email protected]> wrote:
>> > Hello,
>>
>> > I have a very simple OCR app based on Tesseract. After the recognition
>> > step, I also provide a user verification step that allows correction
>> > in case OCR is wrong. To improve the user interface, I plan to draw a
>> > rectangle on top of the OCR-ed character on the original input image,
>> > and put it side by side with the OCR output. To get to that, I need
>> > the coordinate of the recognized characters.
>>
>> > I tried something like this but it seems to give me gibberish:
>>
>> >        ETEXT_DESC output;
>> >        tess->Recognize(&output);
>> >        text = tess->GetUTF8Text();
>>
>> > Now if I access output.count, it gives me some value above 10,000,
>> > which is obviously wrong because the whole image only has 20 or so.
>>
>> > Am I on the right track? Can I have some direction please?
>>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en
>

