Hi Dmitri,

Thanks for the guidance.

I looked up GetHOCRText() and compared it with GetWords(Pixa** pixa).
They do very similar things, as both get the coordinates through
word->bounding_box(). However, my tests show that GetHOCRText() produces
an HTML file with correct coordinates for the words, while GetWords()
still gives me a Boxa object with 3 million words (gibberish). I don't
know what I did wrong.
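
Since the hOCR output already carries the right coordinates, one possible workaround is to read the boxes back out of that text. The following is only a minimal sketch under stated assumptions: WordBox and ParseHocrWordBoxes are hypothetical names made up for illustration (they are not part of Tesseract), word elements are assumed to use the class 'ocr_word' or 'ocrx_word', and each word tag is assumed to carry a title of the form "bbox x0 y0 x1 y1" with no '>' inside its attribute values.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper type: pixel rectangle of one word, hOCR convention
// (x0, y0) = top-left corner, (x1, y1) = bottom-right corner.
struct WordBox {
    int x0, y0, x1, y1;
};

// Hypothetical helper: collect the bbox of every word-level tag in an
// hOCR string. Assumes word tags use class 'ocr_word' or 'ocrx_word'
// and that no attribute value contains a '>' character.
std::vector<WordBox> ParseHocrWordBoxes(const std::string& hocr) {
    std::vector<WordBox> boxes;
    std::string::size_type open = 0;
    while ((open = hocr.find('<', open)) != std::string::npos) {
        const std::string::size_type close = hocr.find('>', open);
        if (close == std::string::npos) break;  // unterminated tag: stop
        const std::string tag = hocr.substr(open, close - open + 1);
        open = close + 1;

        // Skip page, line, and closing tags; keep word tags only.
        const bool is_word = tag.find("ocr_word") != std::string::npos ||
                             tag.find("ocrx_word") != std::string::npos;
        const std::string::size_type bbox = tag.find("bbox ");
        if (!is_word || bbox == std::string::npos) continue;

        // Read the four integers that follow "bbox"; anything after
        // them in the title (for example a confidence value) is ignored.
        WordBox box;
        if (std::sscanf(tag.c_str() + bbox, "bbox %d %d %d %d",
                        &box.x0, &box.y0, &box.x1, &box.y1) == 4) {
            boxes.push_back(box);
        }
    }
    return boxes;
}
```

The idea would be to pass in the string returned by GetHOCRText() (its exact arguments depend on the Tesseract version) and draw one rectangle per returned box. This only sidesteps GetWords(); it does not explain why the Boxa comes back with so many entries.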


On Sep 6, 3:37 am, Dmitri Silaev <[email protected]> wrote:
> Examine control paths for 'tessedit_create_hocr' variable and see how
> rectangle coordinates are being obtained.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Tue, Sep 6, 2011 at 5:04 AM, haoest <[email protected]> wrote:
> > Hello,
>
> > I have a very simple OCR app based on Tesseract. After the recognition
> > step, I also provide a user verification step that allows correction
> > in case OCR is wrong. To improve the user interface, I plan to draw a
> > rectangle on top of the OCR-ed character on the original input image,
> > and put it side by side with the OCR output. To do that, I need
> > the coordinates of the recognized characters.
>
> > I tried something like this but it seems to give me gibberish:
>
> >        ETEXT_DESC output;
> >        tess->Recognize(&output);
> >        text = tess->GetUTF8Text();
>
> > Now if I access output.count, it gives me some value above 10,000,
> > which is obviously wrong because the whole image only has 20 or so
> > characters.
>
> > Am I on the right track? Can I have some direction please?
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
