Re: How to get OCR-ed character coordinates?

haoest Fri, 09 Sep 2011 18:37:48 -0700

I will investigate your suggestions as soon as I can. Thank you for
the pointer, kind sir.




On Sep 8, 8:28 pm, Dmitri Silaev <[email protected]> wrote:
> Using GetUTF8Text() and then GetWords() is overkill. However, you
> can examine GetWords() and GetComponentImages() for typical use
> of the "PageIterator" class, which is the main means of accessing
> Tesseract's result details, including bounding boxes, at the API
> level.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Thu, Sep 8, 2011 at 8:59 AM, haoest <[email protected]> wrote:
> > Hi Dmitri,
>
> > here's the snippet:
>
> >     // numberStrip is an OpenCV IplImage object
> >     tess->SetImage((uchar*) numberStrip->imageData,
> >                    numberStrip->width, numberStrip->height,
> >                    numberStrip->depth / 8,
> >                    numberStrip->widthStep);
> >     text = tess->GetUTF8Text(); // text is fine, it contains the digits from the OpenCV image
> >     Boxa* bounds = tess->GetWords(NULL);
> >     l_int32 count = bounds->n; // count > 3 million :(
> >     for (int i = 0; i < count; i++) {
> >         Box* b = bounds->box[i];
> >         // coords below are all 0's, and sometimes I get a bad access
> >         int x = b->x;
> >         int y = b->y;
> >         int w = b->w;
> >         int h = b->h;
> >     }
>
> > On Sep 7, 4:42 pm, Dmitri Silaev <[email protected]> wrote:
> >> Well, it's hard to tell without having seen your own code. Send it if
> >> you can.
>
> >> Warm regards,
> >> Dmitri Silaev
> >> www.CustomOCR.com
>
> >> On Wed, Sep 7, 2011 at 3:31 AM, haoest <[email protected]> wrote:
> >> > Hi Dmitri,
>
> >> > Thanks for the guidance.
>
> >> > I looked up GetHOCRText() and compared it with GetWords(Pixa** pixa).
> >> > They do very similar things, as they both get the coordinates through
> >> > word->bounding_box(); however, my tests show that GetHOCRText() produces
> >> > an HTML file with correct coordinates for the words, but GetWords()
> >> > still gives me a Boxa object with 3 million words (gibberish). I don't
> >> > quite know what I did wrong.
>
> >> > On Sep 6, 3:37 am, Dmitri Silaev <[email protected]> wrote:
> >> >> Examine control paths for 'tessedit_create_hocr' variable and see how
> >> >> rectangle coordinates are being obtained.
>
> >> >> Warm regards,
> >> >> Dmitri Silaev
> >> >> www.CustomOCR.com
>
> >> >> On Tue, Sep 6, 2011 at 5:04 AM, haoest <[email protected]> wrote:
> >> >> > Hello,
>
> >> >> > I have a very simple OCR app based on Tesseract. After the recognition
> >> >> > step, I also provide a user verification step that allows correction
> >> >> > in case OCR is wrong. To improve the user interface, I plan to draw a
> >> >> > rectangle on top of the OCR-ed character on the original input image,
> >> >> > and put it side by side with the OCR output. To get to that, I need
> >> >> > the coordinates of the recognized characters.
>
> >> >> > I tried something like this but it seems to give me gibberish:
>
> >> >> >        ETEXT_DESC output;
> >> >> >        tess->Recognize(&output);
> >> >> >        text = tess->GetUTF8Text();
>
> >> >> > Now if I access output.count, it gives me some value above 10,000,
> >> >> > which is obviously wrong because the whole image only has 20 or so
> >> >> > characters.
>
> >> >> > Am I on the right track? Can I have some direction please?
>
> >> >> > --
> >> >> > You received this message because you are subscribed to the Google
> >> >> > Groups "tesseract-ocr" group.
> >> >> > To post to this group, send email to [email protected]
> >> >> > To unsubscribe from this group, send email to
> >> >> > [email protected]
> >> >> > For more options, visit this group at
> >> >> > http://groups.google.com/group/tesseract-ocr?hl=en
>
>
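
Since GetHOCRText() is reported above to return correct word
coordinates, one stopgap is to pull the rectangles out of that markup
until the iterator route is sorted out. What follows is only a minimal
sketch, assuming each hOCR element carries a title attribute of the form
"bbox x0 y0 x1 y1" in pixel coordinates. Rect and ParseHocrBoxes are
hypothetical names used for illustration and are not part of the
Tesseract API. Note that this yields word-level (and enclosing) boxes,
not per-character ones.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Rectangle in x/y/width/height form, convenient for drawing an overlay.
// Hypothetical helper type, not a Tesseract or Leptonica type.
struct Rect {
    int x, y, w, h;
};

// Scan hOCR markup for every "bbox x0 y0 x1 y1" entry and convert it
// from corner form to x/y/width/height.
// Simplification: this collects ALL bbox entries (page, line and word
// level alike); a real parser would check each element's class first.
std::vector<Rect> ParseHocrBoxes(const std::string& hocr) {
    std::vector<Rect> rects;
    const std::string key = "bbox ";
    std::string::size_type pos = 0;
    while ((pos = hocr.find(key, pos)) != std::string::npos) {
        pos += key.size();
        int x0, y0, x1, y1;
        // Accept the entry only if four integers follow and the box is sane.
        if (std::sscanf(hocr.c_str() + pos, "%d %d %d %d",
                        &x0, &y0, &x1, &y1) == 4 &&
            x1 >= x0 && y1 >= y0) {
            Rect r = {x0, y0, x1 - x0, y1 - y0};
            rects.push_back(r);
        }
    }
    return rects;
}
```

The string returned by GetHOCRText() could be passed straight to
ParseHocrBoxes(), and each resulting Rect drawn on the original image.
Rejecting boxes with negative width or height is a cheap guard against
the kind of garbage values described earlier in the thread.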

