Re: How to get OCR-ed character coordinates?

haoest Wed, 07 Sep 2011 22:27:25 -0700

Hi Dmitri,

Here's the snippet:


    // numberStrip is an OpenCV IplImage object
    tess->SetImage((uchar*) numberStrip->imageData,
                   numberStrip->width, numberStrip->height,
                   numberStrip->depth / 8,
                   numberStrip->widthStep);
    text = tess->GetUTF8Text(); // text is fine, it contains digits from the OpenCV image
    Boxa* bounds = tess->GetWords(NULL);
    l_int32 count = bounds->n; // count > 3 million :(
    for(int i=0; i<count; i++){
        Box* b = bounds->box[i];
        /// coords below are all 0's, and sometimes I have bad access
        int x = b->x;
        int y = b->y;
        int w = b->w;
        int h = b->h;
    }
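
Since GetHOCRText() gave me correct coordinates in my test, one possible
workaround is to pull the word boxes out of the hOCR string instead of
going through GetWords(). Below is a minimal sketch, not a definitive
implementation. It assumes that word spans carry a class of 'ocr_word' or
'ocrx_word' and a title attribute of the form "bbox x0 y0 x1 y1". WordBox
and ParseHocrBoxes are hypothetical names of mine, not part of Tesseract.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper type: one rectangle in image pixel coordinates.
struct WordBox {
    int x, y, w, h;
};

// Walks over every tag in an hOCR string and collects the bounding boxes
// of word-level spans. Assumes "bbox x0 y0 x1 y1" inside the tag; tags
// without a bbox, and tags that are not word spans, are skipped.
std::vector<WordBox> ParseHocrBoxes(const std::string& hocr) {
    std::vector<WordBox> boxes;
    std::size_t pos = 0;
    while ((pos = hocr.find('<', pos)) != std::string::npos) {
        std::size_t end = hocr.find('>', pos);
        if (end == std::string::npos) break;  // unterminated tag, stop
        const std::string tag = hocr.substr(pos, end - pos + 1);
        pos = end + 1;

        // Keep only word-level spans (class names are an assumption).
        if (tag.find("ocr_word") == std::string::npos &&
            tag.find("ocrx_word") == std::string::npos) {
            continue;
        }
        std::size_t b = tag.find("bbox ");
        if (b == std::string::npos) continue;

        int x0, y0, x1, y1;
        if (std::sscanf(tag.c_str() + b, "bbox %d %d %d %d",
                        &x0, &y0, &x1, &y1) == 4) {
            WordBox wb;
            wb.x = x0;
            wb.y = y0;
            wb.w = x1 - x0;  // hOCR stores corners, convert to width
            wb.h = y1 - y0;  // and height
            boxes.push_back(wb);
        }
    }
    return boxes;
}
```

The string returned by GetHOCRText() would be passed in as hocr; the exact
arguments of GetHOCRText() depend on the Tesseract version, so check
baseapi.h for the one you build against.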



On Sep 7, 4:42 pm, Dmitri Silaev <[email protected]> wrote:
> Well, it's hard to tell without having seen your own code. Please send
> it if you can.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Wed, Sep 7, 2011 at 3:31 AM, haoest <[email protected]> wrote:
> > Hi Dmitri,
>
> > Thanks for the guidance.
>
> > I looked up GetHOCRText() and compared it with GetWords(Pixa pixa).
> > They do very similar things, as they both get the coordinates through
> > word->bounding_box(); however, my tests show that GetHOCRText() produces
> > an html file with correct coordinates for the words, but GetWords
> > still gives me a Boxa object with 3 million words (gibberish). I don't
> > quite know what I did wrong.
>
> > On Sep 6, 3:37 am, Dmitri Silaev <[email protected]> wrote:
> >> Examine control paths for 'tessedit_create_hocr' variable and see how
> >> rectangle coordinates are being obtained.
>
> >> Warm regards,
> >> Dmitri Silaev
> >> www.CustomOCR.com
>
> >> On Tue, Sep 6, 2011 at 5:04 AM, haoest <[email protected]> wrote:
> >> > Hello,
>
> >> > I have a very simple OCR app based on Tesseract. After the recognition
> >> > step, I also provide a user verification step that allows correction
> >> > in case OCR is wrong. To improve the user interface, I plan to draw a
> >> > rectangle on top of the OCR-ed character on the original input image,
> >> > and put it side by side with the OCR output. To get to that, I need
> >> > the coordinate of the recognized characters.
>
> >> > I tried something like this but it seems to give me gibberish:
>
> >> >        ETEXT_DESC output;
> >> >        tess->Recognize(&output);
> >> >        text = tess->GetUTF8Text();
>
> >> > Now if I access output.count, it gives me some value above 10,000,
> >> > which is obviously wrong because the whole image only has 20 or so.
>
> >> > Am I on the right track? Can I have some direction please?
>
> >> > --
> >> > You received this message because you are subscribed to the Google
> >> > Groups "tesseract-ocr" group.
> >> > To post to this group, send email to [email protected]
> >> > To unsubscribe from this group, send email to
> >> > [email protected]
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/tesseract-ocr?hl=en
>

