I will investigate your suggestions as soon as I can. Thank you for the pointer, kind sir.
On Sep 8, 8:28 pm, Dmitri Silaev <[email protected]> wrote:
> Using GetUTF8Text() and then GetWords() is overkill. However, you
> can examine GetWords() and then GetComponentImages() for typical use
> of the "PageIterator" class, which is the main means of accessing
> Tesseract's result details, including bounding boxes, at the API
> level.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Thu, Sep 8, 2011 at 8:59 AM, haoest <[email protected]> wrote:
> > Hi Dmitri,
> >
> > here's the snippet:
> >
> > // numberStrip is an OpenCV IplImage object
> > tess->SetImage((uchar*) numberStrip->imageData,
> >                numberStrip->width, numberStrip->height,
> >                numberStrip->depth / 8,
> >                numberStrip->widthStep);
> > text = tess->GetUTF8Text(); // text is fine, it contains digits from the OpenCV image
> > Boxa* bounds = tess->GetWords(NULL);
> > l_int32 count = bounds->n; // count > 3 million :(
> > for (int i = 0; i < count; i++) {
> >     Box* b = bounds->box[i];
> >     // coords below are all 0's, and sometimes I have bad access
> >     int x = b->x;
> >     int y = b->y;
> >     int w = b->w;
> >     int h = b->h;
> > }
> >
> > On Sep 7, 4:42 pm, Dmitri Silaev <[email protected]> wrote:
> >> Well, it's hard to tell without having seen your own code. Send it if
> >> you can.
> >>
> >> Warm regards,
> >> Dmitri Silaev
> >> www.CustomOCR.com
> >>
> >> On Wed, Sep 7, 2011 at 3:31 AM, haoest <[email protected]> wrote:
> >> > Hi Dmitri,
> >> >
> >> > Thanks for the guidance.
> >> >
> >> > I looked up GetHOCRText() and compared it with GetWords(Pixa pixa).
> >> > They do very similar things, as they both get the coordinates through
> >> > word->bounding_box(); however, my tests show that GetHOCRText() produces
> >> > an HTML file with correct coordinates for the words, but GetWords()
> >> > still gives me a Boxa object with 3 million words (gibberish). I don't
> >> > quite know what I did wrong.
> >> >
> >> > On Sep 6, 3:37 am, Dmitri Silaev <[email protected]> wrote:
> >> >> Examine control paths for 'tessedit_create_hocr' variable and see how
> >> >> rectangle coordinates are being obtained.
> >> >>
> >> >> Warm regards,
> >> >> Dmitri Silaev
> >> >> www.CustomOCR.com
> >> >>
> >> >> On Tue, Sep 6, 2011 at 5:04 AM, haoest <[email protected]> wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I have a very simple OCR app based on Tesseract. After the recognition
> >> >> > step, I also provide a user verification step that allows correction
> >> >> > in case OCR is wrong. To improve the user interface, I plan to draw a
> >> >> > rectangle on top of the OCR-ed character on the original input image,
> >> >> > and put it side by side with the OCR output. To get to that, I need
> >> >> > the coordinate of the recognized characters.
> >> >> >
> >> >> > I tried something like this but it seems to give me gibberish:
> >> >> >
> >> >> > ETEXT_DESC output;
> >> >> > tess->Recognize(&output);
> >> >> > text = tess->GetUTF8Text();
> >> >> >
> >> >> > Now if I access output->count, it gives me some value above 10,000,
> >> >> > which is obviously wrong because the whole image only has 20 or so.
> >> >> >
> >> >> > Am I on the right track? Can I have some direction please?
> >> >> >
> >> >> > --
> >> >> > You received this message because you are subscribed to the Google
> >> >> > Groups "tesseract-ocr" group.
> >> >> > To post to this group, send email to [email protected]
> >> >> > To unsubscribe from this group, send email to
> >> >> > [email protected]
> >> >> > For more options, visit this group at
> >> >> > http://groups.google.com/group/tesseract-ocr?hl=en
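
The iterator approach Dmitri points to can be sketched roughly as below. This is only an illustration, not code from the thread: `Rect`, `CollectBoxes`, `FakeBox`, and `FakeIterator` are hypothetical names, and the stand-in iterator exists solely so the sketch compiles without Tesseract. It assumes an iterator exposing `BoundingBox(level, &left, &top, &right, &bottom)` and `Next(level)`, which is the shape of Tesseract's `PageIterator` methods, and it converts each left/top/right/bottom box into the x/y/width/height form that drawing code usually wants.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rectangle type for the drawing code: x, y, width, height.
struct Rect { int x, y, w, h; };

// Walks any iterator that offers
//   bool BoundingBox(Level, int* left, int* top, int* right, int* bottom)
//   bool Next(Level)
// and returns one Rect per element at the requested level.
template <typename Iterator, typename Level>
std::vector<Rect> CollectBoxes(Iterator* it, Level level) {
  std::vector<Rect> rects;
  if (it == nullptr) return rects;  // no results to walk
  do {
    int left = 0, top = 0, right = 0, bottom = 0;
    if (it->BoundingBox(level, &left, &top, &right, &bottom)) {
      assert(right >= left && bottom >= top);  // sanity check on the box
      Rect r = {left, top, right - left, bottom - top};
      rects.push_back(r);
    }
  } while (it->Next(level));
  return rects;
}

// Stand-in for the real iterator (an assumption for this sketch only).
struct FakeBox { int left, top, right, bottom; };

struct FakeIterator {
  std::vector<FakeBox> boxes;
  std::size_t pos = 0;

  // Reports the current box, or false when there is none.
  bool BoundingBox(int /*level*/, int* l, int* t, int* r, int* b) const {
    if (pos >= boxes.size()) return false;
    *l = boxes[pos].left;
    *t = boxes[pos].top;
    *r = boxes[pos].right;
    *b = boxes[pos].bottom;
    return true;
  }

  // Advances to the next box; false once the end is reached.
  bool Next(int /*level*/) {
    ++pos;
    return pos < boxes.size();
  }
};
```

With a Tesseract build that provides the iterator classes, the same function could presumably be called as `CollectBoxes(tess->GetIterator(), tesseract::RIL_SYMBOL)` after recognition has run (`RIL_WORD` for word boxes), with the caller deleting the iterator afterwards. This avoids reading `Boxa` fields directly, though it does not by itself explain the 3 million count reported above.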

