Hi Dmitri,

Thanks for the guidance.

I looked up GetHOCRText() and compared it with GetWords(Pixa pixa). They do very similar things, since both get the coordinates through word->bounding_box(). However, my tests show that GetHOCRText() produces an HTML file with correct coordinates for the words, while GetWords() still gives me a Boxa object with 3 million words (gibberish). I don't know what I did wrong.

On Sep 6, 3:37 am, Dmitri Silaev <[email protected]> wrote:
> Examine control paths for 'tessedit_create_hocr' variable and see how
> rectangle coordinates are being obtained.
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Tue, Sep 6, 2011 at 5:04 AM, haoest <[email protected]> wrote:
> > Hello,
> >
> > I have a very simple OCR app based on Tesseract. After the recognition
> > step, I also provide a user verification step that allows correction
> > in case the OCR is wrong. To improve the user interface, I plan to draw a
> > rectangle on top of each OCR-ed character on the original input image,
> > and put it side by side with the OCR output. To get there, I need
> > the coordinates of the recognized characters.
> >
> > I tried something like this, but it seems to give me gibberish:
> >
> > ETEXT_DESC output;
> > tess->Recognize(&output);
> > text = tess->GetUTF8Text();
> >
> > Now if I access output.count, it gives me some value above 10,000,
> > which is obviously wrong because the whole image only has 20 or so.
> >
> > Am I on the right track? Can I have some direction please?
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
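
Since GetHOCRText() already returns correct coordinates, one possible workaround is to read the word boxes back out of the hOCR string instead of relying on GetWords(). The following is only a rough sketch using the C++ standard library. WordBox and ParseHocrWordBoxes are hypothetical names made up for this illustration, not part of Tesseract, and the class name "ocrx_word" is an assumption: the class that carries the word bbox may differ between Tesseract versions (for example "ocr_word"), so check it against the actual hOCR output.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: one word box in image pixel coordinates,
// taken from an hOCR title attribute of the form "bbox x0 y0 x1 y1".
struct WordBox {
    int left, top, right, bottom;
};

// Hypothetical helper: returns the bbox of every tag whose text contains
// word_class. The default class name is an assumption; pass the one that
// your Tesseract version actually writes.
std::vector<WordBox> ParseHocrWordBoxes(const std::string& hocr,
                                        const std::string& word_class = "ocrx_word") {
    std::vector<WordBox> boxes;
    std::size_t pos = 0;
    while ((pos = hocr.find(word_class, pos)) != std::string::npos) {
        // Locate the tag that encloses this occurrence of the class name.
        const std::size_t tag_start = hocr.rfind('<', pos);
        const std::size_t tag_end = hocr.find('>', pos);
        pos += word_class.size();
        if (tag_start == std::string::npos || tag_end == std::string::npos) continue;
        // Skip matches that sit in text content rather than inside a tag.
        if (hocr.find('>', tag_start) != tag_end) continue;

        const std::string tag = hocr.substr(tag_start, tag_end - tag_start);
        const std::size_t bbox = tag.find("bbox ");
        if (bbox == std::string::npos) continue;  // tag has no bbox property

        // Read the four integers that follow "bbox ".
        std::istringstream in(tag.substr(bbox + 5));
        WordBox box{};
        if (in >> box.left >> box.top >> box.right >> box.bottom) {
            boxes.push_back(box);
        }
    }
    return boxes;
}
```

The string returned by GetHOCRText() could be passed to ParseHocrWordBoxes(), and each resulting box drawn as a rectangle from (left, top) to (right, bottom) on the input image. This is plain string scanning rather than a real HTML parser, so it only suits the regular markup that Tesseract generates.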

