Well, it's hard to tell without seeing your code. Please send it if you can.
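In the meantime, since GetHOCRText() already gives you correct coordinates, one possible workaround is to pull the word boxes out of that string yourself. Below is a minimal sketch, not a definitive implementation. WordBox and ParseHocrWordBoxes are hypothetical names made up for illustration, and the regular expression assumes each word span has a class of ocr_word or ocrx_word followed, inside the same tag, by a title attribute that starts with "bbox x0 y0 x1 y1". Please check that assumption against the hOCR your build actually emits.

```cpp
#include <regex>
#include <string>
#include <vector>

// Axis-aligned box in image pixel coordinates, as written in an hOCR
// "bbox x0 y0 x1 y1" property. (Hypothetical type, not part of Tesseract.)
struct WordBox {
    int left;
    int top;
    int right;
    int bottom;
};

// Hypothetical helper: collects the bbox of every word-level span found in
// an hOCR string. Line- and paragraph-level spans are skipped because their
// class names do not match.
std::vector<WordBox> ParseHocrWordBoxes(const std::string& hocr) {
    // Assumed markup: class='ocr_word' or class='ocrx_word', then (within
    // the same tag) title="bbox x0 y0 x1 y1...". Either quote style is accepted.
    static const std::regex word_re(
        R"re(class=['"]ocrx?_word['"][^>]*title=['"]bbox (\d+) (\d+) (\d+) (\d+))re");

    std::vector<WordBox> boxes;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(hocr.begin(), hocr.end(), word_re); it != end; ++it) {
        const std::smatch& m = *it;
        boxes.push_back({std::stoi(m.str(1)), std::stoi(m.str(2)),
                         std::stoi(m.str(3)), std::stoi(m.str(4))});
    }
    return boxes;
}
```

You would pass it the string returned by GetHOCRText() and draw one rectangle per returned box. Parsing text is clumsier than reading the boxes directly, but it sidesteps whatever is going wrong in the GetWords() call until we can look at your code.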
Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Wed, Sep 7, 2011 at 3:31 AM, haoest <[email protected]> wrote:
> Hi Dmitri,
>
> Thanks for the guidance.
>
> I looked up GetHOCRText() and compared it with GetWords(Pixa pixa).
> They do very similar things, as they both get the coordinates through
> word->bounding_box(); however my test show that GetHOCRText() produces
> an html file with correct coordinates for the words, but GetWords
> still gives me a Boxa object with 3 million words (gibberish). I don't
> quite know what I did wrong.
>
>
> On Sep 6, 3:37 am, Dmitri Silaev <[email protected]> wrote:
>> Examine control paths for 'tessedit_create_hocr' variable and see how
>> rectangle coordinates are being obtained.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Tue, Sep 6, 2011 at 5:04 AM, haoest <[email protected]> wrote:
>> > Hello,
>> >
>> > I have a very simple OCR app based on Tesseract. After the recognition
>> > step, I also provide a user verification step that allows correction
>> > in case OCR is wrong. To improve the user interface, I plan to draw a
>> > rectangle on top of the OCR-ed character on the original input image,
>> > and put it side by side with the OCR output. To get to that, I need
>> > the coordinate of the recognized characters.
>> >
>> > I tried something like this but it seems to give me gibberish:
>> >
>> > ETEXT_DESC output;
>> > tess->Recognize(&output);
>> > text = tess->GetUTF8Text();
>> >
>> > Now if I access output->count, it gives me some value above 10,000,
>> > which is obviously wrong because the whole image only has 20 or so.
>> >
>> > Am I on the right track? Can I have some direction please?
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en

