Hi folks,

I have a similar problem. I need to extract per-character confidences, but I can't find a way to do it. I have read all the posts on the subject in this forum (or at least the ones I found) and tried everything, without success. I'm using tesseract-3.0.0.
1 - My first approach was based on the post Sven re-posted:
http://www.google.com/url?sa=D&q=http://groups.google.com/group/tesseract-ocr/search%3Fgroup%3Dtesseract-ocr%26q%3Dcharacter-level%2Bconfidence%26qt_g%3DSearch%2Bthis%2Bgroup
I did not find anything like a ResultIterator, and even after setting the variable I still get the confidence of the word, not of each character.

2 - After that, I tried code based on the .NET wrapper:
http://tesseractdotnet.googlecode.com/svn/trunk/dotnetwrapper/TesseractEngineWrapper/tesseractenginewrapper.cpp
Although the Apply function works fine, when I tried to use RetrieveResultDetails() to get the character confidences from the Character class, I never got a head->count greater than 0. The strategy here was to use the monitor to get those values: I called SetMonitor and then the Recognize function with that monitor, but it did not work.

3 - My last shot was to separate the characters. First I put each character in an individual file and ran the OCR on it, but even with PSM_CHAR I only got empty_page results. Then I tried putting them far apart from each other in the same image, but the same thing happened: empty_page (with PSM_AUTO or PSM_CHAR). It only worked when some of the characters were together.

My characters follow this pattern:

B 3823482 41

so I can only get 3 distinct word confidences.

What else could I try? Has anybody ever managed to get character confidences? Please, I really need this.

Thanks in advance.

BR
Beto

On Aug 23, 8:56 pm, _Filipe <[email protected]> wrote:
> Hi Sven,
> I took a look at the link, but doing a deeper search in the forums I
> found a discussion about the function extract_result in
> api/tessbaseapi.cpp.
> I'm trying to use it.
> My problem now is accessing the data inside the TESS_CHAR_IT object
> passed as an argument to this function.
> I searched the code and was not able to find any reference to
> extracting the results from it.
> Where is it declared??
> How do I get the TESS_CHAR objects inside it?
> Thanks again in advance.
>
> Best Regards
>
> On Aug 18, 2:05 am, Sven Pedersen <[email protected]> wrote:
> > Hi Filipe,
> > Please search the archives -- micke and Dmitri Silaev had a
> > conversation about this in
> > April. http://groups.google.com/group/tesseract-ocr/search?group=tesseract-o...
> >
> > --Sven
> >
> > On Wed, Aug 17, 2011 at 2:17 PM, _Filipe <[email protected]> wrote:
> > > Hello guys!
> > >
> > > I've got a really tough job in the text recognition area and I'm
> > > using Tesseract-ocr as my OCR tool.
> > > The problem consists of recognizing IDs printed on steel slabs and
> > > identifying them.
> > > Using only Tesseract, no text is recognized, so a detection and
> > > segmentation phase is necessary.
> > > With that segmentation, and after training Tesseract with our
> > > dictionary, the recognition rate is about 60%.
> > >
> > > We have found a way to increase it probabilistically, by changing
> > > characters to fix common errors.
> > > To accomplish that we need the confidence returned by the OCR
> > > tool on a per-character basis.
> > > I saw the function in the code that returns it for a word.
> > > How could I get it for a character? Is it possible with the current
> > > API?
> > > If not, is there a way to change the Tesseract code to get it? Where
> > > should I start?
> > >
> > > Thanks in advance.
> > >
> > > Best Regards
> > >
> > > Filipe
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
> >
> > --
> > "All that is gold does not glitter,
> > not all those who wander are lost;
> > the old that is strong does not wither,
> > deep roots are not reached by the frost.
> > From the ashes a fire shall be woken,
> > a light from the shadows shall spring;
> > renewed shall be blade that was broken,
> > the crownless again shall be king."
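Filipe's plan of fixing common misreadings once per-character confidences are available could look roughly like the sketch below. It is a minimal, hypothetical C++ example, not Tesseract code: the RecognizedChar struct, the CorrectId function, the confusion tables, the 'A'/'9' pattern encoding and the 0-100 confidence scale are all assumptions made for illustration, and how to get the confidences out of tesseract-3.0.0 is exactly the open question in this thread. For an ID like Beto's B 3823482 41 (spaces removed), the pattern would be one letter followed by nine digits, i.e. "A999999999".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One recognized character and its confidence. Hypothetical type: the values
// would have to come from whatever per-character output the OCR engine offers.
struct RecognizedChar {
    char symbol;
    float confidence;  // assumed to be on a 0-100 scale
};

// Hypothetical confusion tables: what a low-confidence read most likely was,
// depending on whether the ID expects a digit or a letter at that position.
const std::map<char, char> kLetterToDigit = {
    {'B', '8'}, {'O', '0'}, {'D', '0'}, {'I', '1'}, {'l', '1'}, {'S', '5'}, {'Z', '2'}};
const std::map<char, char> kDigitToLetter = {
    {'8', 'B'}, {'0', 'O'}, {'1', 'I'}, {'5', 'S'}, {'2', 'Z'}};

// Returns the corrected ID. `pattern` uses 'A' for a letter and '9' for a
// digit (a made-up encoding). A character is replaced only when its confidence
// is below `threshold`, it is a digit where a letter is expected (or a
// non-digit where a digit is expected), and the matching table has an entry.
std::string CorrectId(const std::vector<RecognizedChar>& chars,
                      const std::string& pattern, float threshold) {
    assert(pattern.find_first_not_of("A9") == std::string::npos);
    std::string out;
    out.reserve(chars.size());
    for (std::size_t i = 0; i < chars.size(); ++i) {
        char c = chars[i].symbol;
        if (i < pattern.size() && chars[i].confidence < threshold) {
            const bool want_digit = pattern[i] == '9';
            const bool is_digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
            if (want_digit != is_digit) {
                const std::map<char, char>& table =
                    want_digit ? kLetterToDigit : kDigitToLetter;
                auto it = table.find(c);
                if (it != table.end()) c = it->second;
            }
        }
        out.push_back(c);
    }
    return out;
}
```

For example, if the engine read 8382B48241 with confidences of 40 and 30 on the first and fifth characters and 90 elsewhere, CorrectId(read, "A999999999", 60.0f) would return B382848241, while with a threshold of 20 the same input comes back unchanged. Requiring both a low confidence and a pattern violation keeps the correction from touching characters the engine was already sure about.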

