Hi folks,
I have a similar problem.
I need to extract character confidence, but I'm not finding a way.
I read all posts (or at least the ones I found) about the subject here
in this forum. I tried everything without success.
I'm using tesseract-3.0.0

1 - My first approach was based on the post re-posted by Sven,
http://www.google.com/url?sa=D&q=http://groups.google.com/group/tesseract-ocr/search%3Fgroup%3Dtesseract-ocr%26q%3Dcharacter-level%2Bconfidence%26qt_g%3DSearch%2Bthis%2Bgroup
I did not find anything like a ResultIterator.
And even after setting the variable, I still get the confidence of the word
and not of each character.

2 - After that, I tried code based on the dotnet wrapper:
http://tesseractdotnet.googlecode.com/svn/trunk/dotnetwrapper/TesseractEngineWrapper/tesseractenginewrapper.cpp
Although the Apply function works fine, when I tried to use
RetrieveResultDetails() to get the character confidences from the
Character class, I never got a head->count greater than 0.
The strategy here was to use the Monitor to get those values: call
SetMonitor and then the Recognize function with the monitor, but it did
not work.

3 - My last shot was to separate the characters. First I saved the
characters in individual files and ran the OCR on each one, but even with
PSM_CHAR I just got empty_page results. The second option was to
put them far apart from each other in the same image, but the same
thing happened: empty_page (using PSM_AUTO or PSM_CHAR). It only
worked when some of them were close together.

The pattern of my characters is as follows:

B 3823482 41

So with word-level confidence I can get only 3 distinct confidence values.
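
One fallback that does not need per-character confidence is to use the fixed layout of the ID itself to fix look-alike characters by position. Below is a minimal Python sketch of that idea. It assumes the layout is always one letter, seven digits, and two digits, which is inferred only from the single sample above, and the look-alike table and the name correct_id are made up for illustration, not taken from Tesseract.

```python
import re

# Assumed layout, inferred from the single sample "B 3823482 41":
# one letter, seven digits, two digits, separated by single spaces.
ID_PATTERN = re.compile(r"[A-Z] \d{7} \d{2}")

# Hypothetical look-alike table; replace it with the confusions
# actually observed in your OCR output.
TO_DIGIT = {"O": "0", "I": "1", "S": "5", "B": "8", "Z": "2"}
TO_LETTER = {digit: letter for letter, digit in TO_DIGIT.items()}


def correct_id(text):
    """Fix look-alikes by position; return None if the layout still fails."""
    words = text.split()
    if len(words) != 3:
        return None
    # The first word must be a letter, so map digit look-alikes to letters.
    letter = "".join(TO_LETTER.get(c, c) for c in words[0])
    # The other two words must be digits, so map letter look-alikes to digits.
    digits = ["".join(TO_DIGIT.get(c, c) for c in w) for w in words[1:]]
    candidate = " ".join([letter] + digits)
    return candidate if ID_PATTERN.fullmatch(candidate) else None
```

For example, correct_id("8 38Z34B2 41") maps the leading 8 to B and the Z and B in the digit run to 2 and 8. A result of None means the text does not fit the assumed layout even after substitution, so it should be rejected or re-read.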

What else could I try? Has anybody ever been able to get the character
confidence successfully?
Please, I really need it.

Thanks in advance.

BR

Beto

On Aug 23, 8:56 pm, _Filipe <[email protected]> wrote:
> Hi Sven,
> I took a look at the link, but doing a deeper search in the forums I
> found a discussion about the function extract_result in api/
> tessbaseapi.cpp.
> I'm trying to use it.
> My problem now is how to access the data inside the TESS_CHAR_IT object
> passed as an argument to this function.
> I searched the code and was not able to find any reference on how to
> extract the results from it.
> Where is it declared?
> How do I get the TESS_CHAR objects inside it?
> Thanks again in advance.
>
> Best Regards
>
> On Aug 18, 2:05 am, Sven Pedersen <[email protected]> wrote:
>
> > Hi Filipe,
> > Please search the archives -- micke and Dmitri Silaev had a
> > conversation about this in April.
> > http://groups.google.com/group/tesseract-ocr/search?group=tesseract-o...
>
> > --Sven
>
> > On Wed, Aug 17, 2011 at 2:17 PM, _Filipe <[email protected]> wrote:
> > > Hello guys!
>
> > > I got a really tough job in the text recognition area and I'm using
> > > Tesseract-ocr as my OCR tool.
> > > The problem consists of recognizing IDs printed on steel slabs and
> > > identifying them.
> > > Using only Tesseract I recognize no text, so a detection and
> > > segmentation phase is necessary.
> > > With that segmentation and after training Tesseract with our
> > > dictionary, the recognition rate is about 60%.
>
> > > We found a way to increase it probabilistically, by changing
> > > characters to fix common errors.
> > > To accomplish that we need the confidence returned by the OCR
> > > tool on a per-character basis.
> > > I saw the function in the code that returns it for a word.
> > > How could I get it for a character? Is it possible with the current
> > > API?
> > > If not, is there a way to change the Tesseract code to get that? Where
> > > should I start?
>
> > > Thanks in advance.
>
> > > Best Regards
>
> > > Filipe
>
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
>
> > --
> > ``All that is gold does not glitter,
> >   not all those who wander are lost;
> > the old that is strong does not wither,
> >   deep roots are not reached by the frost.
> > From the ashes a fire shall be woken,
> >   a light from the shadows shall spring;
> > renewed shall be blade that was broken,
> >   the crownless again shall be king.”
