Consider:

TessBaseAPI::GetConnectedComponents(&pixa);

  // Gets the individual connected (text) components (created
  // after pages segmentation step, but before recognition)
  // as a leptonica-style Boxa, Pixa pair, in reading order.
  // Can be called before or after Recognize.
  // Note: the caller is responsible for calling boxaDestroy()
  // on the returned Boxa array and pixaDestroy() on cc array.
  Boxa* GetConnectedComponents(Pixa** cc);

Iterate through the pixa, and for each pix:
1. Call SetImage() with the pix.
2. Set the whitelist for that position.
3. Call GetUTF8Text().

For the first two and last two pix,
SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ").
For the middle three you would use
SetVariable("tessedit_char_whitelist", "0123456789").
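
The position-to-whitelist choice above could be sketched as a small helper.
This is only an illustration: WhitelistForPosition is a hypothetical name,
not part of Tesseract, and it assumes segmentation returns exactly one
component per character, in reading order.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Tesseract function): returns the whitelist for
// the component at `index` out of `count` components, assuming the pattern
// is two letters, then digits, then two letters (e.g. "OX345PT").
std::string WhitelistForPosition(std::size_t index, std::size_t count) {
  static const char kLetters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  static const char kDigits[] = "0123456789";
  const bool in_prefix = index < 2;           // first two components
  const bool in_suffix = index + 2 >= count;  // last two components
  return (in_prefix || in_suffix) ? kLetters : kDigits;
}
```

Inside the loop over the pixa, the result would be passed as
SetVariable("tessedit_char_whitelist", WhitelistForPosition(i, n).c_str())
before calling GetUTF8Text(). If the number of components is not 7, the
segmentation probably split or merged a character, so it is safer to check
the count first than to trust the mapping.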

Completely untested, but I think it's a good starting point.

Cheers,
dm

On Apr 2, 5:22 pm, Stevie Shannon <[email protected]> wrote:
> I'm also interested in an answer to this!
>
> On Wednesday, March 28, 2012 10:35:29 AM UTC+1, Scyllar wrote:
>
> > Same question with me, please help!
>
> > On Tuesday, March 27, 2012 10:23:33 AM UTC+8, Neo Song wrote:
>
> >> Dear All,
>
> >>     I am currently using Tesseract 3.01, and I have gone through all the
> >> variables that can be set before recognition. What I need to
> >> recognize is a fixed-pattern string like "OX345PT": two letters
> >> first, then three digits, then two letters. Is there a
> >> character-template mechanism in Tesseract 3.01 that can be
> >> applied through the "SetVariable" function to improve accuracy?
