Note that, per my other post, this will only work with the Tesseract
OcrEngineMode. If you're using Cube or Combined mode, I'm not sure what
the solution would be.
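
For anyone trying the per-position whitelist idea quoted below, here is a minimal sketch of the bookkeeping part in C++. It assumes the pattern from the original question (two letters, three digits, two letters). It also assumes GetConnectedComponents() returns exactly one component per character, in reading order, which touching or broken characters will violate. WhitelistsForPattern() and the 'A'/'D' pattern codes are hypothetical helpers, not part of the Tesseract API. Only the two whitelist strings come from the thread.

```cpp
#include <string>
#include <vector>

// Whitelist strings as suggested in the thread.
const char kLetters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const char kDigits[] = "0123456789";

// Hypothetical helper: maps a character-class pattern such as "AADDDAA"
// ('A' = uppercase letter, 'D' = digit) to one whitelist per position.
// Returns an empty vector if the pattern contains an unknown class code.
std::vector<std::string> WhitelistsForPattern(const std::string& pattern) {
  std::vector<std::string> whitelists;
  whitelists.reserve(pattern.size());
  for (char code : pattern) {
    if (code == 'A') {
      whitelists.push_back(kLetters);
    } else if (code == 'D') {
      whitelists.push_back(kDigits);
    } else {
      return {};  // Unknown code: refuse rather than guess.
    }
  }
  return whitelists;
}
```

Inside the loop over the Pixa, component i would then be recognized after a call like api.SetVariable("tessedit_char_whitelist", whitelists[i].c_str()). It seems wise to check that pixaGetCount(pixa) matches the pattern length before the loop, and to fall back to whole-image recognition when it doesn't. This is untested against Tesseract 3.01.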

On Apr 5, 1:40 pm, dm wrote:
> Consider:
>
> TessBaseAPI::GetConnectedComponents(&pixa);
>
>   // Gets the individual connected (text) components (created
>   // after pages segmentation step, but before recognition)
>   // as a leptonica-style Boxa, Pixa pair, in reading order.
>   // Can be called before or after Recognize.
>   // Note: the caller is responsible for calling boxaDestroy()
>   // on the returned Boxa array and pixaDestroy() on cc array.
>   Boxa* GetConnectedComponents(Pixa** cc);
>
> Iterate through the pixa, and for each pix:
> 1. Call SetImage() with the pix.
> 2. Set the whitelist.
> 3. Call GetUTF8Text().
>
> For the first two and last two pix (the letters), use
> SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ").
> For the middle three (the digits), use
> SetVariable("tessedit_char_whitelist", "0123456789").
>
> Completely untested, but I think it's a good starting point.
>
> Cheers,
> dm
>
> On Apr 2, 5:22 pm, Stevie Shannon wrote:
>
> > I'm also interested in an answer to this!
>
> > On Wednesday, March 28, 2012 10:35:29 AM UTC+1, Scyllar wrote:
>
> > > Same question with me, please help!
>
> > > On Tuesday, March 27, 2012 10:23:33 AM UTC+8, Neo Song wrote:
>
> > >> Dear All,
>
> > >>     I am currently using Tesseract 3.01, and I have gone through all
> > >> the variables that can be set before recognition. What I need to
> > >> recognize is a string with a fixed pattern, like "OX345PT": two letters
> > >> first, then three digits, then two more letters. Is there a
> > >> character-template mechanism in Tesseract 3.01, which can be
> > >> applied through the "SetVariable" function, to improve accuracy?
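
Since the connected components may not split cleanly into seven characters, one cheap complement is to validate the recognized text against the expected pattern and reject or retry results that don't fit. This is an application-side regular-expression check, not a Tesseract feature. A minimal sketch, assuming uppercase letters as in "OX345PT"; MatchesExpectedPattern() is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical application-side check (not part of Tesseract): returns true
// if the text is exactly two uppercase letters, three digits, and two
// uppercase letters, e.g. "OX345PT".
bool MatchesExpectedPattern(std::string text) {
  // OCR output may end with newlines or spaces; strip trailing whitespace
  // so it does not cause a spurious mismatch.
  while (!text.empty() &&
         (text.back() == '\n' || text.back() == '\r' ||
          text.back() == ' ' || text.back() == '\t')) {
    text.pop_back();
  }
  // regex_match requires the whole string to match, so any extra or
  // wrong-class character makes the check fail.
  static const std::regex kPattern("[A-Z]{2}[0-9]{3}[A-Z]{2}");
  return std::regex_match(text, kPattern);
}
```

A result that fails this check suggests that the segmentation or the per-position whitelists did not line up for that image.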

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
