"Customise the tesseract engine to recognize only the characters A-Z, 0-9,
. (dot) and (space) by setting the character white-list"

Kindly tell me the name of the folder in which the whitelist and blacklist
are kept. I want to use the same for Kannada script.
-sriranga(78yrs)
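
On the folder question: as far as I know, the whitelist and blacklist are not
files kept in a folder. They are the config variables tessedit_char_whitelist
and tessedit_char_blacklist, which can be set in a plain-text config file of
"variable value" lines (the stock tessdata/configs/digits file is one such
example). A minimal Python sketch that builds such a file for Kannada digits
follows. The file name kan_digits and the choice of characters are
illustrative assumptions, not something that ships with Tesseract.

```python
# Sketch only: "kan_digits" and the character choices below are
# hypothetical examples, not files or settings shipped with Tesseract.


def build_config(whitelist="", blacklist=""):
    """Return config-file text with one 'variable value' pair per line."""
    lines = []
    if whitelist:
        lines.append("tessedit_char_whitelist " + whitelist)
    if blacklist:
        lines.append("tessedit_char_blacklist " + blacklist)
    return "\n".join(lines) + "\n"


def write_config(path, text):
    """Save the config as UTF-8 so non-ASCII characters survive."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Kannada digits occupy U+0CE6..U+0CEF in Unicode.
KANNADA_DIGITS = "".join(chr(cp) for cp in range(0x0CE6, 0x0CF0))

# Allow only Kannada digits and the dot.
config_text = build_config(whitelist=KANNADA_DIGITS + ".")
```

Calling write_config("kan_digits", config_text) saves the file. Assuming a
Kannada traineddata file is installed, passing the config name as the last
command-line argument (for example: tesseract page.tif out -l kan kan_digits)
should apply it.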

On Fri, Feb 18, 2011 at 11:57 AM, Ray Smith <[email protected]> wrote:

> From all this, I have identified the following ways of improving the
> results:
>
>    1. Customise the tesseract engine to recognize only the characters
>    A-Z, 0-9, . (dot) and (space) by setting the character white-list. My
>    understanding is that the white-list is the list of characters that are
>    going to be recognized. I was curious to know what the blacklist is
>    meant to do?
>
>    Just the opposite of whitelist. You can disable specific characters
>    from the usual set.
>
>    2. A lot of times I have seen fairly good number plate images being
>    OCRed inaccurately. This could possibly be due to the word recognition
>    stage. Has anyone found a way to disable the dictionary / word
>    recognition?
>
>    Play with segment_penalty_dict_*
>
>    3. Then there are some page segmentation modes (PSM_AUTO,
>    PSM_SINGLE_BLOCK, PSM_CHAR etc). Does PSM_CHAR imply that it will
>    consider the input image as a single character and run the algorithm
>    accordingly without attempting word recognition?
>
>    Yes.
>
>    4. Another important configuration macro that I have seen within the
>    code was AVS_FASTEST = 0, AVS_MOST_ACCURATE = 100. However, I could not
>    find the same being used anywhere in the code. Does this have any impact
>    on the *character recognition* accuracy?
>
>    This control is dead in 3.01. Replaced by ocr_engine_mode. It just
>    controls the combination of tesseract vs cube. Cube increases the
>    accuracy slightly, but adds a lot of compute time.
>
>    5. Finally, I also plan to use the confidence level data. Are there any
>    indicators of confidence for characters as well? There is word
>    confidence data which can be found in
>    TessBaseAPI::AllWordConfidences().
>
>    Yes, and they are exposed in the new ResultIterator in 3.01, otherwise
>    you have to go down into the guts of the data structures.
>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
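
For the number plate case in points 1 and 3 above, the pieces can be combined
on the command line. The Python sketch below only assembles the argument
list and does not run anything. It assumes the 3.x command-line syntax, where
the page segmentation mode is given as -psm N and mode 10 means "treat the
image as a single character" (newer releases spell the flag --psm). The
config name "plate" is a hypothetical file holding a
tessedit_char_whitelist line.

```python
# Sketch only: assumes the Tesseract 3.x CLI layout
#   tesseract imagename outputbase [-l lang] [-psm N] [configfile...]
# "plate" below is a hypothetical config file name.

PSM_SINGLE_CHAR = 10  # single-character mode in the 3.x command line


def tesseract_command(image, outbase, psm=None, lang=None, configs=()):
    """Build the argument list for a tesseract command-line call."""
    cmd = ["tesseract", image, outbase]
    if lang:
        cmd += ["-l", lang]
    if psm is not None:
        cmd += ["-psm", str(psm)]
    cmd += list(configs)  # config file names always go last
    return cmd


command = tesseract_command(
    "char.png", "out", psm=PSM_SINGLE_CHAR, configs=["plate"]
)
```

The resulting list can be passed to subprocess.run(command) once tesseract is
on the PATH.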

