On 12 August 2010 10:24, Eugene Reimer <[email protected]> wrote:
> You could probably improve its ability to recognize "00" as two 0's by
> training it on such paired symbols.
>
> Mind you, I have also been surprised by cases where a perfectly clear and
> flawless symbol gets subdivided, like an N becoming |\| or an H becoming I-I,
> which indicates that tesseract has code to subdivide blobs other than based
> on there being "space" between them.  However that code seems to behave in
> erratic ways.

Actually, on this image, I get:
Mobile (65) 81(1) 6(l)2

which is more or less the behaviour you're talking about; however, you
should bear in mind that what looks like a solid shape to you does not
necessarily look like a solid shape to the recogniser.

Some (possibly) related variables:

INT_VAR(repair_unchopped_blobs, 1, "Fix blobs that aren't chopped");
double_VAR(tessedit_certainty_threshold, -2.25, "Good blob limit");
BOOL_VAR(fragments_guide_chopper, FALSE,
         "Use information from fragments to guide chopping process");

INT_VAR(segment_adjust_debug, 0,
        "Segmentation adjustment debug");
BOOL_VAR(assume_fixed_pitch_char_segment, 0,
         "include fixed-pitch heuristics in char segmentation");
BOOL_VAR(use_new_state_cost, 0,
         "use new state cost heuristics for segmentation state evaluation");
double_VAR(heuristic_segcost_rating_base, 1.25,
           "base factor for adding segmentation cost into word rating."
           "It's a multiplying factor, the larger the value above 1, "
           "the bigger the effect of segmentation cost.");
double_VAR(heuristic_weight_rating, 1,
           "weight associated with char rating in combined cost of state");
double_VAR(heuristic_weight_width, 0,
           "weight associated with width evidence in combined cost of state");
double_VAR(heuristic_weight_seamcut, 0,
           "weight associated with seam cut in combined cost of state");
double_VAR(heuristic_max_char_wh_ratio, MAX_SQUAT,
           "max char width-to-height ratio allowed in segmentation");
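
If you want to experiment with these, one way is to put overrides in a
config file (one "name value" pair per line) and pass that file as the
last argument to tesseract. The sketch below only writes such a file and
prints the command to run. The values, the file name "chopper_test" and
the image name are placeholders I made up, not recommendations, and I am
assuming your build lets these particular variables be set from a config
file.

```python
# Sketch: write a Tesseract config file overriding a few of the variables
# listed above. All values are illustrative guesses, not tuned settings.

# Hypothetical overrides; the names are taken from the list above.
OVERRIDES = {
    "repair_unchopped_blobs": 1,
    "fragments_guide_chopper": 1,          # BOOL_VAR, written as 0/1 here
    "assume_fixed_pitch_char_segment": 1,  # BOOL_VAR, written as 0/1 here
    "heuristic_weight_width": 0.5,
    "segment_adjust_debug": 1,
}


def render_config(overrides):
    """Return config-file text: one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in overrides.items())


def build_command(image, output_base, config_path):
    """Return the argument list: tesseract <image> <outputbase> <configfile>."""
    return ["tesseract", image, output_base, config_path]


if __name__ == "__main__":
    config_path = "chopper_test"  # placeholder file name
    with open(config_path, "w") as handle:
        handle.write(render_config(OVERRIDES))
    # "mobile.tif" and "out" are placeholder names for the image and output.
    print(" ".join(build_command("mobile.tif", "out", config_path)))
```

Changing one variable at a time and comparing the output text makes it
easier to see which of them, if any, affects how the blobs are chopped.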



-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
