On 12 August 2010 10:24, Eugene Reimer <[email protected]> wrote:
> You could probably improve its ability to recognize "00" as two 0's by
> training it on such paired symbols.
>
> Mind you, I have also been surprised by cases where a perfectly clear and
> flawless symbol gets subdivided, like an N becoming |\| or an H becoming I-I,
> which indicates that tesseract has code to subdivide blobs other than based
> on there being "space" between them. However that code seems to behave in
> erratic ways.

Actually, on this image, I get:

    Mobile (65) 81(1) 6(l)2

which is more or less the behaviour you're talking about; however, you
should bear in mind that what looks like a solid shape to you does not
necessarily look like a solid shape to the recogniser.
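
To make the "solid shape" point concrete, here is a toy sketch of one way a stroke that looks continuous to the eye can reach a recogniser as two pieces. It assumes a plain global threshold followed by 4-connected component counting; that is an assumption for illustration only, not a description of what tesseract does internally, and `count_blobs` and `sample_stroke` are made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Grayscale image: 0 is black ink, 255 is white paper.
using Image = std::vector<std::vector<std::uint8_t>>;

// Count 4-connected groups of pixels strictly darker than `threshold`.
// Hypothetical helper, not part of tesseract.
int count_blobs(const Image& gray, int threshold) {
    const int h = static_cast<int>(gray.size());
    if (h == 0) return 0;
    const int w = static_cast<int>(gray[0].size());
    for (const auto& row : gray) assert(static_cast<int>(row.size()) == w);

    std::vector<std::vector<bool>> seen(h, std::vector<bool>(w, false));
    const int dy[4] = {-1, 1, 0, 0};
    const int dx[4] = {0, 0, -1, 1};
    int blobs = 0;

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (seen[y][x] || gray[y][x] >= threshold) continue;
            ++blobs;  // found an unvisited ink pixel: start a new blob
            std::vector<std::pair<int, int>> stack;
            stack.push_back(std::make_pair(y, x));
            seen[y][x] = true;
            while (!stack.empty()) {  // flood fill over ink neighbours
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                for (int k = 0; k < 4; ++k) {
                    const int ny = p.first + dy[k];
                    const int nx = p.second + dx[k];
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    if (seen[ny][nx] || gray[ny][nx] >= threshold) continue;
                    seen[ny][nx] = true;
                    stack.push_back(std::make_pair(ny, nx));
                }
            }
        }
    }
    return blobs;
}

// A horizontal stroke whose middle pixel is faint (140) rather than dark.
Image sample_stroke() {
    return {
        {255, 255, 255, 255, 255, 255, 255},
        {255,  20,  30, 140,  30,  20, 255},
        {255, 255, 255, 255, 255, 255, 255},
    };
}
```

With a threshold of 128 the faint middle pixel is treated as paper and the stroke counts as two blobs; with a threshold of 160 it stays in one piece. The pixel values are invented for the example.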

Some (possibly) related variables:

INT_VAR(repair_unchopped_blobs, 1, "Fix blobs that aren't chopped");
double_VAR(tessedit_certainty_threshold, -2.25, "Good blob limit");
BOOL_VAR(fragments_guide_chopper, FALSE,
         "Use information from fragments to guide chopping process");
INT_VAR(segment_adjust_debug, 0,
        "Segmentation adjustment debug");
BOOL_VAR(assume_fixed_pitch_char_segment, 0,
         "include fixed-pitch heuristics in char segmentation");
BOOL_VAR(use_new_state_cost, 0,
         "use new state cost heuristics for segmentation state evaluation");
double_VAR(heuristic_segcost_rating_base, 1.25,
           "base factor for adding segmentation cost into word rating."
           "It's a multiplying factor, the larger the value above 1, "
           "the bigger the effect of segmentation cost.");
double_VAR(heuristic_weight_rating, 1,
           "weight associated with char rating in combined cost of state");
double_VAR(heuristic_weight_width, 0,
           "weight associated with width evidence in combined cost of state");
double_VAR(heuristic_weight_seamcut, 0,
           "weight associated with seam cut in combined cost of state");
double_VAR(heuristic_max_char_wh_ratio, MAX_SQUAT,
           "max char width-to-height ratio allowed in segmentation");
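
If you want to experiment with these, one low-effort route is a tesseract config file, which is plain text with one "name value" pair per line. The sketch below only builds that text; `render_config` is a made-up helper, and whether your build accepts each of these variable names, or whether changing them helps on this image, is an assumption you would need to check.

```cpp
#include <cassert>
#include <map>
#include <string>

// Render variable overrides as config-file text, one "name value" per line.
// Hypothetical helper, not part of tesseract. std::map keeps the output
// sorted by variable name, so the result is deterministic.
std::string render_config(const std::map<std::string, std::string>& overrides) {
    std::string out;
    for (const auto& kv : overrides) {
        // A variable name must be a single token with no whitespace.
        assert(kv.first.find_first_of(" \t\r\n") == std::string::npos);
        out += kv.first + " " + kv.second + "\n";
    }
    return out;
}
```

For example, passing `{{"fragments_guide_chopper", "1"}, {"assume_fixed_pitch_char_segment", "1"}}` yields two lines that can be saved to a file and given to tesseract as a config. Change one variable at a time so you can tell which one affected the chopping.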