Hi Sven,

I have actually got a pretty good word-dawg and freq-dawg, and I was
expecting more improvement than I saw when I added them. I even
increased the language_model_penalty_non_freq_dict_word and
language_model_penalty_non_dict_word configuration values a little
in my lang.config file, but I'm keen not to stray too far from the
default configuration if possible. Anyway, it seems not to have
helped significantly for me, sadly.

Thanks for the suggestion, though.

Nick 

On Thu, Jul 12, 2012 at 11:24:02AM -0500, Sven Pedersen wrote:
> Hi Nick,
> Have you done anything with dictionary entries? From previous reports
> it seems to me that having whole words would improve the chance of a
> correct choice.
> -Sven
> 
> On Thu, Jul 12, 2012 at 9:50 AM, Nick White <[email protected]> wrote:
> > Hi folks,
> >
> > My Polytonic Greek training is going swimmingly, but there is one
> > obstacle which is still beating me. In many fonts alpha (α) looks
> > quite similar to omicron followed by iota (οι), and Tesseract gets
> > this wrong quite often. I added a rule to unicharambigs to suggest
> > that one may be replaced by the other, but it hasn't made much
> > difference, and both are quite common forms, so I can't use the
> > 'always replace' mode.
> >
> > Does anybody have any suggestions on how to improve this? Any
> > configuration variables that could be tweaked? The training set is
> > already a good size, so I don't think more tif/box sets would help.
> >
> > Any thoughts would be much appreciated.
> >
> > Nick
> >
> > P.S. This is with Tesseract 3.02 (latest SVN), but v2 behaves
> > similarly.
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
> 
> 
> 
> -- 
> "All that is gold does not glitter,
>   not all those who wander are lost;
> the old that is strong does not wither,
>   deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
>   a light from the shadows shall spring;
> renewed shall be blade that was broken,
>   the crownless again shall be king."
> 
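To make the unicharambigs rule from the original question concrete, here is a small sketch that writes an optional α/οι ambiguity in both directions. The file layout is an assumption based on the example lines in the Tesseract 3.0x training documentation: a "v1" version line, tab-separated fields, space-separated characters inside a group, and a final flag where 0 means optional and 1 means the 'always replace' mode. The helper names are hypothetical, so check the format against the Tesseract version in use.

```python
# Assumed unicharambigs v1 layout (verify against your Tesseract version):
#   <n source chars> TAB <source chars> TAB <n target chars> TAB <target chars> TAB <type>
# Characters inside a group are separated by spaces; type 0 = optional,
# 1 = mandatory ("always replace").


def ambig_line(source, target, mandatory=False):
    """Format one ambiguity rule mapping the source unichars to the target."""
    fields = [
        str(len(source)), " ".join(source),
        str(len(target)), " ".join(target),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


def unicharambigs_text(rules):
    """Build file contents: a 'v1' version line, then one line per rule."""
    return "\n".join(["v1"] + [ambig_line(*rule) for rule in rules]) + "\n"


if __name__ == "__main__":
    # Optional rules in both directions: "οι" may really be "α" and vice versa.
    rules = [(["ο", "ι"], ["α"]), (["α"], ["ο", "ι"])]
    print(unicharambigs_text(rules), end="")
```

Because both forms are common words, the optional flag (0) is used here; a mandatory rule would rewrite every occurrence regardless of the dictionary.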
