That's great news, Nick!  I can't wait to try it on the old Irish fonts!

-Galt

On Tuesday, July 3, 2012 9:44:27 AM UTC-7, Nick White wrote:
>
> On Fri, Jun 01, 2012 at 10:16:52AM +0100, Nick White wrote: 
> > On Wed, May 23, 2012 at 05:39:00PM +0100, Nick White wrote: 
> > > On Tue, May 22, 2012 at 05:21:23AM -0700, Galt wrote: 
> > > > On May 21, 2:04 am, Nick White <[email protected]> wrote: 
> > > > > I've been suffering a very similar problem with some of the text I'm
> > > > > training, which has several diacritics above and below glyphs. It
> > > > > isn't infrequent to find quite a few lines of garbage which are some
> > > > > of the diacritics taking a line, which then causes the following and
> > > > > preceding lines to not include said diacritics.
> > 
> > I wonder, is there any way of harnessing the Tesseract API or 
> > configuration options to affect line height and line detection? I 
> > can't seem to make the above problem go away. 
>
> I finally solved this problem for my case! I found the configuration 
> setting 'textord_min_linesize'. With this I can assure Tesseract 
> that lines the size of accents should never be considered, and the 
> problem goes away entirely. I set the value to 2.5, twice the 
> default, after trial-and-error. 
>
> Nick 
>
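
For anyone else who wants to try the setting Nick describes, here is a rough
sketch of one way to apply it. It assumes the value is passed through a plain
config file (one "name value" pair per line) named as the last argument on the
tesseract command line, and that your build accepts a direct path to that
file; some installs may only look under tessdata/configs. The file name
"linesize" and the helper names are made up for illustration, and 2.5 is just
the value that worked for Nick's material.

```python
import subprocess
from pathlib import Path


def write_config(path, min_linesize=2.5):
    """Write a one-line Tesseract config file setting textord_min_linesize.

    2.5 is the value reported in this thread (twice the default); other
    material may need a different value found by trial and error.
    """
    config_path = Path(path)
    config_path.write_text(f"textord_min_linesize {min_linesize}\n")
    return config_path


def build_command(image, output_base, config_path, lang=None):
    """Build the tesseract argument list, with the config file last."""
    cmd = ["tesseract", str(image), str(output_base)]
    if lang:
        cmd += ["-l", lang]
    cmd.append(str(config_path))
    return cmd


def run_ocr(image, output_base, config_path, lang=None):
    """Run tesseract; requires the tesseract binary to be on PATH."""
    cmd = build_command(image, output_base, config_path, lang)
    return subprocess.run(cmd, check=True)


# Hypothetical usage (file names are placeholders):
#   cfg = write_config("linesize")
#   run_ocr("page.tif", "page", cfg)
```

If accent-sized lines still show up, the value can be raised in small steps
and the output compared each time.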
