Re: Training fonts > best practices

Nick White Fri, 21 Sep 2012 06:28:14 -0700

Hi Ruwanthaka,

It's an interesting question. If the modifiers and consonants always
sit on the same horizontal line (rather than, e.g., the modifiers
sitting above the characters they affect), and they don't overlap
with other characters, you should probably just treat them
separately. That way you have fewer characters to train, as you
don't have to train every possible combination separately.
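
As a rough illustration of that counting argument, here is a small
Python sketch. The alphabet is a made-up subset of three consonants and
two modifiers, not the real Sinhala inventory, and the function names
are hypothetical. It only counts the classes each approach would need;
it says nothing about how Tesseract itself works.

```python
from itertools import product

# Illustrative subset only (an assumption, not the full Sinhala inventory).
consonants = ["\u0d9a", "\u0d9c", "\u0dad"]  # ka, ga, ta
modifiers = ["\u0dcf", "\u0dd0"]             # two vowel signs written after the consonant


def combined_classes(bases, marks):
    """Whole-character model: each bare base plus every base+mark pair."""
    return list(bases) + [b + m for b, m in product(bases, marks)]


def separate_classes(bases, marks):
    """Separate model: bases and marks are trained as independent classes."""
    return list(bases) + list(marks)


if __name__ == "__main__":
    print("combined:", len(combined_classes(consonants, modifiers)))
    print("separate:", len(separate_classes(consonants, modifiers)))
```

With n consonants and m modifiers, the combined model needs n + n*m
classes while the separate model needs n + m, so the gap widens quickly
as the real inventory grows.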


When I was working on Ancient Greek, I would ideally have liked
Tesseract to recognise the diacritics separately from the characters,
but they were generally either above or below the character they
modified, so I had to train each character with every combination of
diacritics.
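
To show what "character plus diacritics" looks like on the text side,
here is a minimal Python sketch using the standard unicodedata module.
The helper name split_marks is hypothetical, and this only illustrates
how one Greek letter breaks down into a base and stacked marks; it is
not a description of how Tesseract segments glyphs.

```python
import unicodedata


def split_marks(ch):
    """Decompose one character into its base letter and combining marks (NFD)."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    # Combining marks have a non-zero canonical combining class.
    marks = [c for c in decomposed[1:] if unicodedata.combining(c) != 0]
    return base, marks


if __name__ == "__main__":
    # U+1FB7: alpha with a perispomeni above and an iota subscript below.
    base, marks = split_marks("\u1fb7")
    print(unicodedata.name(base))
    for m in marks:
        print(unicodedata.name(m))
```

Because the marks sit directly above and below the base letter, the
printed glyph is a single connected column, which is why each such
combination ended up as its own training class.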

Hope this helps.

Nick

On Fri, Sep 21, 2012 at 03:58:45AM -0700, Ruran wrote:
>  
> 
> I'm training a Sinhala language pack for Tesseract.
> 
> Sinhala includes modifiers, vowels and consonants.
> 
> While training, we can follow one of two models:
> 
> [comb]1) Treat the whole character as one unit (image 01)
> 
> [sep]2) Treat modifiers and consonants separately (image 02)
> 
> Both models might work, but which one is preferred for higher
> accuracy?
> 
> 
> 
> regards
> 
> Ruwanthaka
> 
> 
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

