Hi Ruwanthaka,

It's an interesting question. If the modifiers and consonants are always on the same horizontal line (and not, e.g., above the characters they affect), and they don't overlap with other characters, you should probably just treat them separately. That way you have fewer characters to train, as you don't have to train every possible combination separately.
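To put rough numbers on the "fewer characters to train" point, here is a small sketch comparing the number of classes under each approach. The counts used are placeholders for illustration, not the actual Sinhala inventory, and the model assumes each consonant carries at most one modifier, which is a simplification.

```python
def classes_combined(n_bases: int, n_modifiers: int) -> int:
    """Classes needed if every base+modifier pairing is trained as one glyph.

    Each base can appear bare or with any one of the modifiers, so the
    count is n_bases * (n_modifiers + 1). Assumes at most one modifier
    per base (a simplification).
    """
    return n_bases * (n_modifiers + 1)


def classes_separate(n_bases: int, n_modifiers: int) -> int:
    """Classes needed if bases and modifiers are trained as separate glyphs."""
    return n_bases + n_modifiers


if __name__ == "__main__":
    # Placeholder counts for illustration only, not a real Sinhala inventory.
    bases, mods = 40, 15
    print("combined:", classes_combined(bases, mods))  # combined: 640
    print("separate:", classes_separate(bases, mods))  # separate: 55
```

The combined count grows multiplicatively while the separate count grows additively, which is why splitting pays off whenever the glyphs really can be segmented apart.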
I know that when I was doing Ancient Greek, I would ideally have liked Tesseract to recognise the diacritics separately from the characters, but they were generally either above or below the character they modified, so I had to train each character with every combination of diacritics.

Hope this helps.

Nick

On Fri, Sep 21, 2012 at 03:58:45AM -0700, Ruran wrote:
> I'm training a Sinhala language pack for Tesseract.
>
> Sinhala includes modifiers/vowels and consonants.
> While training, we can follow two models:
>
> 1) Treat the whole character as one (image 01)
>
> 2) Keep the modifiers and consonants separate (image 02)
>
> Both models might work, but to gain higher accuracy, which is the
> best/preferred model?
>
> regards
> Ruwanthaka
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

