Re: [tesseract-ocr] Re: Regarding Tesseract OCR engine for recognizing Tamil Fonts

Nick White Tue, 15 Jul 2014 13:08:43 -0700

On Mon, Jul 14, 2014 at 11:36:46AM -0700, Paul wrote:
> On Monday, 14 July 2014 10:07:59 UTC+2, sibi kanagaraj wrote:
>     But I feel that the Tamil training is not sufficient and that it
>     could be streamlined. Hence I went to see if there are sufficient
>     training documents for Tamil. This search landed me on this page, and
>     subsequently I found "Things I would NOT recommend working on" here.
> 
>     I am a little bit stuck here. I wanted to do this project as part of my
>     Master's degree. Isn't Tamil training an independent module that
>     could be worked on?
>
> I'm not sure what's the case for Tamil, but in general the imagery for doing
> training is not available. So basically you would have to start all over.


Yes, that is the case, I'm afraid. There is a project that was 
hoping to create improved trainings for South Asian languages, but 
it hasn't been updated for quite a few years. See 
http://code.google.com/p/parichit/

Can you give us some clue as to what you think could be improved 
about the current Tamil recognition? Changes of configuration 
variables, or ambiguity rules (the unicharambigs file), don't need 
access to the training images.
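
As a rough illustration of the kind of experiment that needs no training
images, here is a minimal Python sketch that only builds the command lines
involved. It assumes a tesseract build whose command line accepts
"-c configvar=value" and that combine_tessdata is installed; the helper
names, the file names, and the variable value are hypothetical and chosen
purely for illustration.

```python
def build_tesseract_cmd(image, outbase, lang="tam", config_vars=None):
    """Build the argv list for one tesseract run with config overrides.

    Assumes a tesseract version that supports `-c configvar=value`.
    """
    cmd = ["tesseract", image, outbase, "-l", lang]
    # Sort so the resulting command line is reproducible between runs.
    for name, value in sorted((config_vars or {}).items()):
        cmd += ["-c", "{}={}".format(name, value)]
    return cmd


def build_unpack_cmd(traineddata, prefix):
    """Build the argv list that unpacks a traineddata file.

    `combine_tessdata -u` writes each component next to `prefix`,
    so a prefix of "work/tam." should yield "work/tam.unicharambigs".
    """
    return ["combine_tessdata", "-u", traineddata, prefix]


if __name__ == "__main__":
    # Illustrative value only; not a recommended setting for Tamil.
    overrides = {"language_model_penalty_non_dict_word": 0.3}
    print(" ".join(build_tesseract_cmd("page.png", "out", "tam", overrides)))
    print(" ".join(build_unpack_cmd("tam.traineddata", "work/tam.")))
```

The printed commands can be run by hand, or passed to subprocess.run, to
compare output before and after changing a variable or editing the
unpacked unicharambigs file.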

Oh, by the way, the "Things I would NOT recommend working on" is a 
very old page (from 2010); I wouldn't take it too seriously...

Nick

