Hi Nick,

Thanks a lot! I wasn't sure whether Tesseract already supported the Tibetan 
language, which is why I asked. :)
I will get started with your and Sriranga's suggestions and see how far I 
can get.

Yizhen


On Wednesday, November 25, 2015 at 6:12:06 PM UTC+8, Nick White wrote:
>
> Hi Yizhen, 
>
> On Tue, Nov 24, 2015 at 07:08:24PM -0800, Yizhen Hai wrote: 
> > I am working on a volunteer project to digitize the Sutra and all related 
> > materials, most of them in Tibetan. 
>
> Sounds like a great project :) 
>
> > Therefore, I wonder how I can get help to use Tesseract for Tibetan. (I am new 
> > on both OCR and Tesseract and the only programming language I know is R.) I 
> > have no idea how to get started, training Tesseract for a new language? 
>
> Are you sure Tesseract doesn't already support the Tibetan language 
> you need? I know almost nothing about Tibetan, but I see in the 
> langdata[0] repository (which is used to build the official training 
> files) a Tibetan.unicharset file, which implies it probably does 
> have support. Look in the tessdata repository[1] for the ISO 639 code 
> of the language(s) you're interested in. 
>
> I quickly compared the ISO 639 codes from this Wikipedia page[2] 
> with the tessdata, and bod (Lhasa Tibetan) is the only one I see 
> available there. But maybe it's the language you want anyway? 
>
> > And what if the image contains both Chinese and Tibetan? Please 
> > give me some hints. 
>
> Tesseract can be told to expect multiple languages in an image, 
> using a plus in the language argument (e.g. '-l eng+spa'). 
>
> Hope that's helpful. 
>
> Nick 
>
> 0. https://github.com/tesseract-ocr/langdata 
> 1. https://github.com/tesseract-ocr/tessdata 
> 2. https://en.wikipedia.org/wiki/Central_Tibetan_language 
>
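For anyone following the multi-language hint quoted above, here is a minimal sketch of how the '-l' argument could be assembled for a page containing both Tibetan and Chinese. It only builds the command line and does not run Tesseract. The helper name `tesseract_command`, the file names, and the use of 'chi_sim' as the Simplified Chinese code are assumptions for illustration; only 'bod' and the plus-joined '-l' form come from the thread.

```python
def tesseract_command(image, output_base, languages):
    """Build the argument list for a multi-language tesseract run.

    Hypothetical helper: it only assembles the arguments and does not
    check that the language data files are actually installed.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Several languages are passed as a single '+'-joined value of -l.
    lang_arg = "+".join(languages)
    return ["tesseract", image, output_base, "-l", lang_arg]


# Hypothetical file names; 'chi_sim' is an assumed code for Simplified Chinese.
cmd = tesseract_command("page.png", "page", ["bod", "chi_sim"])
print(" ".join(cmd))
```

If Tesseract and both language files are installed, the resulting list could be handed to `subprocess.run(cmd, check=True)`; whether the recognition quality is good enough for mixed Tibetan and Chinese pages would still need to be checked on real scans.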
