I am the developer of the Namsel OCR project (https://www.namsel.com/) and 
can speak to a few different Tibetan OCR implementations. First, you may 
want to look at tbrc.org, particularly its e-text section. We've OCR'd 
the entire Tibetan Kangyur and Tengyur, as well as hundreds of thousands of 
additional pages of Tibetan literature, and made them available for search 
there.

As already mentioned in this thread, there is also the Yakpo OCR project 
(http://www.dharmabook.ru/ocr/). Google Drive/Google Docs and Google 
Books have recently added support for Tibetan OCR, although from what I 
understand it is still a work in progress. As far as I can tell, the 
Google Docs OCR service isn't currently meant for building large 
collections of OCR text, but it can handle documents of a few pages.

Otherwise, you can search this mailing list's archives for previous 
discussions on training Tesseract. For example, Tenzin Dendup has spent 
time trying to train Tesseract on Tibetan/Dzongkha: 
https://groups.google.com/d/msg/tesseract-ocr/ONkAD2kuxUQ/EQsepM67D94J
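On the question of pages that mix Chinese and Tibetan: the tesseract 
command line takes the form "tesseract imagename outputbase -l lang", 
and several languages can be requested at once by joining their codes 
with "+". The sketch below only illustrates that invocation. It assumes 
a Tesseract install that already has traineddata files for Tibetan 
("bod") and Simplified Chinese ("chi_sim"); whether such data exists for 
your version, and how well it works on your material, is something you 
would need to check. The helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_tesseract_cmd(image_path: str, output_base: str,
                        langs: Sequence[str]) -> List[str]:
    """Hypothetical helper: build a tesseract argument list.

    Multiple language codes are joined with '+', e.g. 'bod+chi_sim'.
    """
    if not langs:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(langs)]


def run_ocr(image_path: str, output_base: str, langs: Sequence[str]) -> str:
    """Hypothetical helper: run tesseract and return the recognized text.

    Assumes the tesseract binary is on PATH and the requested traineddata
    files are installed. Tesseract writes its result to <output_base>.txt.
    """
    subprocess.run(build_tesseract_cmd(image_path, output_base, langs),
                   check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Show the command that would be run for a mixed Tibetan/Chinese page.
    print(shlex.join(build_tesseract_cmd("page.png", "page",
                                         ["bod", "chi_sim"])))
```

Listing the dominant script first is a common habit, but the best order 
and combination for your pages is something to test rather than assume.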

On Tuesday, November 24, 2015 at 11:11:52 PM UTC-8, Yizhen Hai wrote:
>
> I am working on a volunteer project to digitize the Sutra and all related 
> materials, most of them in Tibetan. It would save a lot of time if I could 
> use some OCR technology in this process. However, hardly any software is 
> available for Tibetan. 
> Therefore, I wonder how I can get help using Tesseract for Tibetan. (I am 
> new to both OCR and Tesseract, and the only programming language I know is 
> R.) I have no idea how to get started. Do I need to train Tesseract for a 
> new language, i.e. Tibetan? And what if the image contains both Chinese and 
> Tibetan? Please give me some hints.
> Thanks a lot.
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/0eb700b4-0d4e-4529-9e12-6d0df9e04941%40googlegroups.com.