Google has not provided the images and box files for the san.traineddata
released for Tesseract 3.04.

I tried training with text2image, using a combination of different fonts
and training texts. The results are at
https://github.com/Shreeshrii/imagessan/tree/master/tessdata

You can give these a try to see if recognition is any better.
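
As a rough sketch of how text2image can be driven with several fonts, the
Python below only builds one command line per font and prints it. The
options --text, --outputbase, --font and --fonts_dir are standard text2image
options. The font names, training text path and fonts directory are
placeholders assumed for illustration; they are not the settings used for
the files linked above.

```python
import shlex


def text2image_commands(lang, text_file, fonts, fonts_dir):
    """Return one text2image argument list per font."""
    commands = []
    for font in fonts:
        # Usual Tesseract naming: <lang>.<fontname without spaces>.exp0
        base = f"{lang}.{font.replace(' ', '')}.exp0"
        commands.append([
            "text2image",
            f"--text={text_file}",
            f"--outputbase={base}",
            f"--font={font}",
            f"--fonts_dir={fonts_dir}",
        ])
    return commands


# Hypothetical inputs: adjust the fonts and paths to your own setup.
for cmd in text2image_commands(
    "san",
    "san.training_text",
    ["Sanskrit 2003", "Lohit Devanagari"],
    "/usr/share/fonts",
):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell; each run should produce a .tif
image and a matching .box file named after the output base.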

You can unpack any traineddata file using the -u option of combine_tessdata
to see the config files etc.

http://manpages.ubuntu.com/manpages/trusty/man1/combine_tessdata.1.html
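
A minimal sketch of the unpack step, again only building and printing the
command. The form "combine_tessdata -u <traineddata> <output prefix>" is
from the man page above; the file name and the output prefix "unpacked/san."
are assumptions for illustration, and the output directory is assumed to
exist already.

```python
import shlex


def unpack_command(traineddata, out_prefix):
    """Build the combine_tessdata call that unpacks every component."""
    # Components are written as <out_prefix><component>, e.g. san.unicharset
    return ["combine_tessdata", "-u", traineddata, out_prefix]


# Hypothetical paths: point these at your own traineddata file.
print(shlex.join(unpack_command("san.traineddata", "unpacked/san.")))
```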

Use dawg2wordlist to look at the various dictionary word lists used.

http://manpages.ubuntu.com/manpages/trusty/man1/dawg2wordlist.1.html
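
A similar sketch for dumping the dictionaries. The argument order
"dawg2wordlist <unicharset> <dawg file> <wordlist file>" is from the man
page above. The prefix and the component names (word-dawg, freq-dawg) are
assumptions: which dawg files exist depends on the traineddata that was
unpacked.

```python
import shlex


def dawg2wordlist_commands(prefix, dawgs=("word-dawg", "freq-dawg")):
    """Build one dawg2wordlist call per unpacked dawg component."""
    unicharset = f"{prefix}unicharset"
    return [
        # Output goes to a plain text file next to the dawg
        ["dawg2wordlist", unicharset, f"{prefix}{name}", f"{prefix}{name}.txt"]
        for name in dawgs
    ]


# Hypothetical prefix: matches files unpacked as unpacked/san.<component>
for cmd in dawg2wordlist_commands("unpacked/san."):
    print(shlex.join(cmd))
```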

- sent from my phone. excuse the brevity.
On 12-Jun-2016 11:26 am, "rohit saluja" <[email protected]> wrote:

> Hey thanks for replying.
> Which options should I use with the text2image command? Also, is there a
> configuration file and a list of fonts?
>
> I tried the default options of text2image with the Tesseract GitHub
> training data and Sanskrit 2003, but the recognition results are far from
> those of the san.traineddata file on GitHub.
> Any help in matching the san.traineddata results, starting from scratch,
> would be highly appreciated.
>
> Thanks in advance
> Rohit
>
> On Friday, 6 May 2016 12:59:44 UTC+5:30, rohit saluja wrote:
>
>> Do we have Sanskrit training images and box files available online?
>>
>> Thanks
>> Rohit
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/45767a89-cd11-4f39-9622-3fe7b4d20a4a%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/45767a89-cd11-4f39-9622-3fe7b4d20a4a%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

