Google has not provided the images and box files for the san.traineddata released for 3.04.
I tried training using text2image with a combination of different fonts and training texts. The results are at https://github.com/Shreeshrii/imagessan/tree/master/tessdata
You can give these a try to see whether recognition is any better.

You can unpack any traineddata file using the -u option of combine_tessdata to see the config files etc.
http://manpages.ubuntu.com/manpages/trusty/man1/combine_tessdata.1.html

Use dawg2wordlist to look at the various dictionary word lists used.
http://manpages.ubuntu.com/manpages/trusty/man1/dawg2wordlist.1.html

On 12-Jun-2016 11:26 am, "rohit saluja" <[email protected]> wrote:

> Hey, thanks for replying.
> Which options should I use with the text2image command? Also, is there a
> configuration file and a fonts list?
>
> I tried the default options of text2image with the tesseract GitHub training
> data with Sanskrit 2003, but the recognition results are far from those of
> the san.traineddata file on GitHub.
> Any help in matching the san.traineddata results, starting from scratch,
> would be highly appreciated.
>
> Thanks in advance
> Rohit
>
> On Friday, 6 May 2016 12:59:44 UTC+5:30, rohit saluja wrote:
>
>> Do we have Sanskrit training images and box files available online?
>>
>> Thanks
>> Rohit
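For anyone who wants to script the unpacking and word-list steps mentioned in the reply, here is a rough sketch in Python. It only wraps the two command-line tools named there (combine_tessdata -u and dawg2wordlist). The helper names, the "unpacked" output directory, and the assumption that the unpacked dictionary files end in "-dawg" (as they do for the 3.0x traineddata files I have seen) are mine, so adjust them to your setup.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def unpack_command(traineddata: str, out_prefix: str) -> List[str]:
    """Build the combine_tessdata call that unpacks every component.

    out_prefix is a path prefix such as "unpacked/san." and the tool
    appends component names to it (unicharset, config, word-dawg, ...).
    """
    return ["combine_tessdata", "-u", traineddata, out_prefix]


def wordlist_command(unicharset: str, dawg: str, wordlist: str) -> List[str]:
    """Build the dawg2wordlist call: unicharset, dawg file, output text file."""
    return ["dawg2wordlist", unicharset, dawg, wordlist]


def inspect_traineddata(traineddata: str, out_dir: str, lang: str = "san") -> List[Path]:
    """Unpack a traineddata file and dump each dictionary dawg to a text file.

    Hypothetical helper: returns the word-list files it wrote.
    """
    for tool in ("combine_tessdata", "dawg2wordlist"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    prefix = str(out / f"{lang}.")
    subprocess.run(unpack_command(traineddata, prefix), check=True)

    unicharset = out / f"{lang}.unicharset"
    written = []
    # Assumption: dictionary components are named like san.word-dawg.
    for dawg in sorted(out.glob(f"{lang}.*-dawg")):
        wordlist = dawg.with_name(dawg.name + ".txt")
        subprocess.run(
            wordlist_command(str(unicharset), str(dawg), str(wordlist)),
            check=True,
        )
        written.append(wordlist)
    return written


if __name__ == "__main__":
    # Only runs when the tools and the file are actually present.
    have_tools = shutil.which("combine_tessdata") and shutil.which("dawg2wordlist")
    if have_tools and Path("san.traineddata").exists():
        for path in inspect_traineddata("san.traineddata", "unpacked"):
            print(path)
```

The command builders are kept separate from the code that runs them so you can print and check the exact command lines before executing anything.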

