Also see the san.config file in the langdata directory.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Sep 21, 2016 at 2:28 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote:

> I had used the new text2image program with the tesstrain.sh utility for
> generating the box/tiff pairs, and trained on a large set of fonts. See
> https://github.com/Shreeshrii/imagessan/blob/master/san95-langdata/language-specific.sh
> and
> https://github.com/Shreeshrii/imagessan/blob/master/san95-langdata/san.font_properties
>
> The additional unicharset entries are probably because different Devanagari
> fonts have different numbers of glyphs for conjuncts (especially Siddhanta,
> Sanskrit2003, Chandas and Uttara).
>
> I do not currently have that setup (cygwin/msys2 with tesseract) to test.
>
> ShreeDevi
>
> On Wed, Sep 21, 2016 at 12:54 PM, RKVS Raman <rkvsra...@gmail.com> wrote:
>
>> Hello ShreeDevi,
>>
>> Thanks for clarifying the current status of cube. I had worked with
>> tesseract long back, so I will leave the cube module for the time being and
>> focus on using the old adaptive classifier.
>>
>> BTW, I tried training
>> https://github.com/Shreeshrii/imagessan/blob/master/san95-langdata/san.training_text
>> on Noto Sans Devanagari and I could get only 1165 entries in the unicharset.
>>
>> How did you manage to get 1645 entries with it?
>>
>> Best Regards
>> -Raman
>>
>> On Wed, Sep 21, 2016 at 9:21 AM, ShreeDevi Kumar <shreesh...@gmail.com>
>> wrote:
>>
>>> For Sanskrit, please see https://github.com/Shreeshrii/imagessan,
>>> where I have added the training sources as well as traineddata for two
>>> versions of training. In the testing I did on a small sample of images, it
>>> seemed to perform better than the 3.04 san.traineddata.
>>>
>>> You are welcome to try using them and provide feedback.
>>>
>>> ShreeDevi
>>>
>>> On Tue, Sep 20, 2016 at 7:05 PM, rkvsraman <rkvsra...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> In tessdata, I see cube models only for Hindi and not for Marathi and
>>>> Sanskrit, though they have the same script.
>>>>
>>>> Any particular reason for this?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/83d5c408-c869-4c16-8847-78ba2d250763%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
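
The gap in unicharset size discussed in the thread (1165 entries from a single font versus 1645 from the multi-font training) can be inspected directly by diffing the two files. The following is only a minimal sketch, not part of the Tesseract tooling. It assumes the usual unicharset layout, where the first line holds the entry count and each later line begins with the unichar followed by space-separated properties. The function names and file names are hypothetical.

```python
def read_unicharset(path):
    """Return the unichars listed in a Tesseract unicharset file.

    Assumption: line 1 is the entry count; every following non-empty line
    starts with the unichar, then a space, then its properties.
    """
    with open(path, encoding="utf-8") as handle:
        declared = int(handle.readline().strip())
        entries = [
            line.rstrip("\n").split(" ", 1)[0]
            for line in handle
            if line.strip()
        ]
    if declared != len(entries):
        raise ValueError(
            f"{path}: header declares {declared} entries, found {len(entries)}"
        )
    return entries


def compare_unicharsets(path_a, path_b):
    """Return (entries only in A, entries only in B), each sorted."""
    a = set(read_unicharset(path_a))
    b = set(read_unicharset(path_b))
    return sorted(a - b), sorted(b - a)
```

For example, `compare_unicharsets("multi_font.unicharset", "noto_only.unicharset")` (hypothetical file names) would list the entries present in only one of the two trainings, which should show whether the extra entries are font-specific conjunct forms.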