ShreeDevi, thanks for your answers and for taking the time.

I got the traineddata file for version 3.04 (the file is slightly smaller, but the number of characters is the same: 2851) and I get the same result: some symbols are split into a pair, where the first is correct and the second is wrong.

I am thinking of upgrading to 4.00, so I have some questions:

Can I run the new version side by side with 3.05, without installing it?

And another question from my first post: does Tesseract have a limit on the number of bytes per character in Unicode? Are there any additional parameters to remove a limit on the number of bytes per symbol?

On Friday, 25 August 2017 at 20:13:22 UTC+7, shree wrote:
>
> If you are using the 4.0alpha - latest version of program you can use
> kannada traineddata from
>
> https://github.com/tesseract-ocr/tessdata/blob/master/best/kan.traineddata
> or
>
> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>
> I have not tested kannada personally but if it follows the pattern for
> devanagari, it should be better than the older traineddata.
>
> If you are using 3.05 version of program,
> then use traineddata files from
> https://github.com/tesseract-ocr/tessdata/releases/tag/3.04.00
>
> Please note that the unicharset and langdata files are used while training
> and just changing the unicharset file is NOT going to improve the
> recognition.
>
> For that training needs to be done. Please see the wiki for more details.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Aug 25, 2017 at 6:31 PM, Yury <[email protected]>
> wrote:
>
>> Hello shree!
>>
>> Thanks for your links and for taking the time.
>>
>> I did not find a /best/ folder in the ~alex-p profile.
>> But I found kan.traineddata in the package tesseract-lang-4.00 (in
>> tesseract-lang-3.05 the Kannada language is absent).
>> I took this file and ran recognition; the result is the same.
>> This package is dated 08.01.17 and has 2851 characters (the same as mine).
>> So I think I used this package earlier.
>>
>> On Friday, 25 August 2017 at 18:56:25 UTC+7, shree wrote:
>>>
>>> https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
>>>
>>> For ppa
>>>
>>> On 25-Aug-2017 5:22 PM, "ShreeDevi Kumar" <[email protected]> wrote:
>>>
>>>> Latest GitHub source in master branch is for 4.0alpha. you can install
>>>> via post.
>>>>
>>>> Search for tesseract PPA Alex in Google.
>>>>
>>>> _sent from phone
>>>>
>>>> On 25-Aug-2017 4:42 PM, "Yury" <[email protected]> wrote:
>>>>
>>>>> Hello again.
>>>>>
>>>>> I found this:
>>>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>>>>>
>>>>> But after recognition I see only English text symbols and digits, so
>>>>> this did not work.
>>>>> In the log I see:
>>>>> theraysmith <https://github.com/theraysmith> Added best traineddatas
>>>>> for 4.00 alpha
>>>>> <https://github.com/tesseract-ocr/tessdata/commit/3a94ddd47be01fd897cbe31f05cbd2301454cf8a>
>>>>>
>>>>> I have 3.05.
>>>>>
>>>>>
>>>>> On Friday, 25 August 2017 at 17:47:56 UTC+7, Yury wrote:
>>>>>>
>>>>>> Hello, shree!
>>>>>>
>>>>>> Can you tell me the exact path for tessdata/best/*.traineddata ?
>>>>>>
>>>>>> On Friday, 25 August 2017 at 16:07:49 UTC+7, shree wrote:
>>>>>>>
>>>>>>> Have you tried the new tessdata/best/*.traineddata with the latest
>>>>>>> github sources?
>>>>>>>
>>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/b20f906b-db90-43f1-b9c6-b1bb40d21414%40googlegroups.com
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9b9151da-f025-466a-8ac6-fe3003ad4d48%40googlegroups.com.
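
On the bytes-per-character question raised at the top of this message: a Kannada conjunct that looks like a single symbol is usually stored as several Unicode code points, and each Kannada code point takes 3 bytes in UTF-8. The Python sketch below only illustrates that arithmetic. It says nothing about whether Tesseract itself enforces a limit, and the `describe` helper and the sample strings are made up for illustration.

```python
import unicodedata


def describe(text):
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))


# Hypothetical samples: a single Kannada letter and a three-code-point conjunct
# (KA + VIRAMA + SSA), which is rendered as one visual symbol.
samples = {
    "single letter": "\u0c95",
    "conjunct": "\u0c95\u0ccd\u0cb7",
}

for label, text in samples.items():
    code_points, utf8_bytes = describe(text)
    names = ", ".join(unicodedata.name(ch) for ch in text)
    print(f"{label}: {code_points} code point(s), {utf8_bytes} byte(s): {names}")
```

Counting code points and bytes for the entries of a unicharset in this way can show which symbols are multi-code-point sequences, which may help when checking whether the split symbols are the long ones.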

