ShreeDevi,

Thank you for your answers and for taking the time.

I got the traineddata file for version 3.04 (the file is slightly smaller, but 
the number of characters is the same, 2851) and I get the same result: some 
symbols are split into a pair, where the first part is correct and the second 
is wrong.
I am thinking of upgrading to 4.00, so I have a question:

Can I use the new version alongside 3.05, without installing it?

And a question from my first post is still open:
Does tesseract have a limit on the number of bytes per character 
in Unicode?
Are there any additional parameters that remove the limit on the number of 
bytes per symbol?
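
To make the question concrete: a single Kannada symbol can be built from several Unicode code points, each taking several bytes in UTF-8. Below is a minimal Python 3 sketch of that count. The helper name `utf8_profile` and the sample conjunct are assumptions chosen for illustration; they are not taken from tesseract or from the actual unicharset.

```python
def utf8_profile(symbol: str) -> dict:
    """Return the code points and UTF-8 byte count of one symbol."""
    return {
        "code_points": [f"U+{ord(ch):04X}" for ch in symbol],
        "n_code_points": len(symbol),
        "n_utf8_bytes": len(symbol.encode("utf-8")),
    }


# Illustrative sample: a Kannada conjunct made of KA + VIRAMA + SSA.
sample = "\u0c95\u0ccd\u0cb7"
profile = utf8_profile(sample)
print(profile)
```

Running the same helper over every entry of a unicharset file would show which symbols are the longest in bytes, and whether those are the ones that get split.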

On Friday, 25 August 2017, 20:13:22 UTC+7, user shree wrote:
>
> If you are using 4.0alpha, the latest version of the program, you can use 
> the Kannada traineddata from 
>
> https://github.com/tesseract-ocr/tessdata/blob/master/best/kan.traineddata
> or
>
> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>
> I have not tested Kannada personally, but if it follows the pattern for 
> Devanagari, it should be better than the older traineddata.
>
> If you are using the 3.05 version of the program,
> then use the traineddata files from 
> https://github.com/tesseract-ocr/tessdata/releases/tag/3.04.00
>
> Please note that the unicharset and langdata files are used while training 
> and just changing the unicharset file is NOT going to improve the 
> recognition.
>
> For that training needs to be done. Please see the wiki for more details.
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Aug 25, 2017 at 6:31 PM, Yury <[email protected]> wrote:
>
>> Hello shree!
>>
>> Thank you for your links and for taking the time.
>>
>> I did not find a /best/ folder in the ~alex-p profile.
>> But I found kan.traineddata in the package tesseract-lang-4.00 (in 
>> tesseract-lang-3.05 the Kannada language is absent).
>> I took this file and ran recognition; the result is the same.
>> This package is dated 08.01.17 and has 2851 characters (the same as mine).
>> So I think I used this package earlier.
>>
>> On Friday, 25 August 2017, 18:56:25 UTC+7, user shree wrote:
>>>
>>> https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
>>>
>>> For ppa
>>>
>>> On 25-Aug-2017 5:22 PM, "ShreeDevi Kumar" <[email protected]> wrote:
>>>
>>>> The latest GitHub source in the master branch is for 4.0alpha. You can 
>>>> install it via PPA.
>>>>
>>>> Search for tesseract PPA Alex in Google.
>>>>
>>>> _sent from phone
>>>>
>>>> On 25-Aug-2017 4:42 PM, "Yury" <[email protected]> wrote:
>>>>
>>>>> Hello again.
>>>>>
>>>>> I found this: 
>>>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>>>>>
>>>>> But after recognition I see only English letters and digits, so 
>>>>> this did not work.
>>>>> In the commit log I see:
>>>>>  theraysmith <https://github.com/theraysmith> Added best traineddatas 
>>>>> for 4.00 alpha 
>>>>> <https://github.com/tesseract-ocr/tessdata/commit/3a94ddd47be01fd897cbe31f05cbd2301454cf8a>
>>>>>
>>>>> I have 3.05.
>>>>>
>>>>>
>>>>> On Friday, 25 August 2017, 17:47:56 UTC+7, user Yury wrote:
>>>>>>
>>>>>> Hello, shree!
>>>>>>
>>>>>> Can you tell me the exact path for tessdata/best/*.traineddata?
>>>>>>
>>>>>> On Friday, 25 August 2017, 16:07:49 UTC+7, user shree wrote:
>>>>>>>
>>>>>>> Have you tried the new tessdata/best/*.traineddata with the latest 
>>>>>>> github sources?
>>>>>>>
>>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/b20f906b-db90-43f1-b9c6-b1bb40d21414%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/b20f906b-db90-43f1-b9c6-b1bb40d21414%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>
>
>

