I do not know about the internal workings of Tesseract.

If you unpack best/kan.traineddata, you may find a smaller unicharset
with just the basic characters in it.
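To check a unicharset after unpacking (for example with combine_tessdata -u kan.traineddata kan., which writes each component to its own file), a minimal Python sketch like the one below can report how many entries it declares and how many UTF-8 bytes each entry takes. It assumes the unicharset is plain text whose first line is the entry count and whose other lines each begin with the unichar followed by a space. The file name kan.lstm-unicharset and the script name unicharset_stats.py are assumptions and may differ in your version.

```python
import sys
from collections import Counter


def summarize_unicharset(text):
    """Summarize a Tesseract unicharset given its text contents.

    Assumption (not checked against every Tesseract version): the first
    line holds the entry count, and each following line starts with the
    unichar string, separated from its properties by a single space.
    """
    lines = text.splitlines()
    declared = int(lines[0].split()[0])
    # The first space-separated token on each entry line is the unichar.
    # Special entries such as "NULL" are counted as literal strings.
    unichars = [line.split(" ", 1)[0] for line in lines[1:] if line.strip()]
    # UTF-8 byte length per entry; multi-code-point clusters use more bytes.
    byte_lengths = Counter(len(u.encode("utf-8")) for u in unichars)
    return {
        "declared": declared,
        "parsed": len(unichars),
        "max_utf8_bytes": max(byte_lengths) if byte_lengths else 0,
        "byte_length_histogram": dict(sorted(byte_lengths.items())),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python unicharset_stats.py kan.lstm-unicharset
    with open(sys.argv[1], encoding="utf-8") as f:
        print(summarize_unicharset(f.read()))
```

Comparing "declared" with "parsed" is a quick sanity check on the parse, and "max_utf8_bytes" shows how long the longest multi-code-point entry is, which bears on the bytes-per-symbol question raised further down the thread.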

Tesseract 4 uses the LSTM neural-net engine, whereas 3.05 uses the legacy
engine. LSTM does line-based recognition rather than character-based
recognition.

Yes, it is possible to have both versions installed; however, I do not have
exact instructions to make it work. It would also depend on which OS you
are using.

I only have the latest GitHub version installed.

On 25-Aug-2017 9:46 PM, "Yury" <yura...@gmail.com> wrote:

> ShreeDevi,
>
> Thanks for your answers and taking the time.
>
> I got the traineddata file for version 3.04 (the file is a little smaller,
> but the number of characters is the same, 2851) and got the same result:
> some symbols are split into a pair (the first is correct and the second is
> wrong).
> I am thinking of upgrading to 4.00, so I have some questions:
>
> Can I install the new version alongside 3.05, without removing it?
>
> Another question, from my first post:
> Does Tesseract have any limit on the number of bytes per character
> in Unicode?
> Are there any additional parameters to remove limits on the number of
> bytes per symbol?
>
> On Friday, 25 August 2017, 20:13:22 UTC+7, shree wrote:
>>
>> If you are using 4.0alpha (the latest version of the program), you can use
>> the Kannada traineddata from
>>
>> https://github.com/tesseract-ocr/tessdata/blob/master/best/kan.traineddata
>> or
>> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>>
>> I have not tested Kannada personally, but if it follows the pattern for
>> Devanagari, it should be better than the older traineddata.
>>
>> If you are using version 3.05 of the program,
>> then use the traineddata files from
>> https://github.com/tesseract-ocr/tessdata/releases/tag/3.04.00
>>
>> Please note that the unicharset and langdata files are used during
>> training; just changing the unicharset file is NOT going to improve
>> recognition.
>>
>> For that, training needs to be done. Please see the wiki for more details.
>>
>> ShreeDevi
>> ____________________________________________________________
>> Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com
>>
>> On Fri, Aug 25, 2017 at 6:31 PM, Yury <yur...@gmail.com> wrote:
>>
>>> Hello, shree!
>>>
>>> Thanks for your links and for taking the time.
>>>
>>> I didn't find a /best/ folder in the ~alex-p profile.
>>> But I found kan.traineddata in the tesseract-lang-4.00 package (Kannada
>>> is absent from tesseract-lang-3.05).
>>> I took this file and started recognition; the result is the same.
>>> This package is dated 08.01.17 and has 2851 characters (the same as mine).
>>> So I think I already used this package earlier.
>>>
>>> On Friday, 25 August 2017, 18:56:25 UTC+7, shree wrote:
>>>>
>>>> For the PPA, see:
>>>>
>>>> https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
>>>>
>>>> On 25-Aug-2017 5:22 PM, "ShreeDevi Kumar" <shree...@gmail.com> wrote:
>>>>
>>>>> The latest GitHub source in the master branch is for 4.0alpha. You can
>>>>> install it via a PPA.
>>>>>
>>>>> Search Google for "tesseract PPA Alex".
>>>>>
>>>>> _sent from phone
>>>>>
>>>>> On 25-Aug-2017 4:42 PM, "Yury" <yur...@gmail.com> wrote:
>>>>>
>>>>>> Hello again.
>>>>>>
>>>>>> I found this: https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>>>>>>
>>>>>> But after recognition I see only English letters and digits, so
>>>>>> this did not work.
>>>>>> In the log I see:
>>>>>>  theraysmith <https://github.com/theraysmith> Added best
>>>>>> traineddatas for 4.00 alpha
>>>>>> <https://github.com/tesseract-ocr/tessdata/commit/3a94ddd47be01fd897cbe31f05cbd2301454cf8a>
>>>>>>
>>>>>> I have 3.05.
>>>>>>
>>>>>>
>>>>>> On Friday, 25 August 2017, 17:47:56 UTC+7, Yury wrote:
>>>>>>>
>>>>>>> Hello, shree!
>>>>>>>
>>>>>>> Can you tell me the exact path for tessdata/best/*.traineddata?
>>>>>>>
>>>>>>> On Friday, 25 August 2017, 16:07:49 UTC+7, shree wrote:
>>>>>>>>
>>>>>>>> Have you tried the new tessdata/best/*.traineddata with the latest
>>>>>>>> GitHub sources?
>>>>>>>>
>>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/b20f906b-db90-43f1-b9c6-b1bb40d21414%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>
