I forgot to add one thing.
During the make stage under Cygwin I could not run the command
"sudo ldconfig".
I do not think this is essential, though, since modes 1 and 3 work fine.
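
A note on the oem modes mentioned above and in the quoted message below.
Tesseract 4 numbers its engine modes 0 (legacy engine only), 1 (LSTM only),
2 (legacy + LSTM) and 3 (default, based on what is available). As far as I
understand, the traineddata files in the best directory carry only LSTM
models, which would explain why modes 0 and 2 fail to load 'kan' while 1 and
3 work. The sketch below is only an illustration, not a tested recipe: it
builds the tesseract command line for a given mode. The function name
build_tesseract_command and the file names page.png and out are made up for
the example; -l, --oem and --tessdata-dir are the actual command-line options.

```python
import shlex

# Engine modes accepted by tesseract 4's --oem option.
OEM_MODES = {
    0: "legacy engine only",
    1: "LSTM neural net engine only",
    2: "legacy + LSTM engines",
    3: "default, based on what is available",
}


def build_tesseract_command(image, output_base, lang="kan", oem=1,
                            tessdata_dir=None):
    """Return the argv list for one tesseract run (hypothetical helper)."""
    if oem not in OEM_MODES:
        raise ValueError("oem must be one of 0, 1, 2, 3")
    cmd = ["tesseract", image, output_base, "-l", lang, "--oem", str(oem)]
    if tessdata_dir is not None:
        # Directory that holds the *.traineddata files.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


if __name__ == "__main__":
    # Print the command for each mode; page.png and out are placeholder names.
    for mode, meaning in OEM_MODES.items():
        cmd = build_tesseract_command("page.png", "out", oem=mode,
                                      tessdata_dir="/usr/share/tessdata")
        print(f"# oem {mode}: {meaning}")
        print(shlex.join(cmd))
```

With only LSTM traineddata installed, I would expect the commands for modes
1 and 3 to run and those for modes 0 and 2 to stop with the "Failed loading
language" error quoted below.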

On Monday, 11 September 2017 at 12:08:32 UTC+7, Yury wrote:
>
> Thanks for your hint. 
>
> I installed Cygwin and compiled tesseract 4.0 under it. Recognition quality
> has improved significantly.
> However, there is another problem.
> In oem mode 1 or 3 everything works fine. When I choose mode 0 or 2, I
> get the error:
>
> Failed loading language 'kan'
> Tesseract couldn't load any languages!
> Could not initialize tesseract.
>
> I set TESSDATA_PREFIX to "/usr/share/tessdata". It contains the eng, kan,
> Kannada and osd traineddata files obtained from the best directory.
> What could be the problem? Do these modes not work in version 4?
>
> tesseract 4.00.00alpha
>  leptonica-1.74.4
>   libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.30 : 
> libtiff 4.0.7 : zlib 1.2.11 : libwebp 0.4.4 : libopenjp2 2.1.2
>
>  Found AVX
>  Found SSE
>
> On Saturday, 26 August 2017 at 0:23:49 UTC+7, shree wrote:
>>
>> I do not know about the internal workings of tesseract.
>>
>> If you unpack best/kan.traineddata you may find a smaller unicharset
>> which has just the basic characters in it.
>>
>> Tesseract 4 uses the LSTM neural net engine, whereas 3.05 uses the legacy
>> engine. LSTM does line-based recognition rather than character-based.
>>
>> Yes, it is possible to have both versions installed; however, I do not
>> have exact instructions to make it work. It would also depend on which OS
>> you are using.
>>
>> I only have the latest GitHub version installed.
>>
>> On 25-Aug-2017 9:46 PM, "Yury" wrote:
>>
>>> ShreeDevi,
>>>
>>> Thanks for your answers and for taking the time.
>>>
>>> I got the traineddata file for version 3.04 (the file is a little smaller,
>>> but the number of characters is the same, 2851) and get the same result:
>>> some symbols are split into a pair (the first is correct and the second
>>> is wrong).
>>> I am thinking of upgrading to 4.00, so I have a question:
>>>
>>> Can I install the new version alongside 3.05, without removing it?
>>>
>>> And another question, from my first post:
>>> Does tesseract have a limit on the number of bytes per
>>> character in Unicode?
>>> Are there any additional parameters to remove the limit on the number
>>> of bytes per symbol?
>>>
>>> On Friday, 25 August 2017 at 20:13:22 UTC+7, shree wrote:
>>>>
>>>> If you are using 4.0alpha, the latest version of the program, you can use
>>>> the Kannada traineddata from
>>>>
>>>>
>>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/kan.traineddata
>>>> or
>>>>
>>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>>>>
>>>> I have not tested Kannada personally, but if it follows the pattern for
>>>> Devanagari, it should be better than the older traineddata.
>>>>
>>>> If you are using version 3.05 of the program,
>>>> then use the traineddata files from
>>>> https://github.com/tesseract-ocr/tessdata/releases/tag/3.04.00
>>>>
>>>> Please note that the unicharset and langdata files are used during
>>>> training, and just changing the unicharset file is NOT going to improve
>>>> the recognition.
>>>>
>>>> For that, training needs to be done. Please see the wiki for more
>>>> details.
>>>>
>>>> ShreeDevi
>>>>
>>>> On Fri, Aug 25, 2017 at 6:31 PM, Yury wrote:
>>>>
>>>>> Hello shree!
>>>>>
>>>>> Thanks for your links and for taking the time.
>>>>>
>>>>> I did not find a /best/ folder in the ~alex-p profile.
>>>>> But I found kan.traineddata in the package tesseract-lang-4.00 (in
>>>>> tesseract-lang-3.05 the Kannada language is absent).
>>>>> I took this file and ran recognition: the result is the same.
>>>>> This package is dated 08.01.17 and has 2851 characters (the same as mine).
>>>>> So I think I used this package earlier.
>>>>>
>>>>> On Friday, 25 August 2017 at 18:56:25 UTC+7, shree wrote:
>>>>>>
>>>>>> https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
>>>>>>
>>>>>> For the PPA
>>>>>>
>>>>>> On 25-Aug-2017 5:22 PM, "ShreeDevi Kumar" wrote:
>>>>>>
>>>>>>> The latest GitHub source in the master branch is for 4.0alpha. You can
>>>>>>> install it via the PPA.
>>>>>>>
>>>>>>> Search for tesseract PPA Alex in Google.
>>>>>>>
>>>>>>> _sent from phone
>>>>>>>
>>>>>>> On 25-Aug-2017 4:42 PM, "Yury" wrote:
>>>>>>>
>>>>>>>> Hello again.
>>>>>>>>
>>>>>>>> I found this: 
>>>>>>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>>>>>>>>
>>>>>>>> But after recognition I see only English letters and digits,
>>>>>>>> so this did not work.
>>>>>>>> In the log I see:
>>>>>>>>  theraysmith <https://github.com/theraysmith> Added best 
>>>>>>>> traineddatas for 4.00 alpha 
>>>>>>>> <https://github.com/tesseract-ocr/tessdata/commit/3a94ddd47be01fd897cbe31f05cbd2301454cf8a>
>>>>>>>>
>>>>>>>> I have 3.05.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Friday, 25 August 2017 at 17:47:56 UTC+7, Yury wrote:
>>>>>>>>>
>>>>>>>>> Hello, shree!
>>>>>>>>>
>>>>>>>>> Can you tell me the exact path for tessdata/best/*.traineddata?
>>>>>>>>>
>>>>>>>>> On Friday, 25 August 2017 at 16:07:49 UTC+7, shree wrote:
>>>>>>>>>>
>>>>>>>>>> Have you tried the new tessdata/best/*.traineddata with the
>>>>>>>>>> latest GitHub sources?
>>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to [email protected].
>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>> To view this discussion on the web visit 
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/b20f906b-db90-43f1-b9c6-b1bb40d21414%40googlegroups.com
>>>>>>>>  
>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/b20f906b-db90-43f1-b9c6-b1bb40d21414%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>
>>
