I forgot to add: during the make stage under Cygwin I could not run the command "sudo ldconfig". I don't think that is essential, though, since modes 1 and 3 work fine.
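For reference, the engine modes discussed in this thread are selected with the `--oem` option of the tesseract command line. Below is a minimal sketch in Python of how the four modes could be tried one after another to see which of them load a given language. The helper names `build_command` and `run_ocr`, the input file `page.png`, and the output base name are illustrative assumptions, not something taken from this thread.

```python
import os
import shutil
import subprocess

# OCR engine modes accepted by `--oem` in Tesseract 4.x.
OEM_MODES = {
    0: "legacy engine only",
    1: "LSTM neural net engine only",
    2: "legacy + LSTM engines",
    3: "default, based on what is available",
}


def build_command(image, out_base, lang="kan", oem=1):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    if oem not in OEM_MODES:
        raise ValueError(f"unknown OEM mode: {oem}")
    return ["tesseract", image, out_base, "-l", lang, "--oem", str(oem)]


def run_ocr(image, out_base, lang="kan", oem=1, tessdata_prefix=None):
    """Run tesseract once and return the CompletedProcess (hypothetical helper)."""
    env = dict(os.environ)
    if tessdata_prefix is not None:
        # Same variable as mentioned in the thread; the value is the caller's choice.
        env["TESSDATA_PREFIX"] = tessdata_prefix
    return subprocess.run(
        build_command(image, out_base, lang, oem),
        env=env,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH, nothing to run")
    else:
        for mode, description in OEM_MODES.items():
            result = run_ocr("page.png", f"out_oem{mode}", oem=mode)
            status = "ok" if result.returncode == 0 else "failed"
            print(f"--oem {mode} ({description}): {status}")
            if result.returncode != 0:
                # Messages such as "Failed loading language" go to stderr.
                print(result.stderr.strip())
```

Printing stderr for the failing modes makes it easy to compare the exact error text between modes 0/2 and modes 1/3.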
On Monday, 11 September 2017 at 12:08:32 UTC+7, Yury wrote:
>
> Thanks for your hint.
>
> I installed CygWin and compiled tesseract 4.0 under CygWin. Quality has
> improved significantly.
> However, there was another problem.
> In oem mode 1 or 3 everything works fine. When I choose the modes 0 or 2 I
> get the error:
>
> Failed loading language 'kan'
> Tesseract couldn't load any languages!
> Could not initialize tesseract.
>
> I set TESSDATA_PREFIX to "/usr/share/tessdata". There are eng, kan,
> Kannada and osr traineddata obtained from best catalog.
> What could be the problem ? These modes do not work in version 4 ?
>
> tesseract 4.00.00alpha
> leptonica-1.74.4
> libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.30 :
> libtiff 4.0.7 : zlib 1.2.11 : libwebp 0.4.4 : libopenjp2 2.1.2
>
> Found AVX
> Found SSE
>
> On Saturday, 26 August 2017 at 0:23:49 UTC+7, shree wrote:
>>
>> I do not know about internal working of tesseract.
>>
>> If you unpack the best/kan.traineddata you may find a smaller unicharset
>> which just the basic characters in it.
>>
>> Tesseract 4 uses the LSTM neural net engine vs the legacy engine for
>> 3.05. LSTM does line based recognition rather than character base.
>>
>> Yes, it is possible to have both versions installed, however I do not
>> have exact instructions to make it work. It would also depend on what o/s
>> you are using.
>>
>> I only have the latest GitHub version installed.
>>
>> On 25-Aug-2017 9:46 PM, "Yury" <[email protected]> wrote:
>>
>>> ShreeDevi,
>>>
>>> Thanks for your answers and taking the time.
>>>
>>> I get traineddata file for 3.04 version (file is little less, but number
>>> of characters is the same - 2851) and get the same result - some symbols is
>>> divided to pair (first is correct and another one is fail).
>>> I think to upgrade to 4.00, so I have a questions:
>>>
>>> Can I install new version nearby with 3.05, without install ?
>>>
>>> And another question in the first my post:
>>> Did the tesseract have some limitations for number of bytes per
>>> character in unicode ?
>>> Are there any additional parameters to remove limitations on the number
>>> of bytes per symbol ?
>>>
>>> On Friday, 25 August 2017 at 20:13:22 UTC+7, shree wrote:
>>>>
>>>> If you are using the 4.0alpha - latest version of program you can use
>>>> kannada traineddata from
>>>>
>>>>
>>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/kan.traineddata
>>>> or
>>>>
>>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>>>>
>>>> I have not tested kannada personally but if it follows the pattern for
>>>> devanagari, it should be better than the older traineddata.
>>>>
>>>> If you are using 3.05 version of program,
>>>> then use traineddata files from
>>>> https://github.com/tesseract-ocr/tessdata/releases/tag/3.04.00
>>>>
>>>> Please note that the unicharset and langdata files are used while
>>>> training and just changing the unicharset file is NOT going to improve the
>>>> recognition.
>>>>
>>>> For that training needs to be done. Please see the wiki for more
>>>> details.
>>>>
>>>> ShreeDevi
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Fri, Aug 25, 2017 at 6:31 PM, Yury <[email protected]> wrote:
>>>>
>>>>> Hello shree!
>>>>>
>>>>> Thanks for your links and taking the time.
>>>>>
>>>>> I don't found folder /best/ in ~alex-p profile.
>>>>> But I found kan.traineddata in package tesseract-lang-4.00 (in
>>>>> tesseract-lang-3.05 the language Kannada is absent).
>>>>> I have to got this file and start recognise - result is the same.
>>>>> This package is dated at 08.01.17 and have 2851 characters (as I have).
>>>>> So, I thing I used this package earlier.
>>>>>
>>>>> On Friday, 25 August 2017 at 18:56:25 UTC+7, shree wrote:
>>>>>>
>>>>>> https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
>>>>>>
>>>>>> For ppa
>>>>>>
>>>>>> On 25-Aug-2017 5:22 PM, "ShreeDevi Kumar" <[email protected]> wrote:
>>>>>>
>>>>>>> Latest GitHub source in master branch is for 4.0alpha. you can
>>>>>>> install via post.
>>>>>>>
>>>>>>> Search for tesseract PPA Alex in Google.
>>>>>>>
>>>>>>> _sent from phone
>>>>>>>
>>>>>>> On 25-Aug-2017 4:42 PM, "Yury" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello again.
>>>>>>>>
>>>>>>>> I found this:
>>>>>>>> https://github.com/tesseract-ocr/tessdata/blob/master/best/Kannada.traineddata
>>>>>>>>
>>>>>>>> But after recognition I see only english text symbols and digits,
>>>>>>>> so this did not work.
>>>>>>>> In log I see:
>>>>>>>> theraysmith <https://github.com/theraysmith> Added best
>>>>>>>> traineddatas for 4.00 alpha
>>>>>>>> <https://github.com/tesseract-ocr/tessdata/commit/3a94ddd47be01fd897cbe31f05cbd2301454cf8a>
>>>>>>>>
>>>>>>>> I have 3.05.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Friday, 25 August 2017 at 17:47:56 UTC+7, Yury wrote:
>>>>>>>>>
>>>>>>>>> Hello, shree!
>>>>>>>>>
>>>>>>>>> Can you tell me exact path for tessdata/best/*.traineddata ?
>>>>>>>>>
>>>>>>>>> On Friday, 25 August 2017 at 16:07:49 UTC+7, shree wrote:
>>>>>>>>>>
>>>>>>>>>> Have you tried the new tessdata/best/*.traineddata with the
>>>>>>>>>> latest github sources?
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/b20f906b-db90-43f1-b9c6-b1bb40d21414%40googlegroups.com.
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>

