Is the TESSDATA_PREFIX variable set in the environment? If so, which directory is it pointing to?
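Since tesseract is started from a Java application here, it may be worth checking the variable from inside that process as well, because its environment can differ from an interactive shell. Below is a minimal sketch, not taken from the application in question: the class name TessdataPrefixCheck and its methods are hypothetical, and it assumes only that language files end in ".traineddata". Because the expected layout depends on the Tesseract version, it lists both the directory itself and a "tessdata" subdirectory instead of deciding which one is correct.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TessdataPrefixCheck {

    /** Lists the *.traineddata file names found directly in dir (empty if dir is not a directory). */
    public static List<String> trainedData(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return new ArrayList<>();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".traineddata"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Builds a short report for the given TESSDATA_PREFIX value (null means "not set"). */
    public static String report(String prefix) throws IOException {
        if (prefix == null || prefix.isEmpty()) {
            return "TESSDATA_PREFIX is not set";
        }
        Path dir = Paths.get(prefix);
        // Assumption: depending on the Tesseract version, language files are looked up
        // either in the prefix itself or in a "tessdata" subdirectory, so show both.
        return "TESSDATA_PREFIX=" + prefix
                + "\n  exists: " + Files.isDirectory(dir)
                + "\n  traineddata in prefix: " + trainedData(dir)
                + "\n  traineddata in prefix/tessdata: " + trainedData(dir.resolve("tessdata"));
    }

    public static void main(String[] args) throws IOException {
        // Reads the variable as seen by this JVM, which is what a child process would inherit.
        System.out.println(report(System.getenv("TESSDATA_PREFIX")));
    }
}
```

Running this under the same user and startup script as the application shows what a tesseract child process would inherit by default.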
- excuse the brevity, sent from mobile

On 01-Jan-2017 9:35 PM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote:

> What about osd.traineddata and config files? Are they in your tessdata
> directory?
>
> - excuse the brevity, sent from mobile
>
> On 01-Jan-2017 9:22 PM, <ruediger.k...@deutschebahn.com> wrote:
>
>> Hi all,
>>
>> I'm in a time-critical situation. I want to deliver new software to
>> our customer on 5th January 2017.
>> While things worked well in the test environment, problems came up
>> after deploying the software to the production environment.
>> Before describing the situation/failure in detail, some info about the
>> setup and the environment.
>>
>>
>> Environment & Installation
>>
>> *Operating System: Suse Enterprise Linux Server 12 SP 1*
>> $ uname -a
>> Linux 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016 (42e0a66) x86_64 x86_64 x86_64 GNU/Linux
>> Since this environment is managed, I cannot update any system libraries
>> like glibc etc.
>> *So the newest and only officially supported version of tesseract I found
>> for "Suse 12 SP1 x86_64" is 3.02*
>>
>> *Installed Packages:*
>> libgif4-4.1.6-34.1.1.x86_64.rpm
>> liblept3-1.69-16.1.x86_64.rpm
>> libtesseract3-3.02.02-3.2.1.x86_64.rpm
>> libwebp4-0.3.1-34.1.x86_64.rpm
>> tesseract-3.02.02-59.1.x86_64.rpm
>>
>> *tesseract version*
>> $ tesseract -v
>> tesseract 3.02.02
>>  leptonica-1.69
>>   libgif 4.1.6 : libjpeg 8d : libpng 1.5.22 : libtiff 4.0.6 : zlib 1.2.8
>>
>> *Release details*
>> $ zypper info tesseract
>> Information for package tesseract:
>> ----------------------------------
>> Repository: @System
>> *Name: tesseract*
>> *Version: 3.02.02-59.1*
>> *Arch: x86_64*
>> Vendor: obs://build.opensuse.org/home:koprok
>> Support Level: unknown
>> Installed: Yes
>> Status: up-to-date
>> Installed Size: 3.8 MiB
>> Summary: Open Source OCR Engine
>> Description: […]
>>
>>
>> Traindata & Languages
>>
>> *Traindata*
>> The traindata has been manually downloaded from github
>> <https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302>.
>>
>> - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.eng.tar.gz/download
>> - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.deu.tar.gz/download
>>
>> *The files have been copied to /usr/share/tessdata/*
>> $ ls -la /usr/share/tessdata/
>> drwxr-xr-x 1 root root      230 Dec 31 16:37 configs/
>> -rw-r--r-- 1 root root  2438081 Dec 30 15:31 deu.traineddata
>> -rw-r--r-- 1 root root   171918 Dec 30 20:16 eng.cube.bigrams
>> -rw-r--r-- 1 root root       38 Dec 30 20:16 eng.cube.fold
>> -rw-r--r-- 1 root root      181 Dec 30 20:16 eng.cube.lm
>> -rw-r--r-- 1 root root   857304 Dec 30 20:16 eng.cube.nn
>> -rw-r--r-- 1 root root      254 Dec 30 20:16 eng.cube.params
>> -rw-r--r-- 1 root root 13020078 Dec 30 20:16 eng.cube.size
>> -rw-r--r-- 1 root root  2444187 Dec 30 20:16 eng.cube.word-freq
>> -rw-r--r-- 1 root root      996 Dec 30 20:16 eng.tesseract_cube.nn
>> -rw-r--r-- 1 root root 21876572 Dec 30 20:16 eng.traineddata
>> drwxr-xr-x 1 root root       88 Dec 31 16:37 tessconfigs/
>>
>> *tesseract detects 'deu' and 'eng' as available languages*
>> $ tesseract --list-langs
>> List of available languages (2):
>> deu
>> eng
>>
>>
>> Application & Problem
>>
>> *The software application is built upon the Spring Boot framework*
>> Runtime.getRuntime().exec(new String[] {
>>     "tesseract",
>>     "--tessdata-dir", "/usr/share/tessdata",
>>     "-l", lang.getISO3Language(),
>>     inputTiff.toAbsolutePath().toString(), extractedcntPath });
>>
>> *The application logfile says*
>> 2016-12-30 20:30:02,320 [https-jsse-nio-8443-exec-7] WARN PDFContentExtractor - read_params_file: parameter not found: II*
>>
>> *Executing tesseract with the tessdata dir fails*
>> $ tesseract --tessdata-dir /usr/share/tessdata -l deu inputPdf6632237754781472255.tiff out4
>> read_params_file: parameter not found: II*
>>
>> *Executing tesseract without the tessdata dir works well*
>> $ tesseract -l deu inputPdf6632237754781472255.tiff out5
>> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
>>
>>
>> Questions & Ideas
>> Why does tesseract work well and detect the available languages without
>> the --tessdata-dir parameter set?
>> Why does tesseract crash during initialization when the
>> --tessdata-dir parameter is set?
>> Is there any difference between running tesseract with/without the
>> --tessdata-dir parameter set?
>>
>> What can I do to fix this problem?
>> Install a newer version of tesseract?
>> Compile a version from sources?
>> Use other traindata/tessdata?
>> Run tesseract without the --tessdata-dir param?
>>
>> If anybody can help me get this issue solved in the upcoming week,
>> it would make not only me happy, but our whole team.
>>
>> Thank you very much in advance!
>> Rüdiger Kurz
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/f046ae79-d687-45f8-af41-289cd84da2b9%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.