What about osd.traineddata and config files? Are they in your tessdata directory?
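A quick way to check this is a small program like the one below. It is only a sketch: the class and method names are made up for illustration, and the list of expected entries (the language's .traineddata file, osd.traineddata and a configs/ subdirectory) is an assumption about what a complete tessdata directory contains here. The default path /usr/share/tessdata and the language deu are taken from the quoted message.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Tesseract or of the application in this thread.
class TessdataCheck {

    // Returns the names of the expected entries that are absent from the tessdata directory.
    // The set of "expected" entries is an assumption, not an official Tesseract requirement.
    static List<String> missingEntries(Path tessdata, String lang) {
        List<String> missing = new ArrayList<>();
        String langFile = lang + ".traineddata";
        if (!Files.isRegularFile(tessdata.resolve(langFile))) {
            missing.add(langFile);
        }
        if (!Files.isRegularFile(tessdata.resolve("osd.traineddata"))) {
            missing.add("osd.traineddata");
        }
        if (!Files.isDirectory(tessdata.resolve("configs"))) {
            missing.add("configs/");
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults mirror the path and language used in the quoted message.
        Path tessdata = Paths.get(args.length > 0 ? args[0] : "/usr/share/tessdata");
        String lang = args.length > 1 ? args[1] : "deu";
        List<String> missing = missingEntries(tessdata, lang);
        System.out.println(missing.isEmpty()
                ? "all expected entries found in " + tessdata
                : "missing in " + tessdata + ": " + missing);
    }
}
```

Going by the directory listing in the quoted message, which shows deu.traineddata and configs/ but no osd.traineddata, this would report only osd.traineddata as missing.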
- excuse the brevity, sent from mobile

On 01-Jan-2017 9:22 PM, <ruediger.k...@deutschebahn.com> wrote:
> Hi all,
>
> I'm in a time-critical situation. I want to deliver new software to our
> customer on 5th January 2017.
> While things worked well in the test environment, problems came up after
> deploying the software to the production environment.
> Before describing the failure in detail, here is some information about
> the setup and the environment.
>
>
> Environment & Installation
>
> *Operating System: Suse Enterprise Linux Server 12 SP 1*
> $ uname -a
> Linux 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016 (42e0a66) x86_64 x86_64 x86_64 GNU/Linux
> Since this environment is managed, I cannot update any system libraries
> such as glibc.
> *So the newest and only officially supported version of tesseract I found
> for "Suse 12 SP1 x86_64" is 3.02*
>
> *Installed Packages:*
> libgif4-4.1.6-34.1.1.x86_64.rpm
> liblept3-1.69-16.1.x86_64.rpm
> libtesseract3-3.02.02-3.2.1.x86_64.rpm
> libwebp4-0.3.1-34.1.x86_64.rpm
> tesseract-3.02.02-59.1.x86_64.rpm
>
> *tesseract version*
> $ tesseract -v
> tesseract 3.02.02
>  leptonica-1.69
>   libgif 4.1.6 : libjpeg 8d : libpng 1.5.22 : libtiff 4.0.6 : zlib 1.2.8
>
> *Release details*
> $ zypper info tesseract
> Information for package tesseract:
> ----------------------------------
> Repository: @System
> *Name: tesseract*
> *Version: 3.02.02-59.1*
> *Arch: x86_64*
> Vendor: obs://build.opensuse.org/home:koprok
> Support Level: unknown
> Installed: Yes
> Status: up-to-date
> Installed Size: 3.8 MiB
> Summary: Open Source OCR Engine
> Description: […]
>
>
> Traindata & Languages
>
> *Traindata*
> The traindata has been manually downloaded from GitHub
> <https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302>:
>
> - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.eng.tar.gz/download
> - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.deu.tar.gz/download
>
> *The files have been placed in /usr/share/tessdata/*
> $ ls -la /usr/share/tessdata/
> drwxr-xr-x 1 root root      230 Dec 31 16:37 configs/
> -rw-r--r-- 1 root root  2438081 Dec 30 15:31 deu.traineddata
> -rw-r--r-- 1 root root   171918 Dec 30 20:16 eng.cube.bigrams
> -rw-r--r-- 1 root root       38 Dec 30 20:16 eng.cube.fold
> -rw-r--r-- 1 root root      181 Dec 30 20:16 eng.cube.lm
> -rw-r--r-- 1 root root   857304 Dec 30 20:16 eng.cube.nn
> -rw-r--r-- 1 root root      254 Dec 30 20:16 eng.cube.params
> -rw-r--r-- 1 root root 13020078 Dec 30 20:16 eng.cube.size
> -rw-r--r-- 1 root root  2444187 Dec 30 20:16 eng.cube.word-freq
> -rw-r--r-- 1 root root      996 Dec 30 20:16 eng.tesseract_cube.nn
> -rw-r--r-- 1 root root 21876572 Dec 30 20:16 eng.traineddata
> drwxr-xr-x 1 root root       88 Dec 31 16:37 tessconfigs/
>
> *tesseract detects 'deu' and 'eng' as available languages*
> $ tesseract --list-langs
> List of available languages (2):
> deu
> eng
>
>
> Application & Problem
>
> *The software application is built on the Spring Boot framework*
> Runtime.getRuntime().exec(new String[] {
>     "tesseract",
>     "--tessdata-dir", "/usr/share/tessdata",
>     "-l", lang.getISO3Language(),
>     inputTiff.toAbsolutePath().toString(), extractedcntPath });
>
> *The application logfile says*
> 2016-12-30 20:30:02,320 [https-jsse-nio-8443-exec-7] WARN PDFContentExtractor - read_params_file: parameter not found: II*
>
> *Executing tesseract with the tessdata dir fails*
> $ tesseract --tessdata-dir /usr/share/tessdata -l deu inputPdf6632237754781472255.tiff out4
> read_params_file: parameter not found: II*
>
> *Executing tesseract without the tessdata dir works well*
> $ tesseract -l deu inputPdf6632237754781472255.tiff out5
> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
>
>
> Questions & Ideas
>
> Why does tesseract work well and detect the available languages when the
> --tessdata-dir parameter is not set?
> Why does tesseract crash during initialization when the --tessdata-dir
> parameter is set?
> Is there any difference between running tesseract with and without the
> --tessdata-dir parameter?
>
> What can I do to fix this problem?
> Install a newer version of tesseract?
> Compile a version from sources?
> Use other traindata/tessdata?
> Run tesseract without the --tessdata-dir param?
>
> If anybody can help me get this issue solved in the upcoming week, it
> would make not only me happy, but our whole team.
>
> Thank you very much in advance!
> Rüdiger Kurz
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f046ae79-d687-45f8-af41-289cd84da2b9%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
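The quoted message reports that the invocation without --tessdata-dir works, and lists dropping that parameter as one possible fix. Below is a minimal sketch of that variant from Java. The class and method names are hypothetical. It uses ProcessBuilder rather than Runtime.exec only so that tesseract's own messages are forwarded to the application's console. Whether dropping the flag is the right fix for 3.02.02 is not confirmed anywhere in this thread. One observation that may or may not be relevant: "II*" matches the first bytes of a little-endian TIFF file, so the error text could mean that the TIFF itself was read as a parameter file. That reading is an assumption.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper; mirrors the command line that worked in the quoted message:
//   tesseract -l <lang> <input.tiff> <outputBase>
class TesseractInvocation {

    // Builds the argument list without --tessdata-dir, so tesseract uses its default tessdata location.
    static List<String> command(String lang, Path inputTiff, String outputBase) {
        return Arrays.asList(
                "tesseract",
                "-l", lang,
                inputTiff.toAbsolutePath().toString(),
                outputBase);
    }

    // Runs tesseract and returns its exit code; stdout and stderr go to the parent's console.
    static int run(String lang, Path inputTiff, String outputBase)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(lang, inputTiff, outputBase));
        pb.inheritIO();
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 3) {
            // Usage: java TesseractInvocation <lang> <input.tiff> <outputBase>
            System.exit(run(args[0], Paths.get(args[1]), args[2]));
        }
        // Without arguments, only show the command that would be executed.
        System.out.println(String.join(" ", command("deu", Paths.get("input.tiff"), "out")));
    }
}
```

In the application from the quoted message, the language argument would come from lang.getISO3Language() and the output base from extractedcntPath, as in the original Runtime.exec call.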