Hi all,

I'm in a time-critical situation: I have to deliver new software to our customer on 5 January 2017. Everything worked well in the test environment, but problems came up after deploying the software to the production environment. Before describing the failure in detail, here is some information about the setup and the environment.

Environment & Installation

*Operating System: SUSE Linux Enterprise Server 12 SP1*
$ uname -a
Linux 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016 (42e0a66) x86_64 x86_64 x86_64 GNU/Linux

Since this environment is managed, I cannot update any system libraries such as glibc.
*So the newest and only officially supported version of tesseract that I found for "Suse 12 SP1 x86_64" is 3.02*

*Installed Packages:*
libgif4-4.1.6-34.1.1.x86_64.rpm
liblept3-1.69-16.1.x86_64.rpm
libtesseract3-3.02.02-3.2.1.x86_64.rpm
libwebp4-0.3.1-34.1.x86_64.rpm
tesseract-3.02.02-59.1.x86_64.rpm

*tesseract version*
$ tesseract -v
tesseract 3.02.02
 leptonica-1.69
  libgif 4.1.6 : libjpeg 8d : libpng 1.5.22 : libtiff 4.0.6 : zlib 1.2.8

*Release details*
$ zypper info tesseract
Information for package tesseract:
----------------------------------
Repository: @System
*Name: tesseract*
*Version: 3.02.02-59.1*
*Arch: x86_64*
Vendor: obs://build.opensuse.org/home:koprok
Support Level: unknown
Installed: Yes
Status: up-to-date
Installed Size: 3.8 MiB
Summary: Open Source OCR Engine
Description: […]

Traindata & Languages

*Traindata*
The traindata was downloaded manually, following the links on the tesseract wiki on GitHub <https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302>:
- https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.eng.tar.gz/download
- https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.deu.tar.gz/download

*The files have been placed in /usr/share/tessdata/*
$ ls -la /usr/share/tessdata/
drwxr-xr-x 1 root root      230 Dec 31 16:37 configs/
-rw-r--r-- 1 root root  2438081 Dec 30 15:31 deu.traineddata
-rw-r--r-- 1 root root   171918 Dec 30 20:16 eng.cube.bigrams
-rw-r--r-- 1 root root       38 Dec 30 20:16 eng.cube.fold
-rw-r--r-- 1 root root      181 Dec 30 20:16 eng.cube.lm
-rw-r--r-- 1 root root   857304 Dec 30 20:16 eng.cube.nn
-rw-r--r-- 1 root root      254 Dec 30 20:16 eng.cube.params
-rw-r--r-- 1 root root 13020078 Dec 30 20:16 eng.cube.size
-rw-r--r-- 1 root root  2444187 Dec 30 20:16 eng.cube.word-freq
-rw-r--r-- 1 root root      996 Dec 30 20:16 eng.tesseract_cube.nn
-rw-r--r-- 1 root root 21876572 Dec 30 20:16 eng.traineddata
drwxr-xr-x 1 root root       88 Dec 31 16:37 tessconfigs/

*tesseract detects 'deu' and 'eng' as available languages*
$ tesseract --list-langs
List of available languages (2):
deu
eng

Application & Problem

*The software application is built on the Spring Boot framework*
Runtime.getRuntime().exec(new String[] {
    "tesseract", "--tessdata-dir", "/usr/share/tessdata",
    "-l", lang.getISO3Language(),
    inputTiff.toAbsolutePath().toString(), extractedcntPath });

*The application logfile says*
2016-12-30 20:30:02,320 [https-jsse-nio-8443-exec-7] WARN PDFContentExtractor - read_params_file: parameter not found: II*

*Executing tesseract with the tessdata dir fails*
$ tesseract --tessdata-dir /usr/share/tessdata -l deu inputPdf6632237754781472255.tiff out4
read_params_file: parameter not found: II*

*Executing tesseract without the tessdata dir works well*
$ tesseract -l deu inputPdf6632237754781472255.tiff out5
Tesseract Open Source OCR Engine v3.02.02 with Leptonica

Questions & Ideas

Why does tesseract work well and detect the available languages without the --tessdata-dir parameter set?
Why does tesseract crash during initialization when the --tessdata-dir parameter is set?
Is there any difference between running tesseract with and without the --tessdata-dir parameter?

What can I do to fix this problem?
- Install a newer version of tesseract?
- Compile a version from source?
- Use other traindata/tessdata?
- Run tesseract without the --tessdata-dir parameter?

If anybody can help me get this issue solved in the coming week, it would make not only me but our whole team happy.

Thank you very much in advance!
Rüdiger Kurz
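
P.S. To make the last idea concrete, here is a minimal sketch of the Java call without --tessdata-dir, using the same argument order as the shell command that works. The class and method names (TesseractCommand, build, run) are made up for illustration and are not part of our application. The sketch assumes that tesseract is on the PATH and finds its data in the default location, and it prints tesseract's own output and returns the exit code so that failures show up in the logfile.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class TesseractCommand {

    // Builds the argument list in the same order as the shell call that works:
    //   tesseract -l <lang> <input.tiff> <outputBase>
    // Note: no --tessdata-dir argument is passed.
    static List<String> build(String lang, String inputTiff, String outputBase) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add("-l");
        cmd.add(lang);
        cmd.add(inputTiff);
        cmd.add(outputBase);
        return cmd;
    }

    // Starts the process, echoes everything tesseract writes (stdout and
    // stderr merged), and returns the process exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("[tesseract] " + line);
            }
        }
        return process.waitFor();
    }
}
```

Called as TesseractCommand.run(TesseractCommand.build("deu", "inputPdf6632237754781472255.tiff", "out5")), it is meant to mirror the out5 shell call above.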