[tesseract-ocr] Using --tessdata-dir param leads to read_params_file: parameter not found: II*

ruediger . kurz Sun, 01 Jan 2017 07:52:53 -0800

Hi all,

I'm in a time-critical situation: I have to deliver new software to our 
customer on 5 January 2017.
Everything worked well in the test environment, but after deploying the 
software to the production environment problems came up.
Before describing the failure in detail, here is some information about the 
setup and the environment.



Environment & Installation

*Operating System: SUSE Linux Enterprise Server 12 SP1*
$ uname -a
Linux 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016 (42e0a66) 
x86_64 x86_64 x86_64 GNU/Linux
Since this environment is managed, I cannot update any system libraries 
such as glibc.
*So the newest (and only) officially supported version of tesseract I found 
for "Suse 12 SP1 x86_64" is 3.02*

*Installed Packages:*
libgif4-4.1.6-34.1.1.x86_64.rpm
liblept3-1.69-16.1.x86_64.rpm
libtesseract3-3.02.02-3.2.1.x86_64.rpm
libwebp4-0.3.1-34.1.x86_64.rpm
tesseract-3.02.02-59.1.x86_64.rpm

*tesseract version*
$ tesseract -v
tesseract 3.02.02
    leptonica-1.69
        libgif 4.1.6 : libjpeg 8d : libpng 1.5.22 : libtiff 4.0.6 : zlib 1.2.8

*Release details*
$ zypper info tesseract
Information for package tesseract:
----------------------------------
Repository: @System
*Name: tesseract*
*Version: 3.02.02-59.1*
*Arch: x86_64*
Vendor: obs://build.opensuse.org/home:koprok
Support Level: unknown
Installed: Yes
Status: up-to-date
Installed Size: 3.8 MiB
Summary: Open Source OCR Engine
Description: […]


Traindata & Languages

*Traindata*
The traindata has been manually downloaded via the links on the GitHub wiki 
<https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302>:

   - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.eng.tar.gz/download
   - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.deu.tar.gz/download

*The files have been copied to /usr/share/tessdata/*
$ ls -la /usr/share/tessdata/
drwxr-xr-x 1 root root      230 Dec 31 16:37 configs/
-rw-r--r-- 1 root root  2438081 Dec 30 15:31 deu.traineddata
-rw-r--r-- 1 root root   171918 Dec 30 20:16 eng.cube.bigrams
-rw-r--r-- 1 root root       38 Dec 30 20:16 eng.cube.fold
-rw-r--r-- 1 root root      181 Dec 30 20:16 eng.cube.lm
-rw-r--r-- 1 root root   857304 Dec 30 20:16 eng.cube.nn
-rw-r--r-- 1 root root      254 Dec 30 20:16 eng.cube.params
-rw-r--r-- 1 root root 13020078 Dec 30 20:16 eng.cube.size
-rw-r--r-- 1 root root  2444187 Dec 30 20:16 eng.cube.word-freq
-rw-r--r-- 1 root root      996 Dec 30 20:16 eng.tesseract_cube.nn
-rw-r--r-- 1 root root 21876572 Dec 30 20:16 eng.traineddata
drwxr-xr-x 1 root root       88 Dec 31 16:37 tessconfigs/

*tesseract detects 'deu' and 'eng' as available languages*
$ tesseract --list-langs
List of available languages (2):
deu
eng


Application & Problem

*The software application is built upon the Spring Boot framework and calls tesseract like this*
Runtime.getRuntime().exec(new String[] { 
 "tesseract", 
 "--tessdata-dir", "/usr/share/tessdata", 
 "-l", lang.getISO3Language(), 
 inputTiff.toAbsolutePath().toString(), extractedcntPath });
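
For comparison runs, the call above could be wrapped so that the --tessdata-dir flag can be switched off without touching the rest of the command. This is only a minimal sketch: the class and method names (TesseractCommand, build, run) are hypothetical and not part of the application, and the argument order simply mirrors the two command lines shown in this post.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class TesseractCommand {

    /**
     * Builds the tesseract argument list.
     * Pass tessdataDir == null to omit the --tessdata-dir flag entirely.
     */
    static List<String> build(String tessdataDir, String lang,
                              String inputTiff, String outputBase) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        if (tessdataDir != null) {
            cmd.add("--tessdata-dir");
            cmd.add(tessdataDir);
        }
        cmd.add("-l");
        cmd.add(lang);
        cmd.add(inputTiff);
        cmd.add(outputBase);
        return cmd;
    }

    /**
     * Runs the command and returns its exit code.
     * stderr is merged into stdout and passed through to the parent
     * process, so messages such as "read_params_file: ..." stay visible.
     */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        return pb.start().waitFor();
    }
}
```

Calling build(null, "deu", input, output) reproduces the command line that works on the shell, while build("/usr/share/tessdata", "deu", input, output) reproduces the failing one.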

*The application logfile says*
2016-12-30 20:30:02,320 [https-jsse-nio-8443-exec-7] WARN  
PDFContentExtractor - read_params_file: parameter not found: II*

*Executing tesseract with tessdata dir fails*
$ tesseract --tessdata-dir /usr/share/tessdata -l deu 
inputPdf6632237754781472255.tiff out4
read_params_file: parameter not found: II*
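
A side observation that may help narrow this down: the characters II* are the first three bytes of a little-endian TIFF file (0x49 0x49 0x2A). If the input file starts with that signature, the message could mean that tesseract is reading the image as a parameter file rather than as an image. That is an assumption, not something verified here. A small sketch of the check, with hypothetical names (TiffSignature and its methods); readNBytes requires Java 9 or later:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class TiffSignature {

    /** True if the bytes start with "II*" (0x49 0x49 0x2A). */
    static boolean startsWithLittleEndianTiff(byte[] head) {
        return head.length >= 3
                && head[0] == 'I'
                && head[1] == 'I'
                && head[2] == '*';
    }

    /** Reads the first three bytes of the file and checks the signature. */
    static boolean isLittleEndianTiff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[3];
            int n = in.readNBytes(head, 0, 3); // may be < 3 for tiny files
            return startsWithLittleEndianTiff(Arrays.copyOf(head, n));
        }
    }
}
```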

*Executing tesseract without the tessdata dir works well*
$ tesseract -l deu inputPdf6632237754781472255.tiff out5
Tesseract Open Source OCR Engine v3.02.02 with Leptonica


Questions & Ideas
Why does tesseract work well and detect the available languages when the 
--tessdata-dir parameter is not set?
Why does tesseract fail during initialization when the --tessdata-dir 
parameter is set?
Is there any difference between running tesseract with and without the 
--tessdata-dir parameter?

What can I do to fix this problem?
Install a newer version of tesseract?
Compile a version from sources?
Use other traindata/tessdata?
Run tesseract without the --tessdata-dir param?

If anybody can help me get this issue solved in the upcoming week, it 
would make not only me but our whole team happy.

Thank you very much in advance!
Rüdiger Kurz
