Is the TESSDATA_PREFIX variable set in the environment? If so, which
directory is it pointing to?
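
If TESSDATA_PREFIX turns out to matter, one thing to try from the Java side is setting it for the child process only and dropping --tessdata-dir from the call. Below is a minimal sketch, not a confirmed fix. It rests on two assumptions: that the 3.02 binary may not understand --tessdata-dir (the "II*" in the error looks like the first bytes of a little-endian TIFF being read as a config file), and that 3.x expects TESSDATA_PREFIX to name the parent of the tessdata directory, ending in a slash. The class and method names are made up for illustration; the argument order copies the command that works in the original post.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original application.
public class TesseractLauncher {

    // Same argument order as the call reported to work in the original post:
    // tesseract -l <lang> <input> <outputbase>
    static List<String> buildCommand(String lang, Path inputTiff, String outputBase) {
        return Arrays.asList("tesseract", "-l", lang,
                inputTiff.toAbsolutePath().toString(), outputBase);
    }

    // Assumption: TESSDATA_PREFIX is the parent of "tessdata", with a trailing slash.
    static String tessdataPrefixFor(Path tessdataDir) {
        Path parent = tessdataDir.toAbsolutePath().getParent();
        String prefix = (parent == null) ? "/" : parent.toString();
        return prefix.endsWith("/") ? prefix : prefix + "/";
    }

    // Prepares (but does not start) the process with TESSDATA_PREFIX set
    // in the child's environment only.
    static ProcessBuilder prepare(String lang, Path inputTiff,
                                  String outputBase, Path tessdataDir) {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(lang, inputTiff, outputBase));
        pb.environment().put("TESSDATA_PREFIX", tessdataPrefixFor(tessdataDir));
        // Merge stderr into stdout so all messages can be read from one stream.
        pb.redirectErrorStream(true);
        return pb;
    }

    public static void main(String[] args) {
        // Example paths are placeholders.
        ProcessBuilder pb = prepare("deu", Paths.get("/tmp/input.tiff"),
                "/tmp/out", Paths.get("/usr/share/tessdata"));
        System.out.println(pb.command());
        System.out.println("TESSDATA_PREFIX=" + pb.environment().get("TESSDATA_PREFIX"));
    }
}
```

Calling prepare(...).start() would launch the process. Reading its output and checking the exit code would show whether the read_params_file message still appears.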

- excuse the brevity, sent from mobile

On 01-Jan-2017 9:35 PM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote:

> What about osd.traineddata and config files? Are they in your tessdata
> directory?
>
> - excuse the brevity, sent from mobile
>
> On 01-Jan-2017 9:22 PM, <ruediger.k...@deutschebahn.com> wrote:
>
>> Hi all,
>>
>> I'm in a time-critical situation. I want to deliver new software to
>> our customer on 5th January 2017.
>> While things worked well in the test environment, problems came up after
>> deploying the software to the production environment.
>> Before describing the failure in detail, here is some information about
>> the setup and the environment.
>>
>>
>> Environment & Installation
>>
>> *Operating System: Suse Enterprise Linux Server 12 SP 1*
>> $ uname -a
>> Linux 3.12.62-60.64.8-default #1 SMP Tue Oct 18 12:21:38 UTC 2016 (42e0a66) x86_64 x86_64 x86_64 GNU/Linux
>> Since this environment is managed, I cannot update any system libraries
>> like glibc etc.
>> *So the newest and only officially supported version of tesseract I found
>> for "Suse 12 SP1 x86_64" is 3.02*
>>
>> *Installed Packages:*
>> libgif4-4.1.6-34.1.1.x86_64.rpm
>> liblept3-1.69-16.1.x86_64.rpm
>> libtesseract3-3.02.02-3.2.1.x86_64.rpm
>> libwebp4-0.3.1-34.1.x86_64.rpm
>> tesseract-3.02.02-59.1.x86_64.rpm
>>
>> *tesseract version*
>> $ tesseract -v
>> tesseract 3.02.02
>>     leptonica-1.69
>>         libgif 4.1.6 : libjpeg 8d : libpng 1.5.22 : libtiff 4.0.6 : zlib 1.2.8
>>
>> *Release details*
>> $ zypper info tesseract
>> Information for package tesseract:
>> ----------------------------------
>> Repository: @System
>> Name: tesseract
>> Version: 3.02.02-59.1
>> Arch: x86_64
>> Vendor: obs://build.opensuse.org/home:koprok
>> Support Level: unknown
>> Installed: Yes
>> Status: up-to-date
>> Installed Size: 3.8 MiB
>> Summary: Open Source OCR Engine
>> Description: […]
>>
>>
>> Traindata & Languages
>>
>> *Traindata*
>> The traindata has been manually downloaded from GitHub
>> <https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-302>:
>>
>>    - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.eng.tar.gz/download
>>    - https://sourceforge.net/projects/tesseract-ocr-alt/files/tesseract-ocr-3.02.deu.tar.gz/download
>>
>> *And the files have been copied to /usr/share/tessdata/*
>> $ ls -la /usr/share/tessdata/
>> drwxr-xr-x 1 root root      230 Dec 31 16:37 configs/
>> -rw-r--r-- 1 root root  2438081 Dec 30 15:31 deu.traineddata
>> -rw-r--r-- 1 root root   171918 Dec 30 20:16 eng.cube.bigrams
>> -rw-r--r-- 1 root root       38 Dec 30 20:16 eng.cube.fold
>> -rw-r--r-- 1 root root      181 Dec 30 20:16 eng.cube.lm
>> -rw-r--r-- 1 root root   857304 Dec 30 20:16 eng.cube.nn
>> -rw-r--r-- 1 root root      254 Dec 30 20:16 eng.cube.params
>> -rw-r--r-- 1 root root 13020078 Dec 30 20:16 eng.cube.size
>> -rw-r--r-- 1 root root  2444187 Dec 30 20:16 eng.cube.word-freq
>> -rw-r--r-- 1 root root      996 Dec 30 20:16 eng.tesseract_cube.nn
>> -rw-r--r-- 1 root root 21876572 Dec 30 20:16 eng.traineddata
>> drwxr-xr-x 1 root root       88 Dec 31 16:37 tessconfigs/
>>
>> *tesseract detects 'deu' and 'eng' as available languages*
>> $ tesseract --list-langs
>> List of available languages (2):
>> deu
>> eng
>>
>>
>> Application & Problem
>>
>> *The software application is built upon the Spring Boot framework*
>> Runtime.getRuntime().exec(new String[] {
>>  "tesseract",
>>  "--tessdata-dir", "/usr/share/tessdata",
>>  "-l", lang.getISO3Language(),
>>  inputTiff.toAbsolutePath().toString(), extractedcntPath });
>>
>> *The application logfile says*
>> 2016-12-30 20:30:02,320 [https-jsse-nio-8443-exec-7] WARN
>> PDFContentExtractor - read_params_file: parameter not found: II*
>>
>> *Executing tesseract with tessdata dir fails*
>> $ tesseract --tessdata-dir /usr/share/tessdata -l deu inputPdf6632237754781472255.tiff out4
>> read_params_file: parameter not found: II*
>>
>> *Executing tesseract without a tessdata dir works well*
>> $ tesseract -l deu inputPdf6632237754781472255.tiff out5
>> Tesseract Open Source OCR Engine v3.02.02 with Leptonica
>>
>>
>> Questions & Ideas
>> Why does tesseract work well and detect the available languages without
>> the --tessdata-dir parameter set?
>> Why does tesseract crash during initialization when the --tessdata-dir
>> parameter is set?
>> Is there any difference between running tesseract with and without the
>> --tessdata-dir parameter?
>>
>> What can I do to fix this problem?
>> Install a newer version of tesseract?
>> Compile a version from sources?
>> Use other traindata/tessdata?
>> Run tesseract without the --tessdata-dir param?
>>
>> If anybody can help me get this issue solved in the upcoming week,
>> it would make not only me happy, but our whole team.
>>
>> Thank you very much in advance!
>> Rüdiger Kurz
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/f046ae79-d687-45f8-af41-289cd84da2b9%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

