It looks like the reboot is resetting some environment variables, in
particular TESSDATA_PREFIX.
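
If you would rather keep relying on the variable, here is a minimal sketch for
checking what it points at before running tesseract. check_tessdata is a
hypothetical helper name, not part of tesseract. It only assumes what the error
message in your mail states: that TESSDATA_PREFIX is the parent directory of
the "tessdata" directory.

```shell
# check_tessdata PREFIX LANGUAGE
# Succeeds only if PREFIX/tessdata/LANGUAGE.traineddata is readable.
# (Hypothetical helper; assumes PREFIX is the parent directory of "tessdata".)
check_tessdata() {
    prefix=${1%/}          # drop one trailing slash, if any
    lang=$2
    if [ -z "$prefix" ]; then
        echo "no prefix given (is TESSDATA_PREFIX set?)" >&2
        return 1
    fi
    datafile="$prefix/tessdata/$lang.traineddata"
    if [ -r "$datafile" ]; then
        echo "found $datafile"
        return 0
    fi
    echo "cannot read $datafile" >&2
    return 1
}

# Example call (commented out so that sourcing this file has no side effects):
# check_tessdata "$TESSDATA_PREFIX" deu
```

Running the check right after a reboot should show whether the variable is
unset or the data file itself is gone. Where to set the variable permanently
depends on how tesseract is started; for example, a line in ~/.profile only
affects login shells, not cron jobs or services.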

You can try giving the data path on the command line instead. See the
following script as a sample:

---------
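# Usage: sh <this-script> FILE_PATTERN LANGUAGE
#Page Segmentation Modes
#3 = Fully automatic page segmentation, but no OSD. (Default)
#4 = Assume a single column of text of variable sizes.
#6 = Assume a single uniform block of text.
PSM=3
MYFILE=$1
OCRLANG=$2   # not named LANG, which is the shell's locale variable
PDF=pdf
MYOUTPUTFILE=$MYFILE-merged

now=$(date +"%y%m%d-%H%M")
# Start from an empty merged text file (-f: no error if it does not exist yet)
rm -f "$MYOUTPUTFILE.txt"
for f in *"$MYFILE"*.tif
do
  echo "Starting OCR for $f file with -l $OCRLANG at $(date) , please wait..."
  # Pass the data path explicitly instead of relying on TESSDATA_PREFIX
  tesseract --tessdata-dir /home/shree/tesseract-ocr "$f" "$f-$OCRLANG" \
    -l "$OCRLANG" -psm "$PSM" "$PDF"
  cat "$f-$OCRLANG.txt" >> "$MYOUTPUTFILE.txt"
done
echo "OCR done"

# Merge the per-file PDFs into one PDF with Ghostscript
gswin32c -dPDFA -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
  -sProcessColorModel=DeviceCMYK -sPDFACompatibilityPolicy=2 \
  -sOutputFile="$MYOUTPUTFILE.pdf" \
  *"$MYFILE"*-"$OCRLANG".pdf
echo "pdf merged"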

---------
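
One thing to watch for with the loop in the script: in a POSIX shell, a
pattern that matches no file is passed through literally, so tesseract would
be called once on a file that does not exist. A minimal sketch of a guard,
where list_inputs is a hypothetical helper name and the pattern is the same
*NAME*.tif form used in the script:

```shell
# list_inputs NAME
# Prints every file matching *NAME*.tif in the current directory, one per line.
# Returns 1 if nothing matches (an unmatched pattern stays literal in POSIX sh,
# so each candidate is checked for existence).
list_inputs() {
    found=1
    for f in *"$1"*.tif
    do
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
        found=0
    done
    return "$found"
}

# Example call (commented out so that sourcing this file has no side effects):
# list_inputs "$MYFILE" >/dev/null || { echo "no .tif files match" >&2; exit 1; }
```

Putting such a check before the loop makes the script stop early with a clear
message instead of producing tesseract errors for a non-existent file.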



ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Jan 10, 2015 at 4:22 AM, C. <[email protected]> wrote:

> After rebooting the server, tesseract complains as follows:
>
> Error opening data file /usr/local/tesseract-ocr/tessdata/deu.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to the
> parent directory of your "tessdata" directory.
> Failed loading language 'deu'
> Tesseract couldn't load any languages!
> Could not initialize tesseract.
>
> I manually copied deu.traineddata to that folder and chmod'ed it to 777,
> but that only works until the next reboot.
>
> I think I'll give up on Tesseract soon and stay with OCR in Acrobat
> Pro...
>
> On Friday, 9 January 2015 at 18:34:25 UTC+1, C. wrote:
>
>> I did not succeed in completely reinstalling, so I reinstalled the server
>> again and installed just the latest version of tesseract from
>> source.
>>
>> Now "tesseracting" works fine again: all lines are shown in
>> the resulting PDF file. So it has to be a bug in tesseract 3.03.
>>
>> I hope the latest version reaches the Ubuntu repos soon (because I had
>> some problems with the TESSDATA_PREFIX thing after compiling).
>>
>> On Friday, 9 January 2015 at 13:16:03 UTC+1, shree wrote:
>>>
>>> please see https://code.google.com/p/tesseract-ocr/issues/detail?id=1278
>>>
>>> ShreeDevi
>>>
>>> On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar <[email protected]>
>>> wrote:
>>>
>>>> You should *uninstall the old version fully* and then build the
>>>> version from git. The build is possibly referring to some older libraries.
>>>>
>>>> Also, this needs leptonica 1.71. Not sure if the documentation mentions
>>>> it or not.
>>>>
>>>> ShreeDevi
>>>>
>>>> On Fri, Jan 9, 2015 at 5:37 PM, C. <[email protected]> wrote:
>>>>
>>>>> I tried to compile the version you mentioned (after installing
>>>>> the dependencies listed in the readme), but make stops with the following error:
>>>>>
>>>>> ./.libs/libtesseract.so: undefined reference to `l_generateCIDataForPdf'
>>>>> ./.libs/libtesseract.so: undefined reference to `l_CIDataDestroy'
>>>>> collect2: error: ld returned 1 exit status
>>>>> make[2]: *** [tesseract] Fehler 1
>>>>>
>>>>>
>>>>> On Friday, 9 January 2015 at 09:28:53 UTC+1, shree wrote:
>>>>>>
>>>>>> As far as I know, pdf creation is a new addition and the issues were
>>>>>> ironed out only recently. There have been over 100 commits to the code
>>>>>> since 3.03 rc.
>>>>>>
>>>>>> If you want the new functionality, you can try compiling the code
>>>>>> from https://code.google.com/p/tesseract-ocr/source/checkout
>>>>>>
>>>>>> Instructions are at https://code.google.com/p/tesseract-ocr/wiki/Compiling
>>>>>>
>>>>>> ShreeDevi
>>>>>>
>>>>>> On Fri, Jan 9, 2015 at 1:53 PM, C. <[email protected]> wrote:
>>>>>>
>>>>>>> First of all: thanks for your help.
>>>>>>>
>>>>>>> Concerning my problem: I did a complete reinstall of the Ubuntu
>>>>>>> 14.04 server and installed tesseract 3.03 from the repos again, and
>>>>>>> the failure still exists! As 3.03 does not seem to be that old, I did
>>>>>>> not and, to be honest, do not want to install a newer version from
>>>>>>> github.
>>>>>>>
>>>>>>> Is this a known bug?
>>>>>>>
>>>>>>> On Friday, 9 January 2015 at 06:33:01 UTC+1, shree wrote:
>>>>>>>>
>>>>>>>> I am using the git version -- output and messages attached. The PDF
>>>>>>>> seems to have all the lines.
>>>>>>>>
>>>>>>>> User@HP ~/tesseract-ocr/testing
>>>>>>>> $ tesseract 5.tif 5 pdf
>>>>>>>> Tesseract Open Source OCR Engine v3.04.00 with Leptonica
>>>>>>>> Page 1
>>>>>>>> OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
>>>>>>>> Page 2
>>>>>>>> Too few characters. Skipping this page
>>>>>>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
>>>>>>>> Empty page!!
>>>>>>>> Too few characters. Skipping this page
>>>>>>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
>>>>>>>> Empty page!!
>>>>>>>> Warning in pixReadMemTiff: tiff page 2 not found
>>>>>>>>
>>>>>>>> User@HP ~/tesseract-ocr/testing
>>>>>>>> $ tesseract -v
>>>>>>>> tesseract 3.04.00
>>>>>>>>  leptonica-1.71
>>>>>>>>   libgif 5.1.0 : libjpeg 8d : libpng 1.6.14 : libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.2
>>>>>>>>
>>>>>>>>
>>>>>>>> ShreeDevi
>>>>>>>>
>>>>>>>> On Thu, Jan 8, 2015 at 9:24 PM, C. <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> sorry, meant: 5.pdf is the resulting file.
>>>>>>>>>
>>>>>>>>> On Thursday, 8 January 2015 at 16:53:31 UTC+1, C. wrote:
>>>>>>>>>
>>>>>>>>>> tesseract 3.03, example is attached (5.tif is the original, 5.tig
>>>>>>>>>> the result).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thursday, 8 January 2015 at 16:02:31 UTC+1, shree wrote:
>>>>>>>>>>>
>>>>>>>>>>> I don't think that's the intended behavior. What version of
>>>>>>>>>>> tesseract are you using? Please post a sample image for testing.
>>>>>>>>>>>
>>>>>>>>>>> ShreeDevi
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 8, 2015 at 8:00 PM, C. <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> If I do a simple "tesseract 1.tif 2 pdf", all vertical and
>>>>>>>>>>>> horizontal lines (and graphics with small lines) in the source file
>>>>>>>>>>>> disappear in the resulting PDF file (Ubuntu server 12.04,
>>>>>>>>>>>> tesseract 3.03).
>>>>>>>>>>>>
>>>>>>>>>>>> Is that the intended behavior?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/dcbb0e46-b29b-447a-a5f4-d634b4371725%40googlegroups.com.
>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>>

