It looks like the reboot is resetting some variables, in particular the TESSDATA_PREFIX environment variable.
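One way to check this right after a reboot is a small shell helper along the following lines. This is only a sketch: check_tessdata is a made-up name, and it assumes the layout described in the error message, where TESSDATA_PREFIX is the parent directory of the "tessdata" directory.

```shell
# Hypothetical helper (not part of tesseract): report whether the traineddata
# file for a language is readable under the current TESSDATA_PREFIX.
# Assumption: TESSDATA_PREFIX is the parent directory of "tessdata".
check_tessdata() {
    lang="$1"
    prefix="${TESSDATA_PREFIX:-}"
    if [ -z "$prefix" ]; then
        echo "TESSDATA_PREFIX is not set"
        return 1
    fi
    # Strip one trailing slash so the path has no doubled "/".
    file="${prefix%/}/tessdata/$lang.traineddata"
    if [ -r "$file" ]; then
        echo "found $file"
    else
        echo "missing $file"
        return 1
    fi
}
```

For example, run "check_tessdata deu" before and after a reboot. If the variable turns out to be unset afterwards, exporting it from a startup file that is read in the environment where tesseract runs should make it persist; which file that is depends on how tesseract is launched on your server, so treat that part as an assumption.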
You can try giving the path on the command line. See the following batch file as a sample.

---------
# Page segmentation modes
# 3 = Fully automatic page segmentation, but no OSD. (Default)
# 4 = Assume a single column of text of variable sizes.
# 6 = Assume a single uniform block of text.
PSM=3

MYFILE=$1
# Named OCRLANG rather than LANG so the shell's locale variable is not overwritten.
OCRLANG=$2
PDF=pdf
MYOUTPUTFILE=$MYFILE-merged
now=$(date +"%y%m%d-%H%M")

# -f: do not complain if there is no merged text file from an earlier run.
rm -f "$MYOUTPUTFILE.txt"

for f in *"$MYFILE"*.tif
do
    echo "Starting OCR for $f file with -l $OCRLANG at $(date), please wait..."
    tesseract --tessdata-dir /home/shree/tesseract-ocr "$f" "$f-$OCRLANG" -l "$OCRLANG" -psm $PSM $PDF
    cat "$f-$OCRLANG.txt" >> "$MYOUTPUTFILE.txt"
done
echo "OCR done"

gswin32c -dPDFA -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sProcessColorModel=DeviceCMYK -sPDFACompatibilityPolicy=2 -sOutputFile="$MYOUTPUTFILE.pdf" *"$MYFILE"*-"$OCRLANG".pdf
echo "pdf merged"
---------

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Jan 10, 2015 at 4:22 AM, C. <[email protected]> wrote:

> After rebooting the server tesseract complains as follows:
>
> Error opening data file /usr/local/tesseract-ocr/tessdata/deu.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to the
> parent directory of your "tessdata" directory.
> Failed loading language 'deu'
> Tesseract couldn't load any languages!
> Could not initialize tesseract.
>
> I manually copied deu.traineddata to that folder and chmod'ed it to 777,
> but that only works until the next reboot.
>
> I think I'll give up soon with Tesseract and stay with OCR in Acrobat
> Pro...
>
> On Friday, 9 January 2015 18:34:25 UTC+1, C. wrote:
>
>> I did not succeed in completely reinstalling, so I reinstalled the server
>> again and installed just the latest version of tesseract from
>> source.
>>
>> Now everything worked fine again when "tesseracting": all lines are shown in
>> the resulting PDF file. So it has to be a bug in tesseract 3.03.
>>
>> I hope that the latest version goes into the Ubuntu repos soon (because I had
>> some problems with the TESSDATA_PREFIX thing after compiling).
>>
>> On Friday, 9 January 2015 13:16:03 UTC+1, shree wrote:
>>>
>>> please see https://code.google.com/p/tesseract-ocr/issues/detail?id=1278
>>>
>>> ShreeDevi
>>>
>>> On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar <[email protected]>
>>> wrote:
>>>
>>>> You should *uninstall the old version fully* and then build the
>>>> version from git. It is possibly referring to some older libraries.
>>>>
>>>> Also, this needs leptonica 1.71. Not sure if the documentation mentions
>>>> it or not.
>>>>
>>>> ShreeDevi
>>>>
>>>> On Fri, Jan 9, 2015 at 5:37 PM, C. <[email protected]> wrote:
>>>>
>>>>> I tried to compile the version you mentioned (after having installed
>>>>> the dependencies listed in the readme), but make stops with the following error:
>>>>>
>>>>> ./.libs/libtesseract.so: undefined reference to `l_generateCIDataForPdf'
>>>>> ./.libs/libtesseract.so: undefined reference to `l_CIDataDestroy'
>>>>> collect2: error: ld returned 1 exit status
>>>>> make[2]: *** [tesseract] Fehler 1
>>>>>
>>>>> On Friday, 9 January 2015 09:28:53 UTC+1, shree wrote:
>>>>>>
>>>>>> As far as I know, pdf creation is a new addition and the issues were
>>>>>> ironed out only recently. There have been over 100 commits to the code
>>>>>> since 3.03 rc.
>>>>>>
>>>>>> If you want the new functionality, you can try compiling the code
>>>>>> from https://code.google.com/p/tesseract-ocr/source/checkout
>>>>>>
>>>>>> Instructions are at https://code.google.com/p/tesseract-ocr/wiki/Compiling
>>>>>>
>>>>>> ShreeDevi
>>>>>>
>>>>>> On Fri, Jan 9, 2015 at 1:53 PM, C. <[email protected]> wrote:
>>>>>>
>>>>>>> First of all: thanks for your help.
>>>>>>>
>>>>>>> Concerning my problem, I did a complete reinstall of the Ubuntu
>>>>>>> 14.04 server, installed tesseract 3.03 from the repos again, and the
>>>>>>> failure still exists! As 3.03 does not seem to be that old, I did not and,
>>>>>>> to be honest, do not want to install a newer version from github.
>>>>>>>
>>>>>>> Is this a known bug?
>>>>>>>
>>>>>>> On Friday, 9 January 2015 06:33:01 UTC+1, shree wrote:
>>>>>>>>
>>>>>>>> I am using the git version -- output and messages attached. The pdf
>>>>>>>> seems to have all the lines.
>>>>>>>>
>>>>>>>> User@HP ~/tesseract-ocr/testing
>>>>>>>> $ tesseract 5.tif 5 pdf
>>>>>>>> Tesseract Open Source OCR Engine v3.04.00 with Leptonica
>>>>>>>> Page 1
>>>>>>>> OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
>>>>>>>> Page 2
>>>>>>>> Too few characters. Skipping this page
>>>>>>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
>>>>>>>> Empty page!!
>>>>>>>> Too few characters. Skipping this page
>>>>>>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
>>>>>>>> Empty page!!
>>>>>>>> Warning in pixReadMemTiff: tiff page 2 not found
>>>>>>>>
>>>>>>>> User@HP ~/tesseract-ocr/testing
>>>>>>>> $ tesseract -v
>>>>>>>> tesseract 3.04.00
>>>>>>>> leptonica-1.71
>>>>>>>> libgif 5.1.0 : libjpeg 8d : libpng 1.6.14 : libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.2
>>>>>>>>
>>>>>>>> ShreeDevi
>>>>>>>>
>>>>>>>> On Thu, Jan 8, 2015 at 9:24 PM, C. <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> sorry, meant: 5.pdf is the resulting file.
>>>>>>>>>
>>>>>>>>> On Thursday, 8 January 2015 16:53:31 UTC+1, C. wrote:
>>>>>>>>>
>>>>>>>>>> tesseract 3.03, example is attached (5.tif is the original, 5.tig
>>>>>>>>>> the result).
>>>>>>>>>>
>>>>>>>>>> On Thursday, 8 January 2015 16:02:31 UTC+1, shree wrote:
>>>>>>>>>>>
>>>>>>>>>>> I don't think that's the intended behavior. What version of
>>>>>>>>>>> tesseract are you using? Please post a sample image for testing.
>>>>>>>>>>>
>>>>>>>>>>> ShreeDevi
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 8, 2015 at 8:00 PM, C. <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> If I do a simple "tesseract 1.tif 2 pdf", all vertical and
>>>>>>>>>>>> horizontal lines (and graphics with small lines) in the source file
>>>>>>>>>>>> disappear in the resulting PDF file (Ubuntu server 12.04,
>>>>>>>>>>>> tesseract 3.03).
>>>>>>>>>>>>
>>>>>>>>>>>> Is that the intended behavior?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/dcbb0e46-b29b-447a-a5f4-d634b4371725%40googlegroups.com.
>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWcTn3tREwbODaYpJUf5YZpKONthCiNUJVqNm83t_QBPw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

