After rebooting the server, tesseract complains as follows:

Error opening data file /usr/local/tesseract-ocr/tessdata/deu.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'deu'
Tesseract couldn't load any languages!
Could not initialize tesseract.
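
The error above says Tesseract looks for a "tessdata" directory inside the directory named by TESSDATA_PREFIX. As a hedged diagnostic sketch (not part of the original thread), the POSIX shell function below reports which file would be looked up for a given language and whether it is readable. The function name check_tessdata is hypothetical, and the fallback prefix /usr/local/share/ is an assumption about a default source build (--prefix=/usr/local), not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not from the thread): report whether
# <lang>.traineddata exists where Tesseract would look for it.
# TESSDATA_PREFIX must name the PARENT directory of "tessdata".
check_tessdata() {
    lang="${1:-eng}"
    # Assumed fallback for a default source build (--prefix=/usr/local);
    # the real compiled-in default of your build may differ.
    prefix="${TESSDATA_PREFIX:-/usr/local/share/}"
    # Drop one trailing slash so the join below has no "//".
    prefix="${prefix%/}"
    file="$prefix/tessdata/$lang.traineddata"
    if [ -r "$file" ]; then
        echo "ok: $file"
        return 0
    fi
    echo "missing or unreadable: $file" >&2
    return 1
}
```

For example, "check_tessdata deu" prints the path it found, or reports the missing path on stderr and returns non-zero. A variable exported in an interactive shell lasts only for that session; on Ubuntu, one way to set it for future logins is a KEY=value line such as TESSDATA_PREFIX=/usr/local/share/ in /etc/environment, though services and cron jobs may not read that file. Writing the value with a trailing slash is a cautious choice, since some older 3.x releases may join the prefix and "tessdata" directly (an assumption worth checking against your version).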

I manually copied deu.traineddata to that folder (/usr/local/tesseract-ocr/tessdata) and chmod'ed it to 777, but that only works until the next reboot. I think I'll give up on Tesseract soon and stick with OCR in Acrobat Pro...

On Friday, January 9, 2015 18:34:25 UTC+1, C. wrote:
>
> I did not succeed in completely reinstalling, so I reinstalled the server
> again and installed only the latest version of tesseract from source.
>
> Now "tesseracting" works fine again: all lines are shown in the
> resulting PDF file. So it has to be a bug in tesseract 3.03.
>
> I hope the latest version makes it into the Ubuntu repos soon (because I
> had some problems with the TESSDATA_PREFIX setting after compiling).
>
> On Friday, January 9, 2015 13:16:03 UTC+1, shree wrote:
>>
>> Please see https://code.google.com/p/tesseract-ocr/issues/detail?id=1278
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar <[email protected]> wrote:
>>
>>> You should *fully uninstall the old version* and then build the version
>>> from git. The build is possibly linking against some older libraries.
>>>
>>> Also, this needs leptonica 1.71. I'm not sure whether the documentation
>>> mentions it.
>>>
>>> ShreeDevi
>>>
>>> On Fri, Jan 9, 2015 at 5:37 PM, C. <[email protected]> wrote:
>>>
>>>> I tried to compile the version you mentioned (after installing the
>>>> dependencies listed in the README), but make stops with the following
>>>> error:
>>>>
>>>> ./.libs/libtesseract.so: undefined reference to `l_generateCIDataForPdf'
>>>> ./.libs/libtesseract.so: undefined reference to `l_CIDataDestroy'
>>>> collect2: error: ld returned 1 exit status
>>>> make[2]: *** [tesseract] Error 1
>>>>
>>>>
>>>> On Friday, January 9, 2015 09:28:53 UTC+1, shree wrote:
>>>>>
>>>>> As far as I know, PDF creation is a new addition and its issues were
>>>>> ironed out only recently. There have been over 100 commits to the code
>>>>> since 3.03 rc.
>>>>>
>>>>> If you want the new functionality, you can try compiling the code from
>>>>> https://code.google.com/p/tesseract-ocr/source/checkout
>>>>>
>>>>> Instructions are at https://code.google.com/p/tesseract-ocr/wiki/Compiling
>>>>>
>>>>> ShreeDevi
>>>>>
>>>>> On Fri, Jan 9, 2015 at 1:53 PM, C. <[email protected]> wrote:
>>>>>
>>>>>> First of all: thanks for your help.
>>>>>>
>>>>>> Concerning my problem, I did a complete reinstall of Ubuntu 14.04
>>>>>> Server, installed tesseract 3.03 from the repos again, and the failure
>>>>>> still exists! As 3.03 does not seem to be that old, I did not (and, to
>>>>>> be honest, do not want to) install a newer version from github.
>>>>>>
>>>>>> Is this a known bug?
>>>>>>
>>>>>> On Friday, January 9, 2015 06:33:01 UTC+1, shree wrote:
>>>>>>>
>>>>>>> I am using the git version; output and messages are attached. The
>>>>>>> PDF seems to have all the lines.
>>>>>>>
>>>>>>> User@HP ~/tesseract-ocr/testing
>>>>>>> $ tesseract 5.tif 5 pdf
>>>>>>> Tesseract Open Source OCR Engine v3.04.00 with Leptonica
>>>>>>> Page 1
>>>>>>> OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
>>>>>>> Page 2
>>>>>>> Too few characters. Skipping this page
>>>>>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
>>>>>>> Empty page!!
>>>>>>> Too few characters. Skipping this page
>>>>>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
>>>>>>> Empty page!!
>>>>>>> Warning in pixReadMemTiff: tiff page 2 not found
>>>>>>>
>>>>>>> User@HP ~/tesseract-ocr/testing
>>>>>>> $ tesseract -v
>>>>>>> tesseract 3.04.00
>>>>>>> leptonica-1.71
>>>>>>> libgif 5.1.0 : libjpeg 8d : libpng 1.6.14 : libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.2
>>>>>>>
>>>>>>>
>>>>>>> ShreeDevi
>>>>>>>
>>>>>>> On Thu, Jan 8, 2015 at 9:24 PM, C. <[email protected]> wrote:
>>>>>>>
>>>>>>>> Sorry, I meant: 5.pdf is the resulting file.
>>>>>>>>
>>>>>>>> On Thursday, January 8, 2015 16:53:31 UTC+1, C. wrote:
>>>>>>>>
>>>>>>>>> Tesseract 3.03; an example is attached (5.tif is the original, 5.tig
>>>>>>>>> the result).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thursday, January 8, 2015 16:02:31 UTC+1, shree wrote:
>>>>>>>>>>
>>>>>>>>>> I don't think that's the intended behavior. What version of
>>>>>>>>>> tesseract are you using? Could you post a sample image for testing?
>>>>>>>>>>
>>>>>>>>>> ShreeDevi
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 8, 2015 at 8:00 PM, C. <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> If I run a simple "tesseract 1.tif 2 pdf", all vertical and
>>>>>>>>>>> horizontal lines (and graphics with small lines) in the source file
>>>>>>>>>>> disappear in the resulting PDF file (Ubuntu Server 12.04, tesseract
>>>>>>>>>>> 3.03).
>>>>>>>>>>>
>>>>>>>>>>> Is that the intended behavior?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/dcbb0e46-b29b-447a-a5f4-d634b4371725%40googlegroups.com.
>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
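
The two undefined references in the build log above (l_generateCIDataForPdf and l_CIDataDestroy) fit ShreeDevi's note that the git version needs leptonica 1.71. A minimal sketch of a version check, under stated assumptions: it relies on GNU sort -V from coreutils, the helper names version_ge and lept_version are hypothetical, and lept_version parses the "leptonica-X.Y" line that "tesseract -v" prints, as in the output quoted above. That reports the leptonica an installed tesseract binary was built with, which is not necessarily what a fresh build will link against.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread); assume GNU coreutils sort -V.

# version_ge A B: succeed if version A >= version B.
version_ge() {
    # sort -V orders version strings; if B sorts first (or ties), A >= B.
    # An empty A sorts first and is therefore treated as "too old".
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# lept_version: read "tesseract -v" output on stdin and print the first
# leptonica version found (e.g. "1.71"), or nothing if there is none.
lept_version() {
    sed -n 's/.*leptonica-\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example use (needs an installed tesseract; 3.x may print -v to stderr):
#   v=$(tesseract -v 2>&1 | lept_version)
#   version_ge "$v" 1.71 || echo "leptonica '$v' is older than 1.71" >&2
```

Comparing with sort -V rather than a plain string comparison keeps, for example, 1.9 below 1.71, because the components are compared numerically.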

