I did not manage to uninstall the old version completely, so I reinstalled the server and installed only the latest version of tesseract from source.
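After installing from source, it may be worth confirming which Leptonica the new binary actually uses. Further down in this thread, shree notes that the git version needs Leptonica 1.71, and the earlier build failed with undefined references to `l_generateCIDataForPdf` and `l_CIDataDestroy`, which is what one would expect if an older Leptonica were being picked up. The snippet below is a minimal sketch, not a tesseract tool. `leptonica_ok` is a hypothetical helper name. It assumes the `leptonica-X.Y` line format shown in the `tesseract -v` output quoted in this thread, and it relies on GNU coreutils `sort -V` for the version comparison.

```shell
# Hypothetical helper (not part of tesseract): reads `tesseract -v` output on
# stdin and succeeds only if the reported Leptonica version is >= 1.71.
# Assumes the "leptonica-X.Y" line format seen in this thread and GNU sort -V.
leptonica_ok() {
  required="1.71"
  # Extract the version number from the first "leptonica-X.Y" line.
  found=$(sed -n 's/^[[:space:]]*leptonica-\([0-9][0-9.]*\).*/\1/p' | head -n 1)
  if [ -z "$found" ]; then
    echo "no leptonica version found in input" >&2
    return 1
  fi
  # sort -V compares version components numerically (1.8 sorts before 1.71).
  # If the required version sorts first (or ties), the found one is new enough.
  lowest=$(printf '%s\n%s\n' "$required" "$found" | sort -V | head -n 1)
  [ "$lowest" = "$required" ]
}

# Usage: tesseract may print its version to stderr, so merge both streams.
#   tesseract -v 2>&1 | leptonica_ok && echo "Leptonica is new enough"
```

A plain string comparison would get this wrong: as strings, "1.8" sorts after "1.71", although 1.8 is the older release.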
Now everything works fine again when "tesseracting": all lines are shown in the resulting PDF file. So it has to be a bug in tesseract 3.03. I hope the latest version makes it into the Ubuntu repositories soon (because after compiling I had some problems with the TESSDATA_PREFIX setting).

On Friday, January 9, 2015 13:16:03 UTC+1, shree wrote:
>
> Please see https://code.google.com/p/tesseract-ocr/issues/detail?id=1278
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar wrote:
>
>> You should *uninstall the old version fully* and then build the version
>> from git. It is possibly referring to some older libraries.
>>
>> Also, this needs leptonica 1.71. Not sure if the documentation mentions
>> it or not.
>>
>> ShreeDevi
>>
>> On Fri, Jan 9, 2015 at 5:37 PM, C. wrote:
>>
>>> I tried to compile the version you mentioned (after installing the
>>> dependencies listed in the README), but make stops with the following error:
>>>
>>> ./.libs/libtesseract.so: undefined reference to `l_generateCIDataForPdf'
>>> ./.libs/libtesseract.so: undefined reference to `l_CIDataDestroy'
>>> collect2: error: ld returned 1 exit status
>>> make[2]: *** [tesseract] Fehler 1
>>>
>>>
>>> On Friday, January 9, 2015 09:28:53 UTC+1, shree wrote:
>>>>
>>>> As far as I know, PDF creation is a new addition and the issues were
>>>> ironed out only recently. There have been over 100 commits to the code
>>>> since 3.03 rc.
>>>>
>>>> If you want the new functionality, you can try compiling the code from
>>>> https://code.google.com/p/tesseract-ocr/source/checkout
>>>>
>>>> Instructions are at https://code.google.com/p/tesseract-ocr/wiki/Compiling
>>>>
>>>> ShreeDevi
>>>>
>>>> On Fri, Jan 9, 2015 at 1:53 PM, C. wrote:
>>>>
>>>>> First of all: thanks for your help.
>>>>>
>>>>> Concerning my problem, I did a complete reinstall of Ubuntu 14.04
>>>>> Server, installed tesseract 3.03 from the repos again, and the failure
>>>>> still exists! As 3.03 does not seem to be that old, I did not, and to be
>>>>> honest do not want to, install a newer version from GitHub.
>>>>>
>>>>> Is this a known bug?
>>>>>
>>>>> On Friday, January 9, 2015 06:33:01 UTC+1, shree wrote:
>>>>>>
>>>>>> I am using the git version (output and messages attached). The PDF
>>>>>> seems to have all the lines.
>>>>>>
>>>>>> User@HP ~/tesseract-ocr/testing
>>>>>> $ tesseract 5.tif 5 pdf
>>>>>> Tesseract Open Source OCR Engine v3.04.00 with Leptonica
>>>>>> Page 1
>>>>>> OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
>>>>>> Page 2
>>>>>> Too few characters. Skipping this page
>>>>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
>>>>>> Empty page!!
>>>>>> Too few characters. Skipping this page
>>>>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation anyway: 0
>>>>>> Empty page!!
>>>>>> Warning in pixReadMemTiff: tiff page 2 not found
>>>>>>
>>>>>> User@HP ~/tesseract-ocr/testing
>>>>>> $ tesseract -v
>>>>>> tesseract 3.04.00
>>>>>> leptonica-1.71
>>>>>> libgif 5.1.0 : libjpeg 8d : libpng 1.6.14 : libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.2
>>>>>>
>>>>>> ShreeDevi
>>>>>>
>>>>>> On Thu, Jan 8, 2015 at 9:24 PM, C. wrote:
>>>>>>
>>>>>>> Sorry, I meant: 5.pdf is the resulting file.
>>>>>>>
>>>>>>> On Thursday, January 8, 2015 16:53:31 UTC+1, C. wrote:
>>>>>>>
>>>>>>>> tesseract 3.03, example is attached (5.tif is the original, 5.tig
>>>>>>>> the result).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday, January 8, 2015 16:02:31 UTC+1, shree wrote:
>>>>>>>>>
>>>>>>>>> I don't think that's the intended behavior. What version of
>>>>>>>>> tesseract are you using? Please post a sample image for testing.
>>>>>>>>>
>>>>>>>>> ShreeDevi
>>>>>>>>>
>>>>>>>>> On Thu, Jan 8, 2015 at 8:00 PM, C. wrote:
>>>>>>>>>
>>>>>>>>>> If I run a simple "tesseract 1.tif 2 pdf", all vertical and
>>>>>>>>>> horizontal lines (and graphics with small lines) in the source file
>>>>>>>>>> disappear in the resulting PDF file (Ubuntu Server 12.04, tesseract
>>>>>>>>>> 3.03).
>>>>>>>>>>
>>>>>>>>>> Is that the intended behavior?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to [email protected].
>>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/dcbb0e46-b29b-447a-a5f4-d634b4371725%40googlegroups.com
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
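The top reply mentions problems with TESSDATA_PREFIX after compiling but does not say what they were. In the 3.0x series, tesseract expects TESSDATA_PREFIX to name the parent directory of `tessdata` (for a default `/usr/local` install, `/usr/local/share/`), not the `tessdata` directory itself. The sketch below checks a value against that convention. `check_tessdata_prefix` is a hypothetical helper, not part of tesseract, and the default language `eng` is an assumption.

```shell
# Hypothetical helper (not shipped with tesseract). Checks that a
# TESSDATA_PREFIX value follows the 3.0x convention: it must be the directory
# that *contains* tessdata/, and tessdata/<lang>.traineddata must exist.
# Usage: check_tessdata_prefix [prefix] [lang]  (defaults: $TESSDATA_PREFIX, eng)
check_tessdata_prefix() {
  prefix=${1:-$TESSDATA_PREFIX}
  lang=${2:-eng}
  if [ -z "$prefix" ]; then
    echo "TESSDATA_PREFIX is not set and no prefix was given" >&2
    return 1
  fi
  # Drop one trailing slash so "/usr/local/share/" and "/usr/local/share" both work.
  data="${prefix%/}/tessdata/$lang.traineddata"
  if [ -f "$data" ]; then
    return 0
  fi
  echo "not found: $data" >&2
  # If the language file sits directly under the prefix, the variable most
  # likely points at tessdata/ itself rather than at its parent.
  if [ -f "${prefix%/}/$lang.traineddata" ]; then
    echo "hint: $prefix looks like the tessdata directory; use its parent" >&2
  fi
  return 1
}

# Example, assuming a default /usr/local install:
#   export TESSDATA_PREFIX=/usr/local/share/
#   check_tessdata_prefix && tesseract 5.tif 5 pdf
```

The helper only inspects the filesystem. It does not run tesseract, so it can be used before the first OCR attempt to rule out a misplaced prefix.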

