I tried to compile the version you mentioned (after installing the dependencies listed in the README), but make stops with the following error:
./.libs/libtesseract.so: undefined reference to `l_generateCIDataForPdf'
./.libs/libtesseract.so: undefined reference to `l_CIDataDestroy'
collect2: error: ld returned 1 exit status
make[2]: *** [tesseract] Fehler 1

On Friday, 9 January 2015 09:28:53 UTC+1, shree wrote:
>
> As far as I know, pdf creation is a new addition and the issues were
> ironed out only recently. There have been over 100 commits to the code
> since 3.03 rc.
>
> If you want the new functionality, you can try compiling the code from
> https://code.google.com/p/tesseract-ocr/source/checkout
>
> Instructions are at https://code.google.com/p/tesseract-ocr/wiki/Compiling
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Jan 9, 2015 at 1:53 PM, C. <[email protected]> wrote:
>
>> First of all: thanks for your help.
>>
>> Concerning my problem I did a complete reinstall of the Ubuntu
>> 14.04-Server, installed tesseract 3.03 from the repos again and the failure
>> still exists! As 3.03 does not seem to be that old, I did not and - to be
>> honest - do not want to install a newer version from github.
>>
>> Is this a known bug?
>>
>> On Friday, 9 January 2015 06:33:01 UTC+1, shree wrote:
>>>
>>> I am using the git version -- output and messages attached. pdf seems to
>>> have all the lines.
>>>
>>> User@HP ~/tesseract-ocr/testing
>>> $ tesseract 5.tif 5 pdf
>>> Tesseract Open Source OCR Engine v3.04.00 with Leptonica
>>> Page 1
>>> OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
>>> Page 2
>>> Too few characters. Skipping this page
>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation
>>> anyway: 0
>>> Empty page!!
>>> Too few characters. Skipping this page
>>> OSD: Weak margin (0.00) for 0 blob text block, but using orientation
>>> anyway: 0
>>> Empty page!!
>>> Warning in pixReadMemTiff: tiff page 2 not found
>>>
>>> User@HP ~/tesseract-ocr/testing
>>> $ tesseract -v
>>> tesseract 3.04.00
>>> leptonica-1.71
>>> libgif 5.1.0 : libjpeg 8d : libpng 1.6.14 : libtiff 4.0.3 : zlib 1.2.8
>>> : libwebp 0.4.2
>>>
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Thu, Jan 8, 2015 at 9:24 PM, C. <[email protected]> wrote:
>>>
>>>> sorry, meant: 5.pdf is the resulting file.
>>>>
>>>> On Thursday, 8 January 2015 16:53:31 UTC+1, C. wrote:
>>>>
>>>>> tesseract 3.03, example is attached (5.tif is the original, 5.tig the
>>>>> result).
>>>>>
>>>>>
>>>>> On Thursday, 8 January 2015 16:02:31 UTC+1, shree wrote:
>>>>>>
>>>>>> I don't think that's the supposed behavior. What version of tesseract
>>>>>> are you using? Please post a sample image for testing.
>>>>>>
>>>>>> ShreeDevi
>>>>>> ____________________________________________________________
>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>
>>>>>> On Thu, Jan 8, 2015 at 8:00 PM, C. <[email protected]> wrote:
>>>>>>
>>>>>>> If I do a simple "tesseract 1.tif 2 pdf", all vertical and
>>>>>>> horizontal lines (and graphics with small lines) in the source file
>>>>>>> disappear in the resulting PDF file (Ubuntu server 12.04, tesseract
>>>>>>> 3.03).
>>>>>>>
>>>>>>> Is that the supposed behavior?
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/dcbb0e46-b29b-447a-a5f4-d634b4371725%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.

