[tesseract-ocr] Tesseract from git and pdf output

2014-10-02 Thread simon.eigeldinger
Hi all, I compiled Tesseract from git yesterday and played with it a little. It is pretty impressive what has happened over the last two years or so: not only is Tesseract smaller in file size, it also seems to be faster and more accurate. But to the topic of this message: I used the following command to

Re: [tesseract-ocr] Tesseract from git and pdf output

2014-10-02 Thread zdenko podobny
Please post your input and output files somewhere.
Zdenko
On Thu, Oct 2, 2014 at 2:03 PM, simon.eigeldin...@vol.at wrote:
> hi all, i compiled tesseract from git yesterday and played with it a little bit. pretty impressive what happened since around 2 years. not only has tesseract a lower filesize

Re: [tesseract-ocr] Tesseract from git and pdf output

2014-10-02 Thread simon.eigeldinger
Hello, the files are here: https://www.dropbox.com/s/9u3nkk1hahyu9o7/image.zip?dl=0
and the output of the console is:
    $ tesseract image.tif image -l eng pdf
    Tesseract Open Source OCR Engine v3.04.00 with Leptonica
    Page 1
    Warning in pixReadMemTiff: tiff page 1 not found
greetings,

Re: [tesseract-ocr] Tesseract from git and pdf output

2014-10-02 Thread Shree Devi Kumar
Usually that error comes up if pdf.ttf and pdf.ttx are not in your tessdata directory. Please check that the files from https://code.google.com/p/tesseract-ocr/source/browse/#git%2Ftessdata are present in the tessdata directory pointed to by TESSDATA_PREFIX.
Shree Devi Kumar
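The check suggested in this reply can be scripted. The following is a minimal sketch, not part of Tesseract itself: the helper names `candidate_tessdata_dirs` and `missing_pdf_files` are hypothetical, and it assumes the prefix may name either the parent of `tessdata` or the `tessdata` directory itself, so it inspects both locations.

```python
import os
from pathlib import Path

# Files named in the reply above as required for PDF output.
REQUIRED = ("pdf.ttf", "pdf.ttx")


def candidate_tessdata_dirs(prefix):
    """Return the directories that may hold tessdata files for this prefix.

    Assumption: the prefix is either the parent of 'tessdata' or the
    'tessdata' directory itself, so both are returned.
    """
    base = Path(prefix)
    return [base / "tessdata", base]


def missing_pdf_files(prefix, required=REQUIRED):
    """Map each existing candidate directory to the required files it lacks."""
    results = {}
    for directory in candidate_tessdata_dirs(prefix):
        if directory.is_dir():
            results[str(directory)] = [
                name for name in required if not (directory / name).is_file()
            ]
    return results


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix is None:
        print("TESSDATA_PREFIX is not set")
    else:
        for directory, missing in missing_pdf_files(prefix).items():
            status = "ok" if not missing else "missing " + ", ".join(missing)
            print(f"{directory}: {status}")
```

An empty list for one of the directories means both files are present there. If they are, the missing files are probably not the cause of the warning and the input image itself is worth a closer look.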

Re: [tesseract-ocr] Tesseract from git and pdf output

2014-10-02 Thread simon.eigeldinger
Hi, pdf.ttf and pdf.ttx are in the tessdata directory, as are all the other language files, which can be accessed fine.
Greetings, Simon
On Thu, 2 Oct 2014 19:24:56 +0530 Shree Devi Kumar shreesh...@gmail.com wrote:
> Usually that error comes if pdf.ttf and pdf.ttx are not in your tessdata