On Friday, May 27, 2016 at 8:29:02 AM UTC-4, Mika Koistinen wrote:
>
> It looks like I have a related problem when trying to create hOCR files for
> single-word images. The result for a single word disappears, although I
> can find it in the txt output produced without the hOCR parameter.
>
...
> ERROR message:
I am using:
tesseract 3.05.00dev
leptonica-1.73
libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.
It looks like the reboot is resetting some variables, such as the
TESSDATA_PREFIX environment variable.
You can try giving the path on the command line. See the following batch
file as a sample:
#Page Segmentation Modes
#3 = Fully automatic page segmentation, but no OSD. (Default)
#4 = Assume a single col
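The batch file is cut off in this archive, so it cannot be reproduced here. As a hypothetical sketch (not the original batch file), the script below runs tesseract 3.x over every `.tif` in a directory with `-psm 8`, the page segmentation mode that treats the image as a single word, and writes hOCR through the `hocr` config. Trying a single-word mode might help with the disappearing single-word hOCR results, but nothing in this thread confirms that. The names `ocr_dir`, `run`, `DRY_RUN`, `LANG_CODE` and `PSM` are made up for illustration, and `-psm` is the 3.x spelling (4.x uses `--psm`).

```shell
#!/bin/sh
# Hypothetical batch helper (not the batch file from this thread).
# Runs tesseract 3.x on every .tif in a directory, treating each image as a
# single word (-psm 8) and writing hOCR output via the "hocr" config file.
# Set DRY_RUN=1 to print the commands instead of executing them.

LANG_CODE="${LANG_CODE:-eng}"   # assumed OCR language; override as needed
PSM="${PSM:-8}"                 # 8 = treat the image as a single word (3.x)

# run: echo the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# ocr_dir DIR: OCR each DIR/*.tif; tesseract appends the output extension.
ocr_dir() {
  dir="$1"
  for img in "$dir"/*.tif; do
    [ -e "$img" ] || continue   # no matches: skip the unexpanded glob
    base="${img%.tif}"          # output base name without the .tif suffix
    run tesseract "$img" "$base" -l "$LANG_CODE" -psm "$PSM" hocr
  done
}
```

After sourcing it, setting `DRY_RUN=1` and calling `ocr_dir ./words` prints the commands without running them; unset `DRY_RUN` to run them for real.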
After rebooting the server, tesseract complains as follows:
Error opening data file /usr/local/tesseract-ocr/tessdata/deu.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the
parent directory of your "tessdata" directory.
Failed loading language 'deu'
Tesseract coul
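The error message says TESSDATA_PREFIX must point to the parent of the `tessdata` directory. A minimal sketch, assuming a POSIX shell, that searches a list of candidate directories for `<lang>.traineddata` and prints a value suitable for TESSDATA_PREFIX; `find_tessdata_prefix` is a hypothetical helper, and the candidate paths in the usage note are guesses, not locations confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory whose tessdata/
# subdirectory contains <lang>.traineddata, i.e. the parent of tessdata/,
# which is what tesseract 3.x expects in TESSDATA_PREFIX.
# Usage: find_tessdata_prefix LANG DIR...   (returns 1 if nothing matches)

find_tessdata_prefix() {
  lang="$1"
  shift
  for dir in "$@"; do
    dir="${dir%/}"                         # drop one trailing slash, if any
    if [ -f "$dir/tessdata/$lang.traineddata" ]; then
      printf '%s/\n' "$dir"                # trailing slash added to be safe
      return 0
    fi
  done
  return 1
}
```

For example, `prefix=$(find_tessdata_prefix deu /usr/local/share /usr/share/tesseract-ocr) && export TESSDATA_PREFIX="$prefix"`. An `export` typed into a shell does not survive a reboot, so the setting would have to go into a startup file (for example `~/.profile` for a login shell) to avoid the "after rebooting" failure described here.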
I did not succeed in completely uninstalling, so I reinstalled the server
again and installed just the latest version of tesseract from source.
Now "tesseracting" works fine again: all lines are shown in the
resulting PDF file. So it must be a bug in tesseract 3.03.
Hope that
Please see https://code.google.com/p/tesseract-ocr/issues/detail?id=1278
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar wrote:
You should *uninstall the old version fully* and then build the version
from git. Otherwise it may still be picking up some older libraries.
Also, this needs leptonica 1.71. I'm not sure whether the documentation
mentions it.
ShreeDevi
I tried to compile the version you mentioned (after installing the
dependencies listed in the README), but make stops with the following error:
./.libs/libtesseract.so: undefined reference to `l_generateCIDataForPdf'
./.libs/libtesseract.so: undefined reference to `l_CIDataDestroy'
collect2: error
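The undefined references above name leptonica functions, and ShreeDevi's note says this version needs leptonica 1.71, so one hedged first check is whether the leptonica found at build time is new enough. The sketch assumes leptonica installed a pkg-config file named `lept` and that GNU `sort -V` is available (as on Ubuntu); `version_ge` and `check_leptonica` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical pre-build check: is the installed leptonica at least the
# required version? Assumes a pkg-config file named "lept" and GNU sort -V.

# version_ge A B: succeeds if version A >= version B (by sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_leptonica [MIN]: compare pkg-config's leptonica version with MIN.
check_leptonica() {
  required="${1:-1.71}"   # minimum version mentioned in this thread
  installed=$(pkg-config --modversion lept 2>/dev/null) || {
    echo "leptonica not found via pkg-config" >&2
    return 2
  }
  if version_ge "$installed" "$required"; then
    echo "leptonica $installed is OK (>= $required)"
  else
    echo "leptonica $installed is older than $required" >&2
    return 1
  fi
}
```

`check_leptonica 1.71` reports whether the installed version is sufficient. If it passes but linking still fails, an older `liblept` elsewhere on the library path may be picked up first, which fits the advice to uninstall the old version fully; `ldconfig -p | grep lept` lists the copies the dynamic linker knows about.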
As far as I know, PDF creation is a new addition, and its issues were ironed
out only recently. There have been over 100 commits to the code since 3.03
rc.
If you want the new functionality, you can try compiling the code from
https://code.google.com/p/tesseract-ocr/source/checkout
Instructions ar
First of all, thanks for your help.
Concerning my problem: I did a complete reinstall of Ubuntu Server
14.04, installed tesseract 3.03 from the repos again, and the failure
still exists! As 3.03 does not seem to be that old, I did not (and, to be
honest, do not want to) install a newer versi
I am using the git version; output and messages are attached. The PDF seems
to have all the lines.
User@HP ~/tesseract-ocr/testing
$ tesseract 5.tif 5 pdf
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
Page 2
Too few chara
Sorry, I meant: 5.pdf is the resulting file.
On Thursday, January 8, 2015 at 16:53:31 UTC+1, C. wrote:
tesseract 3.03, example is attached (5.tif is the original, 5.tig the
result).
On Thursday, January 8, 2015 at 16:02:31 UTC+1, shree wrote:
I don't think that's the intended behavior. What version of tesseract are
you using? Could you post a sample image for testing?
ShreeDevi
On Thu, Jan 8, 2015 at 8:00 PM, C. wrote:
If I do a simple "tesseract 1.tif 2 pdf", all vertical and horizontal
lines (and graphics with small lines) in the source file disappear in the
resulting PDF file (Ubuntu Server 12.04, tesseract 3.03).
Is that the intended behavior?