Hi,
sorry, I missed the point.
I just reproduced it:
$ tesseract testing\eurotext.tif testing\eurotext -l eng+deu pdf
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Page 1
Error in fopenWriteStream: stream not opened
Error in pixWrite: stream not opened
Error in fopenReadStream: file not found
Error in extractG4DataFromFile: stream not opened to file
Error in l_generateG4Data: datacomp not extracted
Error in pixGenerateCIData: g4 data not made
Error in l_generateCIDataForPdf: file testing\eurotext.tif format is 4;
unreadable
Error during processing.
The pdf comes out, but you can't open it.
Adobe Reader shows an error saying that it is corrupted.
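A quick way to tell a truncated pdf from a viewer problem is to look at the raw bytes. The sketch below is only a rough check and not part of tesseract or leptonica: the function names are made up for illustration, and it assumes a complete pdf carries a "%PDF-" header near the start and a "%%EOF" marker near the end.

```python
from pathlib import Path


def looks_like_complete_pdf(data: bytes) -> bool:
    """Rough structural check: header near the start, EOF marker near the end.

    Hypothetical helper for illustration; it does not validate the pdf body.
    """
    has_header = b"%PDF-" in data[:1024]
    has_trailer = b"%%EOF" in data[-1024:]
    return has_header and has_trailer


def check_pdf_file(path: str) -> bool:
    """Return True if the file exists, is non-empty, and passes the rough check."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    return looks_like_complete_pdf(p.read_bytes())
```

Calling check_pdf_file() on the generated pdf would return False if the writer stopped before finishing the file, which would fit the "Error during processing." line above.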
I did another test without pdf output:
$ tesseract testing\eurotext.tif testing\eurotext -l eng+deu
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Page 1
Warning in pixReadMemTiff: tiff page 1 not found
It creates a text file which seems to contain everything, but shows the
warning message.
i recompiled a new version on my fake website so people can play with
the training tools as well.
And now I am off for 2 weeks.
Have a nice time while I am not around.
greetings,
simon
On 24.07.2015 at 08:50, zdenko podobny wrote:
it is not about input, but output.
pdf output is a key feature of the leptonica 1.71 release (and tesseract
3.03/3.04) and I guess it was not tested on cygwin yet.
Zdenko
On Fri, Jul 24, 2015 at 8:42 AM, Simon Eigeldinger <[email protected]>
wrote:
Hi,
i never tried to give tesseract a pdf as an input.
cygwin has leptonica 1.71 or 1.72 by default so i used this for compiling.
maybe leptonica doesn't like pdf files so it might complain.
so ShreeDevi Kumar might convert the pdf into an image or he uses a normal
image (tif, jpg, etc.).
greetings,
simon
On 24.07.2015 at 08:17, zdenko podobny wrote:
On Fri, Jul 24, 2015 at 7:10 AM, ShreeDevi Kumar <[email protected]>
wrote:
C:\Users\User\Downloads\TESS>tesseract test/eurotext.tif
test/eurotext-eng-pdf -l eng pdf
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
Error in fopenWriteStream: stream not opened
Error in pixWrite: stream not opened
Error in fopenReadStream: file not found
Error in extractG4DataFromFile: stream not opened to file
Error in l_generateG4Data: datacomp not extracted
Error in pixGenerateCIData: g4 data not made
Error in l_generateCIDataForPdf: file test/eurotext.tif format is 4;
unreadable
Error during processing.
It looks like a leptonica issue. Did you try to build and run the leptonica
progs (all that have pdf in the name)?
Zdenko
--
Simon Eigeldinger
Follow me on Twitter: http://www.twitter.com/domasofan/
E-Mail: [email protected]
MSN: [email protected]
ICQ: 121823966
Jabber: [email protected]
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/55B1DE3B.1090906%40vol.at.
For more options, visit https://groups.google.com/d/optout.