On 10.9.2011 at 20:21, Jon wrote:
I just installed the 3.0.1 version of tesseract (used the Windows
installer for 3.0 and then added the zipped 3.0.1 to the directory.)
Only the english training file is present, for now. I then tested
tesseract using the phototest.tif file in the doc subdir and it worked
just fine. (Admin privileges were set.)
(I'm running on Windows 7 Professional, 64-bit, on a Lenovo T510
laptop.)
I also installed ImageMagick 6.7.2-Q16 using their installer. I then
converted a PDF article into eight .tif page files using it. All that
worked okay and the images look correct to me. To do that, I used the
following command:
convert -density 150 -depth 8 -colorspace gray -verbose pic32.PDF p%02d.tif
This produced the p00.tif to p07.tif files without exhibiting an error
and, as I said, they appeared to display fine using Windows Live Photo
Gallery, for example.
However, tesseract 3.0.1 crashes (Windows wants to look up possible
solutions before killing the program) on any or all of these .tif
files that were produced. I have placed the first two files at my web
site at:
http://www.infinitefactors.org/misc/images/tesseract/p00.tif
http://www.infinitefactors.org/misc/images/tesseract/p01.tif
(These files are each about 4 megabytes in size. The directory listing
is disabled and only the two listed above are world readable, in a
modest attempt to protect the copyright holder and focus on this
problem I'm having.)
I'm not sure if I need to change the ImageMagick conversion settings,
as all of this is pretty new to me. (First time out.) It's possible
that if I convert the PDF using different settings more to the liking
of tesseract that I'd have better results. I will attempt a few
changes on my own, mostly at random because of my profound ignorance,
but I'm looking for helpful thoughts in the meantime.
It's my hope to eventually learn how to convert the huge PDF scans of
old documents I have into more compact versions in which the text is
recognized well, so that the PDF is much smaller and searchable as
well. But that's long term. For now, I'd just like to figure out how
to make these tif pages work.
Thanks in advance. And I apologize for my ignorance.
Jon
Hello...
I had a similar problem.
The image you posted is not 8 bits per pixel but 16, and this seems to
disturb tesseract's output.
The best solution (at least on my computer, which has little RAM) is to
replace the ImageMagick Q16 build with the Q8 build.
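
If you want to check the bit depth of a page yourself before feeding it
to tesseract, here is a minimal Python sketch that reads the
BitsPerSample tag (258) from the first image directory of a TIFF file.
The function name tiff_bits_per_sample is made up for this example, it
uses only the standard library, and it assumes a classic (non-BigTIFF)
file; treat it as a rough diagnostic, not a full TIFF parser.

```python
import struct

BITS_PER_SAMPLE = 258  # TIFF tag number for BitsPerSample
SHORT = 3              # TIFF field type: 16-bit unsigned integer


def tiff_bits_per_sample(data: bytes) -> tuple:
    """Return the BitsPerSample values from the first IFD of a classic TIFF."""
    # The first two bytes give the byte order: "II" little-endian, "MM" big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (entry_count,) = struct.unpack_from(order + "H", data, ifd_offset)
    for i in range(entry_count):
        pos = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type, count = struct.unpack_from(order + "HHI", data, pos)
        if tag != BITS_PER_SAMPLE or field_type != SHORT:
            continue
        value_pos = pos + 8
        if count * 2 > 4:
            # Values that do not fit in 4 bytes are stored at an offset instead.
            (value_pos,) = struct.unpack_from(order + "I", data, value_pos)
        return struct.unpack_from(order + f"{count}H", data, value_pos)
    return (1,)  # TIFF default when the tag is absent


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, tiff_bits_per_sample(fh.read()))
```

Saved under a name of your choice and run with the page files as
arguments (p00.tif, p01.tif, ...), it prints one tuple per file; a
grayscale page that reports (16,) is 16 bits per pixel, (8,) is 8.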
Slavko.
p.s.
The images are of poor quality. JPEG compression was used somewhere
along the way, and they have a lot of artefacts.