Re: Problem processing specific TIF from ImageMagick

Quan Nguyen Sun, 11 Sep 2011 06:14:20 -0700

Hi Jon,

I tried your images with VietOCR, which makes the images more amenable
to the Tesseract engine, and it produced fairly accurate results. I
think the results would have been better if the PDF had been converted
with -density 300 instead of -density 150.
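
If you want to script the reconversion, here is a minimal sketch in Python. It simply rebuilds the convert command from your message with a higher density. The function names and the default of 300 are my own choices for illustration, and I am assuming ImageMagick and Ghostscript are both installed and on the PATH. On Windows, also check that "convert" resolves to ImageMagick and not to the system utility of the same name.

```python
import shutil
import subprocess


def build_convert_command(pdf_path, density=300, prefix="p"):
    """Return the ImageMagick argument list for one grayscale TIFF per page."""
    return [
        "convert",
        "-density", str(density),  # rasterization resolution in DPI
        "-depth", "8",             # 8 bits per sample
        "-colorspace", "gray",
        "-verbose",
        pdf_path,
        prefix + "%02d.tif",       # numbered output: p00.tif, p01.tif, ...
    ]


def convert_pdf(pdf_path, density=300):
    """Run the conversion; raises if 'convert' cannot be found on PATH."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found on PATH")
    subprocess.run(build_convert_command(pdf_path, density), check=True)
```

Calling convert_pdf("pic32.PDF") should then write the numbered page files to the current directory, the same as the original command but rendered at 300 dpi.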


You can also open a PDF file directly in VietOCR if Ghostscript is
installed.

http://sf.net/projects/vietocr

Regards,
Quan

On Sep 10, 1:21 pm, Jon <[email protected]> wrote:
> I just installed the 3.0.1 version of tesseract (used the Windows
> installer for 3.0 and then added the zipped 3.0.1 to the directory.)
> Only the English training file is present, for now.  I then tested
> tesseract using the phototest.tif file in the doc subdir and it worked
> just fine.  (Admin privileges were set.)
>
> (I'm running on Windows 7 Professional, 64-bit, on a Lenovo T510
> laptop.)
>
> I also installed ImageMagick 6.7.2-Q16 using their installer.  I then
> converted a PDF article into eight .tif page files using it.  All that
> worked okay and the images look correct to me.  To do that, I used the
> following command:
>
>    convert -density 150 -depth 8 -colorspace gray -verbose pic32.PDF p%02d.tif
>
> This produced the p00.tif to p07.tif files without exhibiting an error
> and, as I said, they appeared to display fine using Windows Live Photo
> Gallery, for example.
>
> However, tesseract 3.0.1 crashes (Windows wants to look up possible
> solutions before killing the program) on any or all of these .tif
> files that were produced.  I have placed the first two files at my web
> site at:
>
>  http://www.infinitefactors.org/misc/images/tesseract/p00.tif
>  http://www.infinitefactors.org/misc/images/tesseract/p01.tif
>
> (These files are each about 4 megabytes in size.  The directory listing
> is disabled and only the two listed above are world readable, in a
> modest attempt to protect the copyright holder and focus on this
> problem I'm having.)
>
> I'm not sure if I need to change the ImageMagick conversion settings,
> as all of this is pretty new to me.  (First time out.)  It's possible
> that if I convert the PDF using different settings more to the liking
> of tesseract that I'd have better results.  I will attempt a few
> changes on my own, mostly at random because of my profound ignorance,
> but I'm looking for helpful thoughts in the meantime.
>
> It's my hope to eventually learn how to convert the huge PDF scans of
> old documents I have into more compressed PDFs in which the text is
> recognized well and the file is much smaller and searchable.  But
> that's long term.  For now, I'd just like to figure out how to make
> these tif pages work.
>
> Thanks in advance.  And I apologize for my ignorance.
>
> Jon

