Thanks, everyone! I tried again: I took a slightly different section of the original PDF and saved it as a PNG at 200 DPI. Then I ran convert with the following options:
convert -density 200 -units PixelsPerInch -type Grayscale +compress test2.png test_input2.tif

I had to add -density 200 because without it the output came out at 59 DPI, even though the original PNG was at 200.

Yes, there are some minor errors, but I'm quite happy with the output. Again, thanks for everybody's help! I'll be writing up a blog post about getting all this up and running on Mac OS X 10.6.6.

Bob

On Feb 18, 10:22 am, "Sriranga(78yrsold)" <[email protected]> wrote:
> I checked in FreeOCR (which has tess 3.01 alpha) and found it to be in order,
> with a few minor mistakes.
> With the help of IrfanView, I increased the resolution from 72 dpi to 300 dpi,
> saved it as an uncompressed TIF file, and tested.
> What zdenko says is correct.
> -sriranga(78yrs)
>
> On Fri, Feb 18, 2011 at 9:27 PM, zdenko podobny <[email protected]> wrote:
> > Hi,
> >
> > Just a quick reply:
> > I tried it on Windows XP with tesseract 3.00 and it produced a bad result
> > (nothing useful).
> >
> > IrfanView's information dialog showed that the image has a resolution of
> > 72x72 DPI -> too low...
> > So I resampled it (with the Lanczos algorithm) from 100% to 300% size, set
> > the DPI to 300 and decreased the number of colors to 16 (in IrfanView,
> > because I have no time to play with ImageMagick's options ;-) )...
> > Then the OCR result was much better, with several mistakes (just a quick
> > check)...
> >
> > So with several image improvements you can get a good OCR result.
> >
> > BR,
> >
> > Zd.
> >
> > On Fri, Feb 18, 2011 at 3:53 PM, Bob Kuo <[email protected]> wrote:
> >
> >> Hello all
> >>
> >> Please forgive the newbie question. I've seen this posted several
> >> times before, and I thought I had the right solution, but apparently
> >> not. Attached is a PNG that I'd like to run through tesseract. I
> >> used ImageMagick's convert to change it into a TIFF:
> >>
> >> convert -density 200 -units PixelsPerInch test_page.png -type
> >> Grayscale +compress test_input.tif
> >>
> >> (I've also tried this at -density 300, with the same results.)
> >>
> >> The resulting TIF is attached. When I run it through tesseract I get
> >> an output file that is one byte and is basically blank. Command and
> >> output below.
> >>
> >> tesseract test_input.tif output -l eng
> >> Tesseract Open Source OCR Engine
> >> Image has 8 * 1 bits per pixel, and size (375,350)
> >> Resolution=200
> >>
> >> I saw some other threads about a similar problem, but the solutions
> >> were to scale it to 200 or 300 DPI, make sure it was in grayscale,
> >> and remove the alpha layer, and somewhere else it said it was fixed in
> >> Tesseract 2.04. I'm using Tesseract 2.04 on Mac OS X 10.6.6 and
> >> ImageMagick 6.6.7-1. Is my image just unsuitable for OCR-ing?
> >>
> >> I appreciate any help.
> >>
> >> Thanks,
> >>
> >> Bob
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups
> >> "tesseract-ocr" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> >> [email protected].
> >> For more options, visit this group at
> >> http://groups.google.com/group/tesseract-ocr?hl=en.
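
The preprocessing Zdenko describes (upscale to 300% with a Lanczos filter, tag the result as 300 DPI, save an uncompressed grayscale TIFF) could be sketched with ImageMagick roughly as follows. This is only a sketch, not what anyone in the thread actually ran: Zdenko did these steps in IrfanView, the helper names build_convert_args and preprocess are made up for illustration, and the reduction to 16 colors is left out. It assumes ImageMagick's convert is on the PATH.

```python
import shutil
import subprocess


def build_convert_args(src, dst, scale_percent=300, dpi=300):
    """Build an argv list for ImageMagick's convert (hypothetical helper).

    The default scale and DPI mirror the values mentioned in the thread;
    they are assumptions, not tuned settings.
    """
    return [
        "convert", src,
        "-filter", "Lanczos",            # resampling filter used by -resize
        "-resize", f"{scale_percent}%",  # actually adds pixels
        "-units", "PixelsPerInch",       # set units before the density value
        "-density", str(dpi),            # resolution recorded in the output file
        "-type", "Grayscale",
        "+compress",                     # write the TIFF uncompressed
        dst,
    ]


def preprocess(src, dst, **kwargs):
    """Run convert with the arguments above and return the argv that was used."""
    args = build_convert_args(src, dst, **kwargs)
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick's convert was not found on PATH")
    subprocess.run(args, check=True)
    return args
```

Calling preprocess("test2.png", "test_input2.tif") would write the TIFF, which can then be passed to tesseract as in the commands above. Note that for a raster image -density on its own only changes the resolution recorded in the file; it is the -resize step that makes the characters larger in pixels.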

