I ran test_page.png through VietOCR 3.1 with Screenshot Mode enabled and got acceptable results back. Since it's a Java program, it can certainly run on OS X, provided that you build the Tesseract engine. And if Ghostscript is installed, VietOCR can read PDFs too.
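For anyone following along with the command-line route instead, here is a rough sketch of the two steps Bob describes in the quoted message below, wrapped in a small shell function. It only prints the commands rather than running them, so it does not need ImageMagick or Tesseract installed. The function name ocr_commands and the dry-run approach are assumptions made for illustration, not something from the thread.

```shell
#!/bin/sh
# Sketch only: prints the convert and tesseract commands for one image.
# ocr_commands is a hypothetical helper name, not part of either tool.
# Usage: ocr_commands INPUT.png BASENAME DPI
ocr_commands() {
    input=$1
    base=$2
    dpi=$3
    # Options as in Bob's second attempt: explicit density, grayscale,
    # and +compress to write an uncompressed TIFF.
    printf 'convert -density %s -units PixelsPerInch -type Grayscale +compress %s %s.tif\n' \
        "$dpi" "$input" "$base"
    # Same tesseract call shape as in the thread: input file, output base, language.
    printf 'tesseract %s.tif %s -l eng\n' "$base" "$base"
}

# Print the commands for the file names used in the thread.
ocr_commands test2.png test_input2 200
```

To actually run the steps, the printed lines can be piped to sh once the output looks right.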

On Feb 18, 10:54 am, Bob Kuo <[email protected]> wrote:
> Thanks everyone! I tried it again, got a slightly different section
> from the original PDF and saved it as a PNG with 200 DPI. Then I ran
> convert with the following options:
>
> convert -density 200 -units PixelsPerInch -type Grayscale +compress
> test2.png test_input2.tif
>
> I had to put in the -density 200 because without it the output went to
> 59 DPI even though the original PNG was at 200.
>
> Yes, there are some minor errors but I'm quite happy with the output.
>
> Again, thanks for everybody's help! I'll be writing a blog post up
> about getting all this up and running on Mac OS X 10.6.6.
>
> Bob
>
> On Feb 18, 10:22 am, "Sriranga(78yrsold)" <[email protected]>
> wrote:
>
> > I checked in FreeOCR(which has tess 3.01 alpha) and found to be in order
> > with few minor mistakes.
> > With help of Irfanview - increased to 300dpi from 72dpi and saved as tif
> > file(uncompressed) and tested.
> > What zdenko says is correct.
> > -sriranga(78yrs)
>
> > On Fri, Feb 18, 2011 at 9:27 PM, zdenko podobny <[email protected]> wrote:
> > > Hi,
>
> > > Just a quick reply:
> > > I tried it on Windows XP with tesseract 3.00 and it produced bad result
> > > (nothing usefull).
>
> > > InfranView informations dialog showed that image has resolution 72x72 DPI
> > > -> to low...
> > > So I resampled it (with Lanczos algorithm) from 100% to 300% size, set DPI
> > > to 300 and decreased number of color to 16 (in InfranView because I have no
> > > time to play with ImageMagick's options ;-) )...
> > > Than OCR result was much more better with several mistakes (just quick
> > > check)...
>
> > > So with several image improvements you can get good OCR result.
>
> > > BR,
>
> > > Zd.
>
> > > On Fri, Feb 18, 2011 at 3:53 PM, Bob Kuo <[email protected]> wrote:
>
> > >> Hello all
>
> > >> Please forgive the newbie question. I've seen this posted several
> > >> times before, and I thought I had the right solution but apparently
> > >> not. Attached is a PNG that I'd like to run through tesseract. I
> > >> used ImageMagick's convert to change it into a tiff:
>
> > >> convert -density 200 -units PixelsPerInch test_page.png -type
> > >> Grayscale +compress test_input.tif
>
> > >> (I've also tried to do this at -density 300 with the same results)
>
> > >> The resulting TIF is attached. When I run it through tesseract I get
> > >> an output file that is one byte and is basically blank. Command and
> > >> output below.
>
> > >> tesseract test_input.tif output -l eng
> > >> Tesseract Open Source OCR Engine
> > >> Image has 8 * 1 bits per pixel, and size (375,350)
> > >> Resolution=200
>
> > >> I saw some other threads about a similar problem, but the solutions
> > >> were to scale it to 200 or 300 DPI, make sure it was in grayscale,
> > >> remove the alpha layer, and somewhere else it said it was fixed in
> > >> Tesseract 2.04. I'm using Tesseract 2.04 on Mac OS X 10.6.6 and
> > >> ImageMagick 6.6.7-1. Is my image just unsuitable for OCR-ing?
>
> > >> I appreciate any help.
>
> > >> Thanks,
>
> > >> Bob
>
> > >> --
> > >> You received this message because you are subscribed to the Google Groups
> > >> "tesseract-ocr" group.
> > >> To post to this group, send email to [email protected].
> > >> To unsubscribe from this group, send email to
> > >> [email protected].
> > >> For more options, visit this group at
> > >> http://groups.google.com/group/tesseract-ocr?hl=en.

