I ran test_page.png through VietOCR 3.1 with Screenshot Mode enabled
and got acceptable results back. Since it's a Java program, it
can certainly run on OS X, provided that you build the Tesseract engine
there. And if Ghostscript is installed, VietOCR can read PDFs too.
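
For anyone finding this thread later, the preprocessing described in the
replies below (upscale the 72 DPI image about 3x with Lanczos, tag it as
300 DPI, drop the alpha layer, save an uncompressed grayscale TIFF) could be
sketched with ImageMagick roughly as follows. This is only a sketch: the
function name prep_for_ocr and the CONVERT override are hypothetical, the
300% factor is taken from zdenko's IrfanView steps, and grayscale is used
here in place of his 16-color reduction.

```shell
#!/bin/sh
# Sketch only: prep_for_ocr and the CONVERT variable are hypothetical names,
# not part of ImageMagick or Tesseract.
prep_for_ocr() {
    in_file=$1
    out_file=$2
    # CONVERT optionally points at a specific ImageMagick binary (assumption);
    # it defaults to plain "convert".
    # -alpha off          : drop the alpha layer
    # -resize 300%        : upscale 3x using the Lanczos filter
    # -units / -density   : record 300 pixels per inch in the output file
    # -type Grayscale     : write a grayscale image
    # +compress           : write the TIFF uncompressed
    "${CONVERT:-convert}" "$in_file" \
        -alpha off \
        -filter Lanczos -resize 300% \
        -units PixelsPerInch -density 300 \
        -type Grayscale +compress \
        "$out_file"
}
```

Usage would be something like "prep_for_ocr test_page.png test_input.tif",
followed by the same "tesseract test_input.tif output -l eng" command as in
the original post. The scale factor probably needs tuning per image.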

On Feb 18, 10:54 am, Bob Kuo <[email protected]> wrote:
> Thanks everyone!  I tried it again, got a slightly different section
> from the original PDF and saved it as a PNG with 200 DPI.  Then I ran
> convert with the following options:
>
> convert -density 200 -units PixelsPerInch -type Grayscale +compress
> test2.png test_input2.tif
>
> I had to put in the -density 200 because without it the output went to
> 59 DPI even though the original PNG was at 200.
>
> Yes, there are some minor errors but I'm quite happy with the output.
>
> Again, thanks for everybody's help!  I'll be writing a blog post up
> about getting all this up and running on Mac OS X 10.6.6.
>
> Bob
>
> On Feb 18, 10:22 am, "Sriranga(78yrsold)" <[email protected]>
> wrote:
>
> > I checked it in FreeOCR (which has Tesseract 3.01 alpha) and found the
> > output to be in order, with a few minor mistakes.
> > With the help of IrfanView, I increased the resolution from 72 dpi to
> > 300 dpi, saved it as an uncompressed TIF file, and tested that.
> > What zdenko says is correct.
> > -sriranga(78yrs)
>
> > On Fri, Feb 18, 2011 at 9:27 PM, zdenko podobny <[email protected]> wrote:
> > > Hi,
>
> > > Just a quick reply:
> > > I tried it on Windows XP with tesseract 3.00 and it produced a bad result
> > > (nothing useful).
>
> > > IrfanView's information dialog showed that the image has a resolution of
> > > 72x72 DPI -> too low...
> > > So I resampled it (with the Lanczos algorithm) from 100% to 300% size,
> > > set the DPI to 300 and decreased the number of colors to 16 (in IrfanView,
> > > because I have no time to play with ImageMagick's options ;-) )...
> > > Then the OCR result was much better, with several mistakes (just a quick
> > > check)...
>
> > > So with several image improvements you can get a good OCR result.
>
> > > BR,
>
> > > Zd.
>
> > > On Fri, Feb 18, 2011 at 3:53 PM, Bob Kuo <[email protected]> wrote:
>
> > >> Hello all
>
> > >> Please forgive the newbie question. I've seen this posted several
> > >> times before, and I thought I had the right solution but apparently
> > >> not.  Attached is a PNG that I'd like to run through tesseract.  I
> > >> used ImageMagick's convert to change it into a tiff:
>
> > >> convert -density 200 -units PixelsPerInch test_page.png -type
> > >> Grayscale +compress test_input.tif
>
> > >> (I've also tried to do this at -density 300 with the same results)
>
> > >> The resulting TIF is attached.  When I run it through tesseract I get
> > >> an output file that is one byte and is basically blank.  Command and
> > >> output below.
>
> > >> tesseract test_input.tif output -l eng
> > >> Tesseract Open Source OCR Engine
> > >> Image has 8 * 1 bits per pixel, and size (375,350)
> > >> Resolution=200
>
> > >> I saw some other threads about a similar problem, but the solutions
> > >> were to scale it to 200 or 300 DPI, make sure it was in grayscale,
> > >> remove the alpha layer, and somewhere else it said it was fixed in
> > >> Tesseract 2.04.  I'm using Tesseract 2.04 on Mac OS X 10.6.6 and
> > >> ImageMagick 6.6.7-1.  Is my image just unsuitable for OCR-ing?
>
> > >> I appreciate any help.
>
> > >> Thanks,
>
> > >> Bob
>
> > >> --
> > >> You received this message because you are subscribed to the Google Groups
> > >> "tesseract-ocr" group.
> > >> To post to this group, send email to [email protected].
> > >> To unsubscribe from this group, send email to
> > >> [email protected].
> > >> For more options, visit this group at
> > >>http://groups.google.com/group/tesseract-ocr?hl=en.
>

