I just tested Tesseract 3.01 on your image (actually only on the first two
paragraphs: ScanBizCards downscales larger images, so I have to feed it
business-card-sized images to avoid that) and 3.01 produced spotless
results, even for an s+e mangled a bit by your image processing!
Pretty awesome actually. So I don't know what's wrong with your setup. To
make it match ScanBizCards, do you want to try removing the cube training
data (we don't include it)?
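
If it helps, here is a rough sketch of one way to take the cube data out of
play without deleting it. It assumes the cube files sit next to
eng.traineddata and are named eng.cube.* and eng.tesseract_cube.nn (check
your own tessdata directory, the names may differ). The function name
disable_cube and the cube-disabled backup folder are made up for this
example.

```shell
#!/bin/sh
# Sketch: move the Cube files for one language out of a tessdata directory.
# Assumption: Cube data is named LANG.cube.* and LANG.tesseract_cube.nn.
disable_cube() {
    tessdata_dir=$1                            # e.g. /usr/local/share/tessdata
    lang=${2:-eng}                             # language code, defaults to eng
    backup_dir="$tessdata_dir/cube-disabled"   # hypothetical backup location
    mkdir -p "$backup_dir" || return 1
    for f in "$tessdata_dir/$lang".cube.* "$tessdata_dir/$lang".tesseract_cube.nn; do
        # A pattern that matched nothing stays literal; skip it.
        [ -e "$f" ] || continue
        mv "$f" "$backup_dir/"
    done
}
```

You would call it as disable_cube /usr/local/share/tessdata eng, rerun
tesseract "test.tif" "testout" -l eng, and move the files back afterwards if
it makes no difference.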

Here are the images I used and the results so you can see for yourself:
scanbizcards dot com /samples/part1.jpg
scanbizcards dot com /samples/part1text.jpg
scanbizcards dot com /samples/part2.jpg
scanbizcards dot com /samples/part2text.jpg

Note that the image processing you used damaged some letters a bit (but
that's a given in your test of both versions). Your image is just the right
size for Tesseract: the text is large enough.
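
One more thing that might be worth checking: the 3.01 and 3.00 samples you
quoted differ only where "fi" and "fl" appear. If 3.01 is emitting the
single-character ligatures (U+FB01 and U+FB02, which is an assumption on my
part), folding them to plain ASCII before comparing would show whether
anything else really differs. A minimal sketch, with fold_ligatures as a
made-up helper name:

```shell
#!/bin/sh
# Sketch: fold the single-character fi/fl ligatures (U+FB01, U+FB02)
# to plain ASCII so two OCR outputs can be compared fairly.
fold_ligatures() {
    sed -e 's/ﬁ/fi/g' -e 's/ﬂ/fl/g'
}
```

For example, fold_ligatures < testout.txt | diff - testout-3.00.txt, where
testout-3.00.txt is a hypothetical saved copy of the 3.00 output.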

Patrick

On Fri, Dec 16, 2011 at 8:46 AM, Alasdair <[email protected]> wrote:

> The board doesn't seem to allow URLs.
>
> The image is here:
> imageshack dot us /photo/my-images/12/21904105.png
>
> Alasdair
>
>
> On Dec 16, 12:06 am, Patrick Questembert
> <[email protected]> wrote:
> > One possible difference is that we don't use the Cube data at all. Can you
> > share the image?
> >
> > If you have an iPhone or Android phone you are welcome to try our use of
> > Tesseract 3.01 on your own by simply snapping a photo and trying
> > ScanBizCards on it (it's free, no worries). If you want to see the
> > Tesseract 3.01 results without our additional corrections [iPhone only],
> > do this:
> > - touch Settings on the app's main page
> > - enter this string in the Configuration field: showalltext=2
> > - now when you scan, the actions menu for that photo will include "Show
> > All Text" as the first option and will show you the text + copy it to the
> > clipboard
> >
> > Note that my own experience is in the context of feeding Tesseract a
> > black & white image we prepare. In theory it's possible that with Tesseract
> > default image processing (which is rather weak) 3.01 performs less well.
> >
> > Patrick
> >
> > On Thu, Dec 15, 2011 at 4:01 AM, Alasdair <[email protected]> wrote:
> > > I just tested again. Reinstalled 3.01, tested on a couple of images,
> > > Reinstalled 3.00, tested on a couple of images.
> >
> > > Tesseract 3.00 wins again. I'm certain that I am using the correct
> > > training data, and I have the cube data as well for 3.01. I cleared
> > > out all the files between installations, so 3.01 only has access to
> > > 3.01 data and 3.00 only has access to 3.00 data.
> >
> > > 3.01:
> >
> > > LTARMER MEANWELL was at one time a very rich
> > > man. He owned large ﬁelds, and had ﬁne ﬂocks of
> > > sheep, and plenty of money.
> >
> > > 3.00:
> >
> > > LTARMER MEANWELL was at one time a very rich
> > > man. He owned large fields, and had fine flocks of
> > > sheep, and plenty of money.
> >
> > > Note also that I'm on CentOS 6. Leptonica version 1.68.
> >
> > > I also set the environment variable:
> > > export TESSDATA_PREFIX=/usr/local/share/tessdata
> >
> > > It's the same on all images I have tested.
> >
> > > Any ideas?
> >
> > > On Dec 14, 11:03 am, patrickq <[email protected]> wrote:
> > > > I have had the opposite experience: Tess 3.01 beats 3.00 often - the
> > > > reverse does happen but rarely.
> >
> > > > Note that Tess 3.01 will do WORSE if using Tess 3.00 trained data -
> > > > is it possible you are not using the Tess 3.01 trained data?
> >
> > > > On Dec 13, 9:22 am, Alasdair <[email protected]> wrote:
> >
> > > > > For some reason Tesseract 3.01 is giving much poorer accuracy than
> > > > > 3.00 on exactly the same images. This is the same whether using the
> > > > > 3.01 English trained data or the 3.00 English trained data. (I
> > > > > would expect it to at least be the same using the 3.00 trained data.)
> >
> > > > > Am I the only person experiencing this?
> >
> > > > > If so, what have I done wrong?
> >
> > > > > I'm executing it like this:
> > > > > tesseract "test.tif" "testout" -l eng
> >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
>

