Re: Problem with Tesseract 3.01 Accuracy

Alasdair Fri, 16 Dec 2011 08:20:24 -0800

The results are exactly the same whether I use the 3.01 or 3.00
trained data, so it appears to be a Tesseract 3.01 (or compile, or OS)
problem, as opposed to a training data problem.
Here is a sample image:


It's 1-bit and very high quality. All noise has also been removed from
the page. I actually feed it to Tesseract as a TIFF, since Tesseract
crashes when I try to give it a PNG (probably a problem with Leptonica/
libpng).
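
A note on the two outputs quoted further down: the 3.01 sample appears to
contain the Unicode ligature characters U+FB01 and U+FB02 where 3.00 emits
plain "fi" and "fl", so part of the apparent accuracy gap may be a character
encoding difference rather than a recognition error. The following is only a
minimal sketch, assuming Python 3 is available; the helper name
normalize_ligatures and the two sample strings are illustrative and not part
of Tesseract. It folds the ligatures back to plain letters before comparing:

```python
import unicodedata


def normalize_ligatures(text: str) -> str:
    """Fold compatibility characters (e.g. U+FB01, U+FB02) to plain letters."""
    # NFKC applies compatibility decomposition, so the single-character
    # ligatures become the two-letter sequences "fi" and "fl".
    return unicodedata.normalize("NFKC", text)


# Illustrative strings modelled on the sample line from this thread.
out_301 = "man. He owned large \ufb01elds, and had \ufb01ne \ufb02ocks of"
out_300 = "man. He owned large fields, and had fine flocks of"

# The raw strings differ, but they match once ligatures are folded.
print(out_301 == out_300)
print(normalize_ligatures(out_301) == out_300)
```

If the two outputs match after this step, the remaining differences are the
ones worth investigating as real recognition errors.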
Alasdair
On Dec 16, 12:06 am, Patrick Questembert
<[email protected]> wrote:
> One possible difference is that we don't use the Cube data at all. Can you
> share the image?
>
> If you have an iPhone or Android phone you are welcome to try our use of
> Tesseract 3.01 on your own by simply snapping a photo and trying
> ScanBizCards on it (it's free, no worries). If you want to see the
> Tesseract 3.01 results without our additional corrections [iPhone only], do
> this:
> - touch Settings on the app's main page
> - enter this string in the Configuration field: showalltext=2
> - now when you scan, the actions menu for that photo will include "Show All
> Text" as the first option and will show you the text + copy it to the
> clipboard
>
> Note that my own experience is in the context of feeding Tesseract a black
> & white image we prepare. In theory it's possible that with Tesseract's
> default image processing (which is rather weak) 3.01 performs less well.
>
> Patrick
>
> On Thu, Dec 15, 2011 at 4:01 AM, Alasdair <[email protected]> wrote:
> > I just tested again. I reinstalled 3.01 and tested on a couple of images,
> > then reinstalled 3.00 and tested on the same images.
>
> > Tesseract 3.00 wins again. I'm certain that I am using the correct
> > training data, and I have the cube data as well for 3.01. I cleared
> > out all the files between installations, so 3.01 only has access to
> > 3.01 data and 3.00 only has access to 3.00 data.
>
> > 3.01:
>
> > LTARMER MEANWELL was at one time a very rich
> > man. He owned large ﬁelds, and had ﬁne ﬂocks of
> > sheep, and plenty of money.
>
> > 3.00:
>
> > LTARMER MEANWELL was at one time a very rich
> > man. He owned large fields, and had fine flocks of
> > sheep, and plenty of money.
>
> > Note also that I'm on CentOS 6. Leptonica version 1.68.
>
> > I also set the environment variable:
> > export TESSDATA_PREFIX=/usr/local/share/tessdata
>
> > It's the same on all images I have tested.
>
> > Any ideas?
>
> > On Dec 14, 11:03 am, patrickq <[email protected]> wrote:
> > > I have had the opposite experience: Tess 3.01 beats 3.00 often - the
> > > reverse does happen but rarely.
>
> > > Note that Tess 3.01 will do WORSE if using Tess 3.00 trained data - is
> > > it possible you are not using the Tess 3.01 trained data?
>
> > > On Dec 13, 9:22 am, Alasdair <[email protected]> wrote:
>
> > > > For some reason Tesseract 3.01 is giving much poorer accuracy than
> > > > 3.00 on exactly the same images. This is the same whether using the
> > > > 3.01 English trained data or the 3.00 English trained data. (I would
> > > > expect it to at least be the same using the 3.00 trained data.)
>
> > > > Am I the only person experiencing this?
>
> > > > If so, what have I done wrong?
>
> > > > I'm executing it like this:
> > > > tesseract "test.tif" "testout" -l eng
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
