The results are exactly the same whether I use the 3.01 or 3.00 trained data, so it appears to be a Tesseract 3.01 (or compile, or OS) problem, as opposed to a training data problem. Here is a sample image:
It's 1-bit and very high quality. All noise has also been removed from the page. I actually input into Tesseract as a TIF, since Tesseract crashes when I try to give it a PNG (probably a problem with Leptonica/libpng).

Alasdair

On Dec 16, 12:06 am, Patrick Questembert <[email protected]> wrote:
> One possible difference is that we don't use the Cube data at all. Can you
> share the image?
>
> If you have an iPhone or Android phone you are welcome to try our use of
> Tesseract 3.01 on your own by simply snapping a photo and trying
> ScanBizCards on it (it's free, no worries). If you want to see the
> Tesseract 3.01 results without our additional corrections [iPhone only], do
> this:
> - touch Settings on the app's main page
> - enter this string in the Configuration field: showalltext=2
> - now when you scan, the actions menu for that photo will include "Show All
> Text" as the first option and will show you the text + copy it to the
> clipboard
>
> Note that my own experience is in the context of feeding Tesseract a black
> & white image we prepare. In theory it's possible that with Tesseract
> default image processing (which is rather weak) 3.01 performs less well.
>
> Patrick
>
> On Thu, Dec 15, 2011 at 4:01 AM, Alasdair <[email protected]> wrote:
> > I just tested again. Reinstalled 3.01, tested on a couple of images,
> > Reinstalled 3.00, tested on a couple of images.
> >
> > Tesseract 3.00 wins again. I'm certain that I am using the correct
> > training data, and I have the cube data as well for 3.01. I cleared
> > out all the files between installations, so 3.01 only has access to
> > 3.01 data and 3.00 only has access to 3.00 data.
> >
> > 3.01:
> >
> > LTARMER MEANWELL was at one time a very rich
> > man. He owned large ï¬elds, and had ï¬ne ï¬ocks of
> > sheep, and plenty of money.
> >
> > 3.00:
> >
> > LTARMER MEANWELL was at one time a very rich
> > man. He owned large fields, and had fine flocks of
> > sheep, and plenty of money.
> >
> > Note also that I'm on CentOS 6. Leptonica version 1.68.
> >
> > I also set the environment variable:
> > export TESSDATA_PREFIX=/usr/local/share/tessdata
> >
> > It's the same on all images I have tested.
> >
> > Any ideas?
> >
> > On Dec 14, 11:03 am, patrickq <[email protected]> wrote:
> > > I have had the opposite experience: Tess 3.01 beats 3.00 often - the
> > > reverse does happen but rarely.
> > >
> > > Note that Tess 3.01 will do WORSE if using Tess 3.00 trained data - is
> > > it possible you are not using the Tess 3.01 trained data?
> > >
> > > On Dec 13, 9:22 am, Alasdair <[email protected]> wrote:
> > > > For some reason Tesseract 3.01 is giving much poorer accuracy than
> > > > 3.00 on exactly the same images. This is the same whether using the
> > > > 3.01 English trained data or the 3.00 English trained data. (I would
> > > > expect it to at least be the same using the 3.00 trained data.)
> > > >
> > > > Am I the only person experiencing this?
> > > >
> > > > If so, what have I done wrong?
> > > >
> > > > I'm executing it like this:
> > > > tesseract "test.tif" "testout" -l eng
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
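
One thing that may be worth checking, offered as a guess from the quoted sample rather than anything confirmed in this thread: the "ï¬" sequences in the 3.01 output are what the UTF-8 bytes of the ligature characters U+FB01 (fi) and U+FB02 (fl) look like when they are displayed as Latin-1. If that is what is happening, the 3.01 text may be closer to correct than it appears. The Python sketch below reproduces the effect on a hypothetical sample string and then undoes it; the function names are made up for illustration and are not part of Tesseract.

```python
import unicodedata


def undo_latin1_misread(text: str) -> str:
    """Reverse the case where UTF-8 bytes were decoded as Latin-1.

    Returns the input unchanged if it does not round-trip cleanly.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def expand_ligatures(text: str) -> str:
    """Replace compatibility characters such as U+FB01 with plain letters."""
    return unicodedata.normalize("NFKC", text)


# Hypothetical sample: the sentence from the thread, written with the
# ligature characters U+FB01 (fi) and U+FB02 (fl).
sample = "He owned large \ufb01elds, and had \ufb01ne \ufb02ocks of sheep"

# What that text looks like if its UTF-8 bytes are displayed as Latin-1.
misread = sample.encode("utf-8").decode("latin-1")

# Undo the misreading, then expand the ligatures to ordinary letters.
repaired = expand_ligatures(undo_latin1_misread(misread))
print(repaired)
```

Note that NFKC normalization also rewrites other compatibility characters (full-width forms, for example), so it is only a quick way to compare outputs, not something to apply blindly to OCR results.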

