I just tested Tesseract 3.01 on your image (actually only on the first two paragraphs: ScanBizCards downscales larger images, so I have to feed it business-card-sized images to avoid that), and 3.01 produced spotless results, even for an "s+e" pair mangled a bit by your image processing! Pretty awesome, actually. So I don't know what's wrong with your setup. To make your setup match ScanBizCards, do you want to try removing the cube training data (we don't include it)?
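Setting the cube data aside, rather than deleting it, could look like the sketch below. It assumes the 3.01 cube files for English sit directly in the tessdata directory and are named `eng.cube.*` plus `eng.tesseract_cube.*`; check your own install, since both the file names and the `park_cube_data` helper are assumptions for illustration.

```python
import shutil
from pathlib import Path

# Assumed names of the English cube files shipped with 3.01; verify locally.
CUBE_PATTERNS = ("eng.cube.*", "eng.tesseract_cube.*")


def park_cube_data(tessdata_dir, backup_dir):
    """Move cube files out of tessdata_dir into backup_dir.

    Returns the names of the files that were moved, so the step is easy
    to undo by moving them back.
    """
    tessdata = Path(tessdata_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    moved = []
    for pattern in CUBE_PATTERNS:
        for path in sorted(tessdata.glob(pattern)):
            shutil.move(str(path), str(backup / path.name))
            moved.append(path.name)
    return moved
```

For example, `park_cube_data("/usr/local/share/tessdata", "/tmp/cube_backup")` followed by the same `tesseract "test.tif" "testout" -l eng` run would show whether the cube data changes anything.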
Here are the images I used and the results so you can see for yourself:

scanbizcards dot com /samples/part1.jpg
scanbizcards dot com /samples/part1text.jpg
scanbizcards dot com /samples/part2.jpg
scanbizcards dot com /samples/part2text.jpg

Note that the image processing you used damaged some letters a bit (but that's a given in your test of both versions). Your image is just the right size for Tesseract, text is large enough.

Patrick

On Fri, Dec 16, 2011 at 8:46 AM, Alasdair <[email protected]> wrote:
> The board doesn't seem to allow URLs.
>
> The image is here:
> imageshack dot us /photo/my-images/12/21904105.png
>
> Alasdair
>
> On Dec 16, 12:06 am, Patrick Questembert <[email protected]> wrote:
> > One possible difference is that we don't use the Cube data at all. Can you
> > share the image?
> >
> > If you have an iPhone or Android phone you are welcome to try our use of
> > Tesseract 3.01 on your own by simply snapping a photo and trying
> > ScanBizCards on it (it's free, no worries). If you want to see the
> > Tesseract 3.01 results without our additional corrections [iPhone only], do
> > this:
> > - touch Settings on the app's main page
> > - enter this string in the Configuration field: showalltext=2
> > - now when you scan, the actions menu for that photo will include "Show All
> >   Text" as the first option and will show you the text + copy it to the
> >   clipboard
> >
> > Note that my own experience is in the context of feeding Tesseract a black
> > & white image we prepare. In theory it's possible that with Tesseract
> > default image processing (which is rather weak) 3.01 performs less well.
> >
> > Patrick
> >
> > On Thu, Dec 15, 2011 at 4:01 AM, Alasdair <[email protected]> wrote:
> > > I just tested again. Reinstalled 3.01, tested on a couple of images,
> > > Reinstalled 3.00, tested on a couple of images.
> > >
> > > Tesseract 3.00 wins again. I'm certain that I am using the correct
> > > training data, and I have the cube data as well for 3.01. I cleared
> > > out all the files between installations, so 3.01 only has access to
> > > 3.01 data and 3.00 only has access to 3.00 data.
> > >
> > > 3.01:
> > >
> > > LTARMER MEANWELL was at one time a very rich
> > > man. He owned large ﬁelds, and had ﬁne ﬂocks of
> > > sheep, and plenty of money.
> > >
> > > 3.00:
> > >
> > > LTARMER MEANWELL was at one time a very rich
> > > man. He owned large fields, and had fine flocks of
> > > sheep, and plenty of money.
> > >
> > > Note also that I'm on CentOS 6. Leptonica version 1.68.
> > >
> > > I also set the environment variable:
> > > export TESSDATA_PREFIX=/usr/local/share/tessdata
> > >
> > > It's the same on all images I have tested.
> > >
> > > Any ideas?
> > >
> > > On Dec 14, 11:03 am, patrickq <[email protected]> wrote:
> > > > I have had the opposite experience: Tess 3.01 beats 3.00 often - the
> > > > reverse does happen but rarely.
> > > >
> > > > Note that Tess 3.01 will do WORSE if using Tess 3.00 trained data - is
> > > > it possible you are not using the Tess 3.01 trained data?
> > > >
> > > > On Dec 13, 9:22 am, Alasdair <[email protected]> wrote:
> > > > > For some reason Tesseract 3.01 is giving much poorer accuracy than
> > > > > 3.00 on exactly the same images. This is the same whether using the
> > > > > 3.01 English trained data or the 3.00 English trained data. (I would
> > > > > expect it to at least be the same using the 3.00 trained data.)
> > > > >
> > > > > Am I the only person experiencing this?
> > > > >
> > > > > If so, what have I done wrong?
> > > > >
> > > > > I'm executing it like this:
> > > > > tesseract "test.tif" "testout" -l eng
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
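
When comparing the 3.01 and 3.00 outputs quoted in this thread, it may help to rule out pure character-encoding differences before counting errors. The sketch below assumes the 3.01 output contains the single-character ligatures "ﬁ" and "ﬂ" (U+FB01, U+FB02) where 3.00 emits plain "fi" and "fl"; that is an inference from the quoted text, not something confirmed in the thread, and `fold_ligatures` is a hypothetical helper name.

```python
import unicodedata


def fold_ligatures(text):
    """Expand compatibility characters such as the fi/fl ligatures.

    NFKC normalization maps U+FB01 to "fi" and U+FB02 to "fl". It also
    folds other compatibility forms (full-width letters, superscripts),
    so use it for comparison only, not as the final OCR text.
    """
    return unicodedata.normalize("NFKC", text)


# Sample lines modelled on the outputs quoted in the thread.
out_301 = "man. He owned large \ufb01elds, and had \ufb01ne \ufb02ocks of"
out_300 = "man. He owned large fields, and had fine flocks of"

# After folding, the two lines compare equal.
same_after_folding = fold_ligatures(out_301) == out_300
```

If the outputs match after folding, the apparent accuracy gap is a matter of how the result is displayed or compared rather than a recognition error.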

