Yes, we convert the image to black and white. That's the end result of any image processing for the purpose of OCR: Tesseract recognition has to work on a black & white image, so either your app creates the b&w image or you let Tesseract create it. We had to create the b&w image ourselves because we found Tesseract's image processing to be inadequate for most business card images. It's hard to fully describe all the details of our image processing, but it includes stages like detecting areas where patterns are present (as opposed to background), local adaptive thresholding and non-text elimination (the latter is something we don't do too well yet, but Tesseract is usually not too confused by non-text patterns in the b&w image). You can see the b&w image we generate by turning ON a ScanBizCards setting called "Show Processed Image" (available only in our iOS version, not the Android version).
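To make "local adaptive thresholding" concrete, here is a minimal sketch in plain Python. It is not the ScanBizCards algorithm, whose details are not given in this thread; it is a generic mean-based variant, and the names `adaptive_threshold`, `window` and `offset` are assumptions chosen for illustration. Each pixel is compared with the average brightness of its own neighbourhood rather than with one global cut-off, which is what lets it cope with uneven lighting.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows of 0..255 ints.

    A pixel becomes black (0) when it is darker than the mean of the
    window x window neighbourhood around it minus `offset`; otherwise
    it becomes white (255). `window` and `offset` are illustrative
    defaults, not tuned values.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0

    # Integral image (summed-area table) with an extra zero row and column,
    # so any rectangle sum costs four lookups.
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        out_row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            count = (y1 - y0) * (x1 - x0)
            # Compare pixel * count with (mean - offset) * count to stay in ints.
            is_dark = gray[y][x] * count < total - offset * count
            out_row.append(0 if is_dark else 255)
        out.append(out_row)
    return out
```

On a real photo you would typically pick `window` a bit larger than the stroke width of the text, so that a character stroke never fills the whole neighbourhood.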
Note also that the results you see in ScanBizCards are not just about the b&w image: we also have 70,000 lines of code that examine the text results returned by Tesseract, compare them with our home-brewed OCR tests, and pick one or the other for each letter.

Patrick

On Sat, Jan 12, 2013 at 9:02 AM, Tin Siladin <[email protected]> wrote:
> Patrick,
> I ran into your answer and I think you might be able to help me given that
> your scanbizcards app is doing similar processing to what I need (don't
> worry, no competition :) ).
>
> I'm using tesseract-ocr on android, taking pictures of portions of
> receipts in grayscale mode (EFFECT_MONO, if available on device), and
> converting the bitmap to ARGB_8888 for tesseract processing. I'm getting
> pretty good results, but am trying to improve accuracy.
>
> Do you know whether you do any other preprocessing of the bitmap before
> passing it to tesseract, like perhaps converting to black/white (you mentioned
> in your above answer you're passing a "black & white image")? I tried out
> your app and it seems to give better results than mine on similar types of
> images. Very nice app, BTW.
>
> Thanks,
> Tin
>
> On Monday, 17 December 2012 04:02:34 UTC+1, Patrick Questembert wrote:
>>
>> I think you are right - ScanBizCards is passing a black & white image to
>> Tesseract and we are pretty sure Tesseract doesn't change the image
>> (empirically - we never dug in to make sure).
>>
>> Patrick
>>
>> On Sun, Dec 16, 2012 at 9:44 PM, Linda Li <[email protected]> wrote:
>>
>>> If I pass a binary (black-white) image into tesseract-ocr, will
>>> tesseract process the image in its own way?
>>>
>>> It seems tesseract-ocr uses the Otsu global thresholding method (I have not
>>> looked into the code carefully yet, but saw the keyword "otsu" in the
>>> source code).
>>>
>>> If so, the Otsu method will not change the binary (black-white) image.
>>>
>>> So in my understanding, tesseract-ocr will not change the input binary
>>> image.
>>>
>>> Just asking to make sure...
>>>
>>> Thanks.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> tesseract-oc...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> --
>> Patrick Questembert, *ScanBizCards*
>> +1-917-250-4177 | www.scanbizcards.com
>> twitter.com/ScanBizCards | www.facebook.com/ScanBizCards
>> Just released: Power Contacts - http://itunes.apple.com/us/app/power-contacts/id476986356?mt=8
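Linda's reasoning about Otsu can be checked in isolation with a small sketch. This is a textbook Otsu implementation in plain Python, not Tesseract's code, and the function names `otsu_threshold` and `binarize` are made up for illustration. It only shows that a global Otsu threshold applied to an image that already contains just 0 and 255 reproduces the same image; whether Tesseract does anything else to its input is not established in this thread.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    `pixels` is a flat list of 0..255 ints. Pixels <= t form the dark class,
    pixels > t the bright class. Ties keep the lowest t.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of their intensities
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # bright class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels):
    """Apply the Otsu threshold and return a list of 0/255 values."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


already_binary = [0, 255, 255, 0, 0, 255, 255, 255]
# Thresholding an already-binary image gives the same image back.
assert binarize(already_binary) == already_binary
```

For a two-valued input every candidate threshold between the two values splits the pixels identically, so any of them leaves the image unchanged.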

