Thanks for your response.  In the meantime, I read through many of your 
answers in this group, which will be very valuable to me. Thank you for 
sharing your expertise. 

My incoming bitmap contains a single line of text (mostly numbers), so I 
don't need to worry about most of the details you handle. I believe the 
variance in accuracy comes from the different lighting/brightness 
conditions under which the picture is taken, so the incoming grayscale 
image is sometimes darker overall. I've played around with the camera's 
contrast settings, and that helps as well, but I'm still trying to 
improve. Preprocessing to black and white was just one idea I had in mind. 
I'm fairly new to Tesseract, so I will also read up on the configuration 
parameters/variables that may help me. 
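
For a single line of mostly numeric text, two settings that are commonly tried are the single-text-line page segmentation mode and a character whitelist. The following is a minimal sketch, assuming the `tesseract` command-line tool from a recent release (which spells the option `--psm`; 3.x releases used `-psm`). The helper name `build_tesseract_args` and the whitelist contents are hypothetical. With the tess-two Android wrapper, the analogous calls should be `setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE)` and `setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, ...)`.

```python
def build_tesseract_args(image_path, whitelist="0123456789.,-"):
    """Build a tesseract command line for one line of mostly numeric text.

    The function only assembles the argument list; it does not run tesseract.
    The default whitelist is an assumption and should match the receipts.
    """
    return [
        "tesseract",
        image_path,
        "stdout",       # print the recognized text instead of writing a file
        "--psm", "7",   # page segmentation mode 7: image is a single text line
        "-c", "tessedit_char_whitelist=" + whitelist,  # allowed characters
    ]
```

The returned list can be passed to `subprocess.run(args, capture_output=True, text=True)` on a machine where tesseract is installed. Whether the whitelist is honored depends on the Tesseract version and engine mode, so the result is worth checking on a few real images.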

If you guys have nothing better to do, the "Show Processed Image" option 
would be nice to have in Android as well.  :)

Thanks again,
Tin


On Saturday, 12 January 2013 15:47:27 UTC+1, Patrick Questembert wrote:
>
> Yes, we convert the image to black and white, that's the end result of any 
> image processing for the purpose of OCR - Tesseract recognition has to work 
> on a black & white image so either your app creates the b&w image or you 
> let Tesseract create it. We had to create the b&w image because we found 
> Tesseract's image processing to be inadequate for most business card 
> images. It's hard to fully describe all the details of our image 
> processing, but it includes stages like detecting areas where patterns 
> are present (as opposed to background), local adaptive thresholding, and 
> non-text elimination (the latter is something we don't do too well yet, but 
> Tesseract is usually not too confused by non-text patterns in the b&w 
> image). You can see the b&w image we generate by turning ON a setting of 
> ScanBizCards called "Show Processed Image" (available only in our iOS 
> version, not the Android version).
>
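Local adaptive thresholding, one of the stages named above, can be sketched as follows. This is a generic mean-based variant built on an integral image, not ScanBizCards' actual code. The function name `adaptive_threshold` and the `radius` and `offset` parameters are assumptions chosen for illustration.

```python
def adaptive_threshold(gray, radius=7, offset=15):
    """Binarize a grayscale image given as a list of rows of 0-255 ints.

    A pixel becomes black (0) when it is darker than the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood by more than `offset`;
    otherwise it becomes white (255). Windows are clipped at the borders.
    """
    h = len(gray)
    w = len(gray[0]) if h else 0

    # integral[y][x] holds the sum of gray[0:y][0:x]
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # pixel < mean - offset, multiplied through by area to stay in ints
            is_dark = gray[y][x] * area < total - offset * area
            row.append(0 if is_dark else 255)
        out.append(row)
    return out
```

Because each pixel is compared with its own neighbourhood rather than with one global value, dark strokes can survive even when the background brightness drifts across the picture. The window size and offset need tuning per camera and per font size.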
> Note also that in the results you see in ScanBizCards it's not just about 
> the b&w image: we also have 70,000 lines of code examining the text results 
> returned by Tesseract and comparing to our home-brewed OCR tests then 
> picking one or the other for each letter.
>
> Patrick
>
> On Sat, Jan 12, 2013 at 9:02 AM, Tin Siladin <[email protected]> wrote:
>
>> Patrick,
>> I ran into your answer and I think you might be able to help me given 
>> that your scanbizcards app is doing similar processing to what I need 
>> (don't worry, no competition :) ).
>>
>> I'm using tesseract-ocr on android, taking pictures of portions of 
>> receipts in grayscale mode (EFFECT_MONO, if available on device), and 
>> converting the bitmap to ARGB_8888 for tesseract processing.  I'm getting 
>> pretty good results, but am trying to improve accuracy.
>>
>> Do you do any other preprocessing of the bitmap before passing it to 
>> tesseract, such as converting it to black and white (in your answer above 
>> you mentioned passing a "black & white image")?  I tried out your app and 
>> it seems to give better results than mine on similar types of images.  
>> Very nice app, BTW.
>>
>> Thanks,
>> Tin
>>
>> On Monday, 17 December 2012 04:02:34 UTC+1, Patrick Questembert wrote:
>>>
>>> I think you are right - ScanBizCards is passing a black & white image to 
>>> Tesseract and we are pretty sure Tesseract doesn't change the image 
>>> (empirically - we never dug in to make sure).
>>>
>>> Patrick
>>>
>>> On Sun, Dec 16, 2012 at 9:44 PM, Linda Li <[email protected]> wrote:
>>>
>>>> If I pass a binary (black-and-white) image into tesseract-ocr, will 
>>>> tesseract still process the image in its own way?
>>>>
>>>> It seems tesseract-ocr uses Otsu's global thresholding method (I have 
>>>> not looked into the code carefully yet, but I saw the keyword “otsu” in 
>>>> the source code).
>>>>
>>>> If so, the Otsu method will not change a binary (black-and-white) image.
>>>>
>>>> So, in my understanding, tesseract-ocr will not change the input 
>>>> binary image.
>>>>
>>>> Just asking to make sure...
>>>>
>>>>
>>>>  Thanks.
>>>>
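The reasoning above can be checked with a small sketch of the classic Otsu method. This is a textbook version written for illustration, not Tesseract's implementation, and the names `otsu_threshold` and `binarize` are hypothetical. On an image that contains only the values 0 and 255, every threshold this version can pick lies between those two values, so thresholding again leaves the pixels unchanged.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..254) that maximizes between-class variance.

    `pixels` is a flat list of 0-255 ints. Class 0 is pixels <= t and
    class 1 is pixels > t. Ties keep the lowest such t.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in class 0
    sum0 = 0  # sum of intensities in class 0
    for t in range(255):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is no split at this t
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2  # proportional to the variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels, t):
    """Map pixels above t to white (255) and all others to black (0)."""
    return [255 if p > t else 0 for p in pixels]


binary = [0, 0, 255, 255, 255, 0]
unchanged = binarize(binary, otsu_threshold(binary)) == binary
```

Here `unchanged` evaluates to `True` for the two-valued input. This only shows the behaviour of global Otsu thresholding itself; any other processing a given Tesseract version applies to its input is outside the sketch.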
>>>> -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> tesseract-oc...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>
>>>
>>>
>>> -- 
>>> Patrick Questembert, *ScanBizCards*
>>> +1-917-250-4177 | www.scanbizcards.com
>>> twitter.com/ScanBizCards | www.facebook.com/ScanBizCards
>>> Just released: Power Contacts - 
>>> http://itunes.apple.com/us/app/power-contacts/id476986356?mt=8
>>>

