Personally, I don't think adjusting the contrast settings helps much.
Getting a good black-and-white image takes a dedicated algorithm.
In the literature this problem is called document binarization; a Google
search for that term will turn up some papers to look at.
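
To give a rough idea of what such an algorithm can look like, here is a minimal Python sketch of local adaptive (mean-based) thresholding, one of the simpler binarization approaches. This is only an illustration under my own assumptions, not what Tesseract or ScanBizCards actually does: the names integral_image and adaptive_threshold and the window and offset parameters are made up for this example, and the image is assumed to be a plain 2-D list of gray values in 0..255.

```python
def integral_image(gray):
    """Return a (h+1) x (w+1) summed-area table for a 2-D list of ints."""
    h, w = len(gray), len(gray[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image against its local mean.

    A pixel becomes black (0) when it is darker than the mean of the
    surrounding window by more than `offset`; otherwise it becomes white (255).
    `window` and `offset` are illustrative defaults, not tuned values.
    """
    h, w = len(gray), len(gray[0])
    sat = integral_image(gray)
    r = window // 2
    out = []
    for y in range(h):
        # Clip the window to the image borders.
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            # Sum of the window in O(1) from the summed-area table.
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            # Integer form of: pixel < local_mean - offset
            is_dark = gray[y][x] * area < total - offset * area
            row.append(0 if is_dark else 255)
        out.append(row)
    return out
```

Because the threshold follows the local brightness, the same stroke comes out black whether it sits in a dark or a bright part of the picture, which a single global threshold or a contrast setting cannot guarantee. The window should be somewhat larger than the stroke width of the characters.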

On Saturday, January 12, 2013 9:16:45 AM UTC-6, Tin Siladin wrote:
>
> Thanks for your response.  In the meantime, I read up on many of your 
> answers in this group which will be very valuable to me. Thank you for 
> sharing your expertise. 
>
> I have in my incoming bitmap a single line of text (mostly numbers), so I 
> don't need to worry about most of the details you do. I believe the 
> variances in accuracy are due to the different lighting/brightness conditions 
> under which the picture is taken, so sometimes the incoming grayscale image 
> is darker overall, etc.  I've played around with the contrast settings of the 
> camera, and that helps as well, but I am still trying to improve. 
> Preprocessing to black and white was just one idea I had in mind. Needless 
> to say, I'm fairly new to tesseract, so I am reading up on the configuration 
> parameters/variables which may help me. 
>
> If you guys have nothing better to do, the "Show Processed Image" option 
> would be nice to have in Android as well.  :)
>
> Thanks again,
> Tin
>
>
> On Saturday, 12 January 2013 15:47:27 UTC+1, Patrick Questembert wrote:
>>
>> Yes, we convert the image to black and white, that's the end result of 
>> any image processing for the purpose of OCR - Tesseract recognition has to 
>> work on a black & white image so either your app creates the b&w image or 
>> you let Tesseract create it. We had to create the b&w image because we 
>> found Tesseract's image processing to be inadequate for most business card 
>> images. It is hard to fully describe all the details of our image processing, but 
>> it includes stages like detecting areas where patterns are present (as 
>> opposed to background), local adaptive thresholding and non-text 
>> elimination (the latter is something we don't do too well yet - but 
>> Tesseract is usually not too confused by non-text patterns in the b&w 
>> image). You can see the b&w image we generate by turning ON a setting of 
>> ScanBizCards called "Show Processed Image" (available only in our iOS 
>> version, not the Android version).
>>
>> Note also that the results you see in ScanBizCards are not just about 
>> the b&w image: we also have 70,000 lines of code examining the text results 
>> returned by Tesseract, comparing them to our home-brewed OCR tests, then 
>> picking one or the other for each letter.
>>
>> Patrick
>>
>> On Sat, Jan 12, 2013 at 9:02 AM, Tin Siladin <[email protected]> wrote:
>>
>>> Patrick,
>>> I ran into your answer and I think you might be able to help me given 
>>> that your scanbizcards app is doing similar processing to what I need 
>>> (don't worry, no competition :) ).
>>>
>>> I'm using tesseract-ocr on android, taking pictures of portions of 
>>> receipts in grayscale mode (EFFECT_MONO, if available on device), and 
>>> converting the bitmap to ARGB_8888 for tesseract processing.  I'm getting 
>>> pretty good results, but am trying to improve accuracy.
>>>
>>> Do you do any other preprocessing of the bitmap before passing it to 
>>> tesseract, like perhaps converting to black/white (you mentioned in your 
>>> answer above that you're passing a "black & white image")?  I tried out 
>>> your app and it seems to give better results than mine on similar types of 
>>> images.  Very nice app, BTW.
>>>
>>> Thanks,
>>> Tin
>>>
>>> On Monday, 17 December 2012 04:02:34 UTC+1, Patrick Questembert wrote:
>>>>
>>>> I think you are right - ScanBizCards is passing a black & white image 
>>>> to Tesseract and we are pretty sure Tesseract doesn't change the image 
>>>> (empirically - we never dug in to make sure).
>>>>
>>>> Patrick
>>>>
>>>> On Sun, Dec 16, 2012 at 9:44 PM, Linda Li <[email protected]> wrote:
>>>>
>>>>> If I pass a binary (black-white) image into tesseract-ocr, will 
>>>>> tesseract still process the image in its own way?
>>>>>
>>>>>
>>>>>  It seems tesseract-ocr uses the Otsu global thresholding method (I have 
>>>>> not looked into the code carefully yet, but I saw the keyword “otsu” in 
>>>>> the source code).
>>>>>
>>>>> If so, the Otsu method will not change the binary (black-white) image.
>>>>>
>>>>>
>>>>>  So in my understanding, tesseract-ocr will not change the input 
>>>>> binary image.
>>>>>
>>>>> Just asking to make sure...
>>>>>
>>>>>
>>>>>  Thanks.
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> tesseract-oc...@googlegroups.com
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Patrick Questembert, *ScanBizCards*
>>>> +1-917-250-4177 | www.scanbizcards.com
>>>> twitter.com/ScanBizCards | www.facebook.com/ScanBizCards
>>>> Just released: Power Contacts - 
>>>> http://itunes.apple.com/us/app/power-contacts/id476986356?mt=8
>>>>
>>
>>
>>
>
