Many people get good results using the ImageMagick library/command-line
tool for binarization and other preprocessing. It's free, and you can use
the library in commercial products (its license is Apache 2.0-style, much
like Tesseract's).
--Sven
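
For anyone who wants to try the ImageMagick route, here is a minimal sketch of a global-threshold binarization driven from Python. It assumes the ImageMagick 6 `convert` executable is on the PATH (ImageMagick 7 installs usually name it `magick`); the function names, file names, and the 50% default are illustrative assumptions, not something from this thread.

```python
import shutil
import subprocess


def build_binarize_command(src, dst, threshold_percent=50, executable="convert"):
    """Build an ImageMagick command line that grayscales and thresholds an image.

    Hypothetical helper: only the -colorspace and -threshold options are
    ImageMagick's own; everything else is named for this example.
    """
    return [
        executable,
        src,
        "-colorspace", "Gray",                    # drop colour information first
        "-threshold", f"{threshold_percent}%",    # pixels above this become white
        dst,
    ]


def binarize_with_imagemagick(src, dst, threshold_percent=50, executable="convert"):
    """Run the command above; raises if ImageMagick is missing or fails."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH")
    subprocess.run(
        build_binarize_command(src, dst, threshold_percent, executable),
        check=True,
    )
```

Calling `binarize_with_imagemagick("receipt.png", "receipt_bw.png", 60)` would write a black and white copy that can then be handed to Tesseract. A single global threshold is the simplest option and may not cope with uneven lighting.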


On Sat, Jan 12, 2013 at 9:16 AM, Tin Siladin <[email protected]> wrote:

> Thanks for your response.  In the meantime, I read up on many of your
> answers in this group which will be very valuable to me. Thank you for
> sharing your expertise.
>
> I have in my incoming bitmap a single line of text (mostly numbers), so I
> don't need to worry about  most of the details you do. I believe the
> variances in accuracy occur due to different lighting/brightness conditions
> under which the picture is taken so sometimes the incoming grayscale image
> is overall darker, etc.  I've played around with contrast settings of the
> camera, and that helps as well, but am still trying to improve.
> Preprocessing to black and white was just one idea I had in mind. Needless
> to say, I'm fairly new to Tesseract, so I'm reading up on the configuration
> parameters/variables that may help me.
>
> If you guys have nothing better to do, the "Show Processed Image" option
> would be nice to have in Android as well.  :)
>
> Thanks again,
> Tin
>
>
> On Saturday, 12 January 2013 15:47:27 UTC+1, Patrick Questembert wrote:
>
>> Yes, we convert the image to black and white, that's the end result of
>> any image processing for the purpose of OCR - Tesseract recognition has to
>> work on a black & white image so either your app creates the b&w image or
>> you let Tesseract create it. We had to create the b&w image because we
>> found Tesseract's image processing to be inadequate for most business card
>> images. Hard to fully describe all the details of our image processing but
>> it includes stages like detecting areas where patterns are present (as
>> opposed to background), local adaptive thresholding and non-text
>> elimination (the latter is something we don't do too well yet - but
>> Tesseract is usually not too confused by non-text patterns in the b&w
>> image). You can see the b&w image we generate by turning ON a setting of
>> ScanBizCards called "Show Processed Image" (available only in our iOS
>> version, not the Android version).
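
To make "local adaptive thresholding" concrete, here is a minimal pure-Python sketch of one common variant: each pixel is compared with the mean of its neighbourhood minus a small offset. This is not ScanBizCards' actual pipeline, which is not described in detail here; the function name, the window radius, and the offset are assumptions chosen for illustration.

```python
def adaptive_threshold(image, radius=1, offset=10):
    """Binarize a grayscale image (list of rows of 0-255 ints).

    A pixel becomes black (0) when it is darker than the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood by more than `offset`;
    otherwise it becomes white (255). Windows are clipped at the borders.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        row = []
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            row.append(0 if image[y][x] < local_mean - offset else 255)
        result.append(row)
    return result


# Background fades from bright (left) to dark (right); two darker "ink" pixels.
gradient = [220, 200, 180, 160, 140, 120]
sample = [
    list(gradient),
    [220, 140, 180, 160, 60, 120],
    list(gradient),
]
binary = adaptive_threshold(sample)
```

In this sample the ink pixel on the bright side (140) is lighter than the plain background on the dark side (120), so no single global threshold separates ink from background, while the local comparison marks both ink pixels.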
>>
>> Note also that in the results you see in ScanBizCards it's not just about
>> the b&w image: we also have 70,000 lines of code examining the text results
>> returned by Tesseract and comparing to our home-brewed OCR tests then
>> picking one or the other for each letter.
>>
>> Patrick
>>
>>
>> On Sat, Jan 12, 2013 at 9:02 AM, Tin Siladin <[email protected]> wrote:
>>
>>> Patrick,
>>> I ran into your answer and I think you might be able to help me given
>>> that your scanbizcards app is doing similar processing to what I need
>>> (don't worry, no competition :) ).
>>>
>>> I'm using tesseract-ocr on android, taking pictures of portions of
>>> receipts in grayscale mode (EFFECT_MONO, if available on device), and
>>> converting the bitmap to ARGB_8888 for tesseract processing.  I'm getting
>>> pretty good results, but am trying to improve accuracy.
>>>
>>> Do you know whether you do any other preprocessing of the bitmap before
>>> passing to tesseract, like perhaps converting to black/white (you mentioned
>>> in your above answer you're passing "black & white image").  I tried out
>>> your app and it seems to give better results than mine on similar types of
>>> images.  Very nice app, BTW.
>>>
>>> Thanks,
>>> Tin
>>>
>>> On Monday, 17 December 2012 04:02:34 UTC+1, Patrick Questembert wrote:
>>>>
>>>> I think you are right - ScanBizCards is passing a black & white image
>>>> to Tesseract and we are pretty sure Tesseract doesn't change the image
>>>> (empirically - we never dug in to make sure).
>>>>
>>>> Patrick
>>>>
>>>> On Sun, Dec 16, 2012 at 9:44 PM, Linda Li <[email protected]> wrote:
>>>>
>>>>> If I pass a binary (black-white) image into tesseract-ocr, will
>>>>> tesseract still process the image in its own way?
>>>>>
>>>>>
>>>>> It seems tesseract-ocr uses the Otsu global thresholding method (I have
>>>>> not looked into the code carefully yet, but I saw the keyword “otsu” in
>>>>> the source code).
>>>>>
>>>>> If so, the Otsu method will not change the binary (black-white) image.
>>>>>
>>>>>
>>>>>  So in my understanding, tesseract-ocr will not change the input
>>>>> binary image.
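
The reasoning above can be checked with a small pure-Python sketch of textbook Otsu thresholding. It is not Tesseract's own code, and it assumes an 8-bit image whose only values are 0 and 255; the function names are made up for this example.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold t for 8-bit gray values (pixels <= t are dark).

    Picks the first t that maximizes the between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue                      # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                         # everything is already "dark"
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        score = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(pixels):
    """Apply the Otsu threshold: values above it become 255, the rest 0."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


already_binary = [0, 0, 255, 255, 0, 255, 0]
grayscale = [10, 12, 11, 200, 205, 198]
```

With only two gray levels present, every threshold between them splits the pixels the same way, so `binarize(already_binary)` returns the input unchanged. This says nothing about any other processing an OCR engine might apply after thresholding.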
>>>>>
>>>>> Just asking to make sure...
>>>>>
>>>>>
>>>>>  Thanks.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> tesseract-oc...@googlegroups.com
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Patrick Questembert, *ScanBizCards*
>>>> +1-917-250-4177 | www.scanbizcards.com
>>>> twitter.com/ScanBizCards | www.facebook.com/ScanBizCards
>>>> Just released: Power Contacts -
>>>> http://itunes.apple.com/us/app/power-contacts/id476986356?mt=8
>>>>
>>>
>>
>>
>>
>>
>



-- 
“All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
