I think that multiple light sources would help you get more homogeneous 
shadows and prevent broken characters. It's just an idea.

Regards



On Monday, December 17, 2012 05:02:08 UTC+1, Neo Song wrote:
>
> Dear Dmitri,
>
>     There is one thing that confuses me a great deal. With a coaxial light 
> source, I get solid-stroke characters. With a ring light source, I get 
> hollow-stroke characters. Tesseract recognizes solid characters much 
> better than hollow ones. However, we currently get very bad results 
> when we use coaxial light on a card with a light background or a UV-printed 
> card, since both the characters and the background are very bright. 
> The ring light source yields much better results in such 
> situations.
>     Note that both the solid and the hollow strokes are broken 
> in places, so simply filling the hollow strokes to form solid strokes is 
> very difficult to achieve.
>     Can you give me some advice on this situation?
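One possible way to attack the hollow-to-solid conversion described above is sketched below in pure Python. It assumes the image is already binarized (1 = stroke, 0 = background) and that the breaks in the outline are small, about two pixels at most. The function names `dilate` and `fill_holes` are illustrative only. The idea is to close the gaps with a 3x3 dilation first, then flood-fill the background from the image border and treat everything the flood cannot reach as the inside of a character.

```python
from collections import deque

def dilate(binary):
    """3x3 dilation: every stroke pixel also switches on its 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out

def fill_holes(binary):
    """Set to 1 every background pixel that is not 4-connected to the border."""
    h, w = len(binary), len(binary[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with all background pixels on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not binary[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and not binary[ny][nx] and not outside[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]

# A hollow square outline with a one-pixel break in its top edge.
ring = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
leaky = fill_holes(ring)          # the break lets the flood fill leak inside
solid = fill_holes(dilate(ring))  # dilation closes the break first
```

Note that the dilation thickens every stroke by one pixel, so an erosion afterwards may be needed, and wider breaks would need a larger structuring element, which in turn risks merging neighbouring characters.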
>
> On Thursday, December 13, 2012 7:24:59 PM UTC+8, Dmitri Silaev wrote:
>>
>> Neo Song, 
>>
>> There are two usual approaches to problems like yours. The first is 
>> to constrain the shooting conditions so that the later image processing 
>> has fewer problems. This could indeed mean requiring a slip of the card (or 
>> something else that others in the forum advise), which makes everything that 
>> follows much easier. The other approach is to accept every possible 
>> image and struggle with all kinds of complications.
>>
>> If you choose the latter, I can suggest the following. Your binarized 
>> images need some improvement. The binarization procedure has to preserve more 
>> contour pixels. Then you can use e.g. morphology (a distance transform, maybe) 
>> to obtain an image of one-pixel-wide contours, something like an edge map. 
>> That way you can replace the first stage of the SWT, edge 
>> detection (implemented via Canny or whatever), with your own. The rest of 
>> the SWT can proceed as usual. To generate a better edge image you can 
>> use various binarization techniques, keeping in mind that you are actually 
>> binarizing not the characters themselves but the glowing halos around them. 
>> Then you can apply some kind of post-processing (maybe 3x3 pattern-based 
>> linking) to make your contours more connected. But be prepared: no 
>> perfect character contours can be obtained, as with any other edge 
>> detection procedure.
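As a rough illustration of the "binarized image to edge map" step suggested above, here is a minimal pure-Python sketch. It is only one possible reading of the advice: instead of a distance transform it takes the inner boundary of the binary image (the image minus its 4-neighbour erosion), and the name `inner_boundary` is made up for this example. The resulting map is the kind of input that could stand in for the Canny stage of an SWT implementation.

```python
def inner_boundary(binary):
    """Mark foreground pixels that touch background through a 4-neighbour.

    binary is a list of rows with 1 = foreground and 0 = background.
    Pixels outside the image are treated as background.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                if not inside or not binary[ny][nx]:
                    edges[y][x] = 1
                    break
    return edges

# A 3x3 blob: only its centre pixel is not part of the contour.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edge_map = inner_boundary(blob)
```

On thin halos most pixels already touch the background, so this keeps them almost unchanged; a thinning step would still be needed to guarantee strictly one-pixel-wide contours.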
>>
>> HTH and good luck!
>>
>> Warm regards, 
>> Dmitri Silaev 
>> www.CustomOCR.com
>>
>>
>>
>> On Thu, Dec 13, 2012 at 1:35 PM, Neo Song <[email protected]> wrote:
>>
>>> Hi gadv,
>>>
>>>     I have used the SWT (Stroke Width Transform) algorithm for the 
>>> pre-processing. You can find the paper and its supplementary material via 
>>> Google. The problem now is the hollow-style characters and broken strokes, 
>>> which are very difficult to cope with. Any good ideas?
>>>
>>> On Saturday, December 8, 2012 10:17:46 AM UTC+8, gadv wrote:
>>>>
>>>> Hello Neo, 
>>>>
>>>> how did you turn the original images into those results? What kind of 
>>>> image processing did you use? 
>>>>
>>>> Thanks 
>>>>
>>>> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote: 
>>>> > Dear SteveP and others, 
>>>> > 
>>>> >     I have managed to do some image preprocessing and now have the 
>>>> > attached images to send to Tesseract. However, they are all hollow-style 
>>>> > characters, which is very bad for OCR. How can I convert them to 
>>>> > solid-style characters? 
>>>> > 
>>>> > On Thursday, December 6, 2012 5:28:28 AM UTC+8, SteveP wrote: 
>>>> >> 
>>>> >> If your characters have a fixed size in terms of pixels, then you might 
>>>> >> get better results from doing a subimage search than by using OCR. I mean 
>>>> >> searching for a rectangular subimage within the image of the card. The 
>>>> >> subimages that you could use would be reference images of each digit 0-9. 
>>>> >> 
>>>> >> You would probably need to do some image processing first to convert the 
>>>> >> images to black and white.  Let me know if you need ideas. 
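The subimage search described above can be sketched in a few lines of pure Python. This is a brute-force illustration, not a tuned implementation: it assumes both the card image and the reference digit are already black and white (0/1) and at the same scale, it scores each position by the sum of absolute differences, and the name `match_template` is chosen just for this example.

```python
def match_template(image, template):
    """Slide template over image and return (score, y, x) of the best fit.

    The score is the sum of absolute pixel differences, so 0 means an
    exact match. Ties are resolved in favour of the first position found.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            score = sum(
                abs(image[y + j][x + i] - template[j][i])
                for j in range(th)
                for i in range(tw)
            )
            if best is None or score < best[0]:
                best = (score, y, x)
    return best

# Toy "card" containing one copy of a tiny 2x2 reference shape.
card = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
reference = [
    [1, 0],
    [1, 1],
]
result = match_template(card, reference)
```

For real digits one would run this once per reference image 0-9 and keep the digit with the lowest score at each character position.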
>>>> >> 
>>>> >> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote: 
>>>> >>> 
>>>> >>> Embossed cards are designed to be printed.  Have you considered taking an 
>>>> >>> impression and scanning the impression?  Or just scanning (magnetically) the 
>>>> >>> magnetic strip on the card? 
>>>> >>> 
>>>> >>> There have been other discussions of training Tesseract for OCR-A (and I 
>>>> >>> think OCR-B).  Farrington 7B is another in that set of OCRable fonts, so the 
>>>> >>> process should be the same. 
>>>> >>> 
>>>> >>> Tom 
>>>> >>> 
>>>> >>> 
>>>> >>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote: 
>>>> >>>> 
>>>> >>>> Thank you for your reply! 
>>>> >>>> Since the embossed characters on bank cards are designed to be 
>>>> >>>> OCR-able (according to the ISO 7811 spec), why are there no implementation 
>>>> >>>> examples available on the internet? There is no similar problem in 
>>>> >>>> the tesseract forum either. I have searched a lot, but found nothing. 
>>>> >>>> Shouldn't this problem be an easy one? 
>>>> >>>> 
>>>> >>>> On Tuesday, November 20, 2012 9:45:13 PM UTC+8, TP wrote: 
>>>> >>>>> 
>>>> >>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote: 
>>>> >>>>>> 
>>>> >>>>>> Dear All, 
>>>> >>>>>>     I need to OCR the embossed characters on bank cards. 
>>>> >>>>>> These characters come in two kinds of font. The first is Farrington 7B, 
>>>> >>>>>> which is used for the account number; the other font is 
>>>> >>>>>> unknown (maybe bank-dependent) and is used for the card holder's name, 
>>>> >>>>>> the card issue date and the card serial number. 
>>>> >>>>>>     The problem is that the embossed characters are very difficult to 
>>>> >>>>>> OCR, since they become very bright under special lighting. If no extra 
>>>> >>>>>> light is applied, the card background strongly affects these 
>>>> >>>>>> characters and causes errors. 
>>>> >>>>>>     I have uploaded two images. The first sample image shows that 
>>>> >>>>>> improper lighting makes the characters a mix of dark and light, and the 
>>>> >>>>>> OCR result is very bad. The second image shows that better lighting makes 
>>>> >>>>>> the background dark and the embossed characters very sharp; the OCR 
>>>> >>>>>> result is a little better, but still not good enough. 
>>>> >>>>>>     Can anybody give me some advice on the lighting, or on image 
>>>> >>>>>> pre-processing techniques to improve the OCR result? Thank you all! 
>>>> >>>>> 
>>>> >>>>> 
>>>> >>>>> Crazy (and expensive) idea: 
>>>> >>>>> 
>>>> >>>>> How about taking two or maybe four pictures of each card with the light 
>>>> >>>>> coming low from the side on the left and right (and maybe also from 
>>>> >>>>> top/bottom), then doing some sort of image processing combination? Hopefully 
>>>> >>>>> if the light is low enough the background will fade out and only the various 
>>>> >>>>> edges of the raised characters will be visible.  Of course this would 
>>>> >>>>> require some special hardware and the ability to turn a different light on 
>>>> >>>>> for each scan. 
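The "image processing combination" is left open above. One simple candidate, shown here purely as an assumption, is a per-pixel maximum over the shots: each low-angle light brightens a different side of the raised characters, and the maximum keeps every highlight while the dim background stays dim. The sketch uses plain lists of grayscale values (0-255) and assumes the shots are perfectly aligned; `combine_max` is a name invented for this example.

```python
def combine_max(shots):
    """Merge aligned grayscale images by taking the brightest value per pixel."""
    h, w = len(shots[0]), len(shots[0][0])
    return [
        [max(shot[y][x] for shot in shots) for x in range(w)]
        for y in range(h)
    ]

# One image row lit from the left and the same row lit from the right:
# each shot highlights a different edge of an embossed stroke.
lit_from_left = [[10, 200, 10, 10]]
lit_from_right = [[10, 10, 10, 190]]
combined = combine_max([lit_from_left, lit_from_right])
```

With four light directions the same call works unchanged on a list of four images.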
>>>> >>> 
>>>> >>> -- 
>>>> >>> You received this message because you are subscribed to the Google 
>>>> >>> Groups "tesseract-ocr" group. 
>>>> >>> To post to this group, send email to [email protected] 
>>>> >>> 
>>>> >>> To unsubscribe from this group, send email to 
>>>> >>> tesseract-oc...@googlegroups.com 
>>>> >>> 
>>>> >>> For more options, visit this group at 
>>>> >>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>> >>>  
>>>> >> 
>>>> >> 
>>>>
>>>
>>
>>
