Hi Neo,

Sorry for the delayed answer. If you are serious about this problem and
ready to develop your own software, you may want to implement a custom
binarization procedure. I would suggest taking a look at MSERs
(http://en.wikipedia.org/wiki/Maximally_stable_extremal_regions). Simple
edge detection combined with a few rules can be used to narrow down the ROI.
You might also need to do some connected component analysis afterwards. I
can't see any use for SWT in this kind of pipeline. I suppose the type of
light source is not so important with this approach; you just need to
avoid quirky configurations (shadows, extreme flares, etc.).
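
To illustrate the connected component analysis step mentioned above, here is a
minimal pure-Python sketch. It assumes the binarized image is a list of rows of
0/1 values; the function names and the size thresholds are hypothetical and
only for illustration. In a real pipeline a library such as OpenCV would
normally be used instead.

```python
from collections import deque


def connected_components(mask):
    """Find 8-connected foreground regions in a binary mask (rows of 0/1).

    Returns one dict per region with its pixel count ("area") and its
    bounding box as (x0, y0, x1, y1), inclusive.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            area = 0
            y0 = y1 = y
            x0 = x1 = x
            while queue:
                cy, cx = queue.popleft()
                area += 1
                y0, y1 = min(y0, cy), max(y1, cy)
                x0, x1 = min(x0, cx), max(x1, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append({"area": area, "bbox": (x0, y0, x1, y1)})
    return comps


def filter_by_size(comps, min_h, max_h, min_area):
    """Keep regions whose height and area could plausibly be a character.

    The thresholds are assumptions and must be tuned per camera setup.
    """
    kept = []
    for c in comps:
        _, y0, _, y1 = c["bbox"]
        height = y1 - y0 + 1
        if min_h <= height <= max_h and c["area"] >= min_area:
            kept.append(c)
    return kept
```

Filtering the regions by size is one simple example of the "rules" that can
discard noise blobs before anything is passed on to OCR.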

Warm regards,
Dmitri Silaev
www.CustomOCR.com


On Mon, Dec 17, 2012 at 8:02 AM, Neo Song <[email protected]> wrote:

> Dear Dmitri,
>
>     There is one thing that confuses me a lot. With a coaxial light
> source, I get solid-stroke characters. With a ring light source, I get
> hollow-style stroke characters. Tesseract recognizes solid characters
> much better than hollow ones. However, we currently get very bad results
> when we use coaxial light on a card with a light background or a
> UV-printed card, since both the characters and the background are very
> bright. The ring light source yields much better results in such
> situations.
>     Note that with both solid and hollow strokes we still get broken
> strokes, so simply filling the hollow strokes to form solid strokes is
> very difficult to achieve.
>     Can you give me some advice on this situation?
>
> On Thursday, December 13, 2012 at 7:24:59 PM UTC+8, Dmitri Silaev wrote:
>>
>> Neo Song,
>>
>> There are two usual approaches to problems like yours. The first one is
>> to constrain the shooting conditions to mitigate later problems with the
>> image processing. This could indeed mean requiring the slip of a card (or
>> something else that others in the forum advise), which makes everything
>> that follows much easier. The other approach is to accept every possible
>> image and struggle with all kinds of complications.
>>
>> If you choose the latter, I can suggest the following. Your binarized
>> images need some improvement. The binarization procedure has to preserve
>> more contour pixels. Then you can use e.g. morphology (maybe a distance
>> transform) to obtain an image of one-pixel-wide contours, something like
>> an edge map. That way you'll be able to replace the first stage of the
>> SWT, edge detection (implemented via Canny or whatever), with your own.
>> The rest of the SWT can proceed as usual. To generate a better edge image
>> you can use various binarization techniques, keeping in mind that you are
>> actually binarizing not the characters themselves but the glowing halos
>> around them. Then you can employ some kind of post-processing (maybe
>> 3x3 pattern-based linking) to make your contours more connected. But be
>> prepared that no perfect character contours can be obtained, as with any
>> other edge detection procedure.
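
One possible reading of the "3x3 pattern based linking" idea above is to fill
a background pixel whenever two opposite neighbours in its 3x3 window are both
contour pixels. The sketch below is only that interpretation, not necessarily
the exact procedure meant in the message. The edge map is assumed to be a list
of rows of 0/1 values, and the names `OPPOSITE_PAIRS` and `bridge_gaps` are
made up for illustration.

```python
# Pairs of opposite neighbour offsets (dy, dx) inside a 3x3 window.
OPPOSITE_PAIRS = [
    ((0, -1), (0, 1)),    # left and right
    ((-1, 0), (1, 0)),    # above and below
    ((-1, -1), (1, 1)),   # main diagonal
    ((-1, 1), (1, -1)),   # anti-diagonal
]


def bridge_gaps(edges):
    """Return a copy of a binary edge map with one-pixel gaps filled.

    A background pixel is set to 1 when two opposite neighbours in its
    3x3 window are both edge pixels. Border pixels are left untouched.
    Decisions are based on the input map only, so fills do not cascade.
    """
    h = len(edges)
    w = len(edges[0]) if h else 0
    out = [row[:] for row in edges]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if edges[y][x]:
                continue
            for (ay, ax), (by, bx) in OPPOSITE_PAIRS:
                if edges[y + ay][x + ax] and edges[y + by][x + bx]:
                    out[y][x] = 1
                    break
    return out
```

This closes gaps of exactly one pixel. Wider breaks would need repeated passes
or a larger window, at the risk of joining contours that belong to different
characters.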
>>
>> HTH and good luck!
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>>
>> On Thu, Dec 13, 2012 at 1:35 PM, Neo Song <[email protected]> wrote:
>>
>>> Hi gadv,
>>>
>>>     I have used the SWT algorithm for the pre-processing. You can find
>>> the paper on this algorithm and its supplementary material via Google.
>>> Now the problem is the hollow-style characters and broken strokes, which
>>> are very difficult to cope with. Any good ideas?
>>>
>>> On Saturday, December 8, 2012 at 10:17:46 AM UTC+8, gadv wrote:
>>>>
>>>> Hello Neo,
>>>>
>>>> how did you turn the original images into those results? What kind of
>>>> image processing did you use?
>>>>
>>>> Thanks
>>>>
>>>> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote:
>>>> > Dear SteveP and others,
>>>> >
>>>> >     I have managed to do some image preprocessing and now have the
>>>> > attached images to send to tesseract. However, they are all
>>>> > hollow-style characters, which is very bad for OCR. How can I convert
>>>> > them to solid-style characters?
>>>> >
>>>> > On Thursday, December 6, 2012 at 5:28:28 AM UTC+8, SteveP wrote:
>>>> >>
>>>> >> If your characters have a fixed size in terms of pixels, then you
>>>> >> might get better results from doing a subimage search than by using
>>>> >> OCR. I mean searching for a rectangular subimage within the image of
>>>> >> the card. The subimages that you could use would be reference images
>>>> >> of each digit 0-9.
>>>> >>
>>>> >> You would probably need to do some image processing first to convert
>>>> >> the images to black and white. Let me know if you need ideas.
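
The subimage search described above can be sketched as an exhaustive sliding
comparison. This is a minimal illustration that assumes both the card image
and the digit reference images are already black and white (lists of rows of
0/1 values) and that the scale is fixed. The function names and the
mismatch-count score are assumptions, not something from the thread.

```python
def mismatch(image, template, top, left):
    """Count pixels where the template differs from the image window
    whose upper-left corner is at (top, left)."""
    return sum(
        image[top + r][left + c] != template[r][c]
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def find_best_match(image, template):
    """Slide the template over every valid position.

    Returns (mismatches, top, left) for the best fit, or None if the
    template is larger than the image.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = mismatch(image, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


def best_label(image, templates):
    """Return the label whose reference image fits the image best.

    `templates` maps a label (e.g. "0" to "9") to its reference image.
    """
    scored = []
    for label, template in templates.items():
        match = find_best_match(image, template)
        if match is not None:
            scored.append((match[0], label))
    return min(scored)[1] if scored else None
```

For example, `find_best_match(card, digit_7)` (both hypothetical inputs) gives
the position where the reference image of "7" fits the card image with the
fewest differing pixels. A threshold on the mismatch count would still be
needed to reject positions where no digit is present.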
>>>> >>
>>>> >> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote:
>>>> >>>
>>>> >>> Embossed cards are designed to be printed. Have you considered
>>>> >>> taking an impression and scanning the impression? Or just scanning
>>>> >>> (magnetically) the magnetic strip on the card?
>>>> >>>
>>>> >>> There have been other discussions of training Tesseract for OCR-A
>>>> >>> (and I think OCR-B). Farrington 7B is another in that set of OCRable
>>>> >>> fonts, so the process should be the same.
>>>> >>>
>>>> >>> Tom
>>>> >>>
>>>> >>>
>>>> >>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote:
>>>> >>>>
>>>> >>>> Thank you for your reply!
>>>> >>>> Since the embossed characters on bank cards are designed to be
>>>> >>>> OCR-able (according to the ISO 7811 spec), why are there no
>>>> >>>> implementation examples available on the internet? There is no
>>>> >>>> similar problem in the tesseract forum either. I have searched a
>>>> >>>> lot, but found nothing. Shouldn't this problem be an easy one?
>>>> >>>>
>>>> >>>> On Tuesday, November 20, 2012 at 9:45:13 PM UTC+8, TP wrote:
>>>> >>>>>
>>>> >>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote:
>>>> >>>>>>
>>>> >>>>>> Dear All,
>>>> >>>>>>     I need to OCR the embossed characters on bank cards. These
>>>> >>>>>> characters come in two kinds of font. The first one is Farrington
>>>> >>>>>> 7B, which is used for the account number. The other font is
>>>> >>>>>> unknown (maybe bank-dependent) and is used for the card holder
>>>> >>>>>> name, card issue time and card serial number.
>>>> >>>>>>     The problem is that the embossed characters are very difficult
>>>> >>>>>> to OCR, since they become very bright under special light. If no
>>>> >>>>>> extra light is applied, the card background largely affects these
>>>> >>>>>> characters and causes errors.
>>>> >>>>>>     I have uploaded two images. The first sample image shows that
>>>> >>>>>> improper lighting makes the characters a mix of dark and light,
>>>> >>>>>> and the OCR result is very bad. The second image shows that better
>>>> >>>>>> lighting makes the background dark and the embossed characters
>>>> >>>>>> very sharp; the OCR result is a little better, but still not good
>>>> >>>>>> enough.
>>>> >>>>>>     Can anybody give me some advice on the lighting, or on image
>>>> >>>>>> pre-processing techniques to improve the OCR result? Thank you
>>>> >>>>>> all!
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> Crazy (and expensive) idea:
>>>> >>>>>
>>>> >>>>> How about taking two or maybe four pictures of each card with the
>>>> >>>>> light coming low from the side on the left and right (and maybe
>>>> >>>>> also from top/bottom), then doing some sort of image processing
>>>> >>>>> combination? Hopefully if the light is low enough the background
>>>> >>>>> will fade out and only the various edges of the raised characters
>>>> >>>>> will be visible. Of course this would require some special hardware
>>>> >>>>> and the ability to turn a different light on for each scan.
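
One simple candidate for the "image processing combination" suggested above
is a per-pixel maximum over the differently lit shots, so that an edge lit in
any one shot survives in the result. This is only a sketch of one possible
combination, not something proposed in the thread. It assumes the shots are
grayscale lists of rows, all the same size, and taken without moving the card.
The name `combine_max` is made up for illustration.

```python
def combine_max(images):
    """Combine equally sized grayscale images by per-pixel maximum.

    Each image is a list of rows of intensity values. The images are
    assumed to be pixel-aligned (card and camera did not move).
    """
    if not images:
        raise ValueError("need at least one image")
    h = len(images[0])
    w = len(images[0][0]) if h else 0
    for img in images:
        if len(img) != h or any(len(row) != w for row in img):
            raise ValueError("all images must have the same size")
    return [
        [max(img[y][x] for img in images) for x in range(w)]
        for y in range(h)
    ]
```

If the background really does stay dark in every shot, the combined image
keeps it dark while collecting the bright edges from each lighting direction.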
>>>> >>>
>>>> >>> --
>>>> >>> You received this message because you are subscribed to the Google
>>>> >>> Groups "tesseract-ocr" group.
>>>> >>> To post to this group, send email to [email protected]
>>>> >>>
>>>> >>> To unsubscribe from this group, send email to
>>>> >>> tesseract-oc...@googlegroups.com
>>>> >>>
>>>> >>> For more options, visit this group at
>>>> >>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>> >>
>>>> >>
>>>>
>>>
>>
>
