Hello Neo,

which SWT implementation did you use? There are several out there,
and I haven't yet found one that produces your results.

Thanks

On Thu, Dec 13, 2012 at 1:24 PM, Dmitri Silaev <[email protected]> wrote:
> Neo Song,
>
> There are two usual approaches to problems like yours. The first is to
> constrain the shooting conditions so that the later image processing has
> fewer problems to deal with. That could indeed mean requiring a slip of the
> card (or something else that others on the forum have advised), which makes
> everything that follows much easier. The other approach is to accept every
> possible image and struggle with all kinds of complications.
>
> If you choose the latter, I can suggest the following. Your binarized images
> need some improvement: the binarization procedure has to preserve more
> contour pixels. Then you can use e.g. morphology (a distance transform,
> maybe) to obtain an image of one-pixel-wide contours, something like an edge
> map. That way you can replace the first stage of the SWT, edge detection
> (implemented via Canny or whatever), with your own. The rest of the SWT can
> proceed as usual. To generate a better edge image you can try various
> binarization techniques, keeping in mind that you are actually binarizing
> not the characters themselves but the glowing halos around them. Then you
> can apply some kind of post-processing (maybe 3x3 pattern-based linking) to
> make your contours more connected. But be prepared: no perfect character
> contours can be obtained this way, just as with any other edge detection
> procedure.
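
The edge-map step described above can be sketched roughly as follows. This is only an illustration in plain Python, not Dmitri's implementation. It assumes the binarized image is a list of rows of 0/1 values, and it swaps in the simplest morphological option: the inner boundary, i.e. the foreground pixels with at least one background 4-neighbour (equivalently, the pixels whose city-block distance to the background is 1). The name `inner_boundary` and the toy `blob` are made up for the example.

```python
def inner_boundary(mask):
    """Return the one-pixel-wide inner contour of a binary mask.

    mask is a list of rows of 0/1 values (1 = foreground). A foreground
    pixel is kept when at least one of its 4-neighbours is background;
    pixels outside the image count as background.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0

    def is_foreground(y, x):
        return 0 <= y < height and 0 <= x < width and mask[y][x] == 1

    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # background pixels are never part of the contour
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if not all(is_foreground(ny, nx) for ny, nx in neighbours):
                edges[y][x] = 1
    return edges


# Hypothetical input: a solid 5x5 blob inside a 7x7 image, standing in for
# one binarized halo region.
blob = [[1 if 1 <= y <= 5 and 1 <= x <= 5 else 0 for x in range(7)]
        for y in range(7)]
edges = inner_boundary(blob)
for row in edges:
    print("".join("#" if v else "." for v in row))
```

Note that the SWT also needs a gradient direction at each edge pixel, which this sketch does not compute, and the contour-linking post-processing is not covered either.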
>
> HTH and good luck!
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
>
>
> On Thu, Dec 13, 2012 at 1:35 PM, Neo Song <[email protected]> wrote:
>>
>> Hi gadv,
>>
>>     I used the SWT (Stroke Width Transform) algorithm for the
>> pre-processing. You can find the paper and its supplementary material via
>> Google. The problem now is the hollow-style characters and broken strokes,
>> which are very difficult to cope with. Any good ideas?
>>
>> On Saturday, December 8, 2012 at 10:17:46 AM UTC+8, gadv wrote:
>>>
>>> Hello Neo,
>>>
>>> how did you turn the original images into those results? What kind of
>>> image processing did you use?
>>>
>>> Thanks
>>>
>>> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote:
>>> > Dear SteveP and others,
>>> >
>>> >     I have managed to do some image preprocessing, and I now have the
>>> > attached images to send to tesseract. However, they are all hollow-style
>>> > characters, which is very bad for OCR. How can I convert them to
>>> > solid-style characters?
>>> >
>>> > On Thursday, December 6, 2012 at 5:28:28 AM UTC+8, SteveP wrote:
>>> >>
>>> >> If your characters have a fixed size in terms of pixels, then you might
>>> >> get better results from doing a subimage search than by using OCR. I mean
>>> >> searching for a rectangular subimage within the image of the card. The
>>> >> subimages that you could use would be reference images of each digit 0-9.
>>> >>
>>> >> You would probably need to do some image processing first to convert the
>>> >> images to black and white. Let me know if you need ideas.
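
A subimage search of the kind SteveP describes could look roughly like the sketch below. It is an assumption-laden illustration, not his method: images are lists of rows of 0/1 values, the match score is a plain sum of absolute differences, and the names `match_cost`, `find_subimage`, `seven` and `card` are invented for the example.

```python
def match_cost(image, template, top, left):
    """Sum of absolute differences between the template and one window."""
    return sum(
        abs(image[top + dy][left + dx] - value)
        for dy, row in enumerate(template)
        for dx, value in enumerate(row)
    )


def find_subimage(image, template):
    """Return (cost, top, left) of the best-matching window, or None.

    Every position where the template fits entirely inside the image is
    tried; the first window with the lowest cost wins.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    for top in range(img_h - tpl_h + 1):
        for left in range(img_w - tpl_w + 1):
            cost = match_cost(image, template, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best


# Hypothetical 3x3 reference image of a "7" and a 5x8 card image that
# contains it at row 1, column 4.
seven = [
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
]
card = [[0] * 8 for _ in range(5)]
for dy, row in enumerate(seven):
    for dx, value in enumerate(row):
        card[1 + dy][4 + dx] = value

result = find_subimage(card, seven)
print(result)
```

Running the search once per digit reference image and keeping the lowest cost at each position would give a crude classifier; how low a cost has to be to count as a match is something that would need tuning on real card images.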
>>> >>
>>> >> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote:
>>> >>>
>>> >>> Embossed cards are designed to be printed. Have you considered taking an
>>> >>> impression and scanning the impression? Or just scanning (magnetically)
>>> >>> the magnetic strip on the card?
>>> >>>
>>> >>> There have been other discussions of training Tesseract for OCR-A (and I
>>> >>> think OCR-B). Farrington 7B is another in that set of OCRable fonts, so
>>> >>> the process should be the same.
>>> >>>
>>> >>> Tom
>>> >>>
>>> >>>
>>> >>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote:
>>> >>>>
>>> >>>> Thank you for your reply!
>>> >>>> Since the embossed characters on bank cards are designed to be OCR-able
>>> >>>> (according to the ISO 7811 spec), why are there no implementation
>>> >>>> examples available on the internet? There is no similar problem on the
>>> >>>> tesseract forum either. I have searched a lot, but I found nothing.
>>> >>>> Shouldn't this be an easy problem?
>>> >>>>
>>> >>>> On Tuesday, November 20, 2012 at 9:45:13 PM UTC+8, TP wrote:
>>> >>>>>
>>> >>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote:
>>> >>>>>>
>>> >>>>>> Dear All,
>>> >>>>>>     I need to OCR the embossed characters on bank cards. These
>>> >>>>>> characters come in two kinds of font. The first is Farrington 7B, which
>>> >>>>>> is used for the account number; the other font is unknown (maybe
>>> >>>>>> bank-dependent) and is used for the card holder name, card issue time
>>> >>>>>> and card serial number.
>>> >>>>>>     The problem is that the embossed characters are very difficult to
>>> >>>>>> OCR, since they become very bright under special light. If the extra
>>> >>>>>> light is not applied, the card background strongly affects the
>>> >>>>>> characters and causes errors.
>>> >>>>>>     I have uploaded two images. The first sample image shows that
>>> >>>>>> improperly applied light makes the characters a mix of dark and light,
>>> >>>>>> and the OCR result is very bad. The second image shows that better
>>> >>>>>> light makes the background dark and the embossed characters very sharp;
>>> >>>>>> the OCR result is a little better, but still not good enough.
>>> >>>>>>     Can anybody give me some advice on the lighting, or on image
>>> >>>>>> pre-processing techniques to improve the OCR result? Thank you all!
>>> >>>>>
>>> >>>>>
>>> >>>>> Crazy (and expensive) idea:
>>> >>>>>
>>> >>>>> How about taking two or maybe four pictures of each card with the light
>>> >>>>> coming low from the side on the left and right (and maybe also from
>>> >>>>> top/bottom), then doing some sort of image processing combination?
>>> >>>>> Hopefully if the light is low enough the background will fade out and
>>> >>>>> only the various edges of the raised characters will be visible. Of
>>> >>>>> course this would require some special hardware and the ability to turn
>>> >>>>> a different light on for each scan.
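
One possible reading of "some sort of image processing combination" is a per-pixel maximum over the differently lit shots, so that an edge highlighted in any one picture survives in the result. The sketch below is only that guess, written in plain Python. It assumes greyscale images stored as lists of rows of integers, already aligned with each other and of equal size; `combine_lit_images` and the two toy images are hypothetical.

```python
def combine_lit_images(images):
    """Per-pixel maximum over same-sized, pre-aligned greyscale images."""
    first = images[0]
    height, width = len(first), len(first[0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same size")
    return [
        [max(img[y][x] for img in images) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2x3 shots: light from the left brightens the left edge of a
# raised stroke, light from the right brightens the right edge.
lit_from_left = [
    [200, 10, 10],
    [200, 10, 10],
]
lit_from_right = [
    [10, 10, 200],
    [10, 10, 200],
]
combined = combine_lit_images([lit_from_left, lit_from_right])
print(combined)
```

Whether a maximum, a sum, or a difference of the shots works best would depend on the actual lighting rig, and real photos would need to be registered to each other first.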
>>> >>>
>>> >>> --
>>> >>> You received this message because you are subscribed to the Google
>>> >>> Groups "tesseract-ocr" group.
>>> >>> To post to this group, send email to [email protected]
>>> >>>
>>> >>> To unsubscribe from this group, send email to
>>> >>> [email protected]
>>> >>>
>>> >>> For more options, visit this group at
>>> >>> http://groups.google.com/group/tesseract-ocr?hl=en
>>> >>
>>> >>
>>
>
>
