Hi gadv,
You can get the implementation here:
https://github.com/aperrau/DetectText
Good luck.
On Saturday, December 15, 2012 at 10:38:31 PM UTC+8, gadv wrote:
>
> Hello Neo,
>
> which SWT implementation did you use? There are several out there,
> and I haven't found one that produces your result yet.
>
> Thanks
>
> On Thu, Dec 13, 2012 at 1:24 PM, Dmitri Silaev <[email protected]> wrote:
> > Neo Song,
> >
> > There are two usual approaches to problems like yours. The first is to
> > constrain the shooting conditions so that the later image processing has
> > fewer problems to deal with. That could indeed mean requiring a slip of
> > the card (or something like what others in the forum advise), which makes
> > everything that follows much easier. The other approach is to accept
> > every possible image and struggle with all kinds of complications.
> >
> > If you choose the latter, I can suggest the following. Your binarized
> > images need some improvement: the binarization procedure has to preserve
> > more contour pixels. Then you can use e.g. morphology (maybe a distance
> > transform) to obtain an image of one-pixel-wide contours, something like
> > an edge map. That way you can replace the first stage of the SWT, edge
> > detection (implemented via Canny or whatever), with your own. The rest of
> > the SWT can proceed as usual. To generate a better edge image you can try
> > various binarization techniques, keeping in mind that you are actually
> > binarizing not the characters themselves but the glowing halos around
> > them. Then you can apply some kind of post-processing (maybe 3x3
> > pattern-based linking) to make your contours more connected. But be
> > prepared that, as with any other edge detection procedure, you will not
> > obtain perfect character contours.
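
A minimal sketch of the morphology step suggested above, assuming the binarized image is available as a 2-D NumPy array in which nonzero pixels are foreground. It keeps only the foreground pixels that touch the background, which gives a thin contour image that could stand in for the Canny edge map. The function name `inner_boundary` and the choice of a 4-neighbour erosion are illustrative assumptions, not part of any particular SWT implementation.

```python
import numpy as np


def inner_boundary(mask: np.ndarray) -> np.ndarray:
    """Return a boolean image of foreground pixels that touch the background.

    `mask` is a 2-D array; nonzero means foreground. Pixels outside the
    image are treated as background (an assumption of this sketch).
    """
    m = mask.astype(bool)
    # Pad with background so border pixels have four neighbours to test.
    p = np.pad(m, 1, constant_values=False)
    # Erosion with a 3x3 cross: a pixel survives only if it and its
    # up, down, left and right neighbours are all foreground.
    eroded = (
        p[1:-1, 1:-1]
        & p[:-2, 1:-1]
        & p[2:, 1:-1]
        & p[1:-1, :-2]
        & p[1:-1, 2:]
    )
    # The contour is whatever the erosion removed.
    return m & ~eroded
```

The result is a boolean array of the same shape as the input. Whether it is clean enough to feed into the remaining SWT stages depends on the quality of the binarization, as noted above.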
> >
> > HTH and good luck!
> >
> > Warm regards,
> > Dmitri Silaev
> > www.CustomOCR.com
> >
> >
> >
> > On Thu, Dec 13, 2012 at 1:35 PM, Neo Song <[email protected]> wrote:
> >>
> >> Hi gadv,
> >>
> >> I used the SWT (Stroke Width Transform) algorithm for the
> >> pre-processing. You can find the paper and its supplementary material
> >> via Google. The problem now is the hollow-style characters and broken
> >> strokes, which are very difficult to cope with. Any good ideas?
> >>
> >> On Saturday, December 8, 2012 at 10:17:46 AM UTC+8, gadv wrote:
> >>>
> >>> Hello Neo,
> >>>
> >>> how did you turn the original images into those results? What kind of
> >>> image processing did you use?
> >>>
> >>> Thanks
> >>>
> >>> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote:
> >>> > Dear SteveP and others,
> >>> >
> >>> > I have managed to do some image preprocessing, and I now have the
> >>> > attached images to send to Tesseract. However, they are all
> >>> > hollow-style characters, which is very bad for OCR. How can I convert
> >>> > them to solid-style characters?
> >>> >
> >>> > On Thursday, December 6, 2012 at 5:28:28 AM UTC+8, SteveP wrote:
> >>> >>
> >>> >> If your characters have a fixed size in pixels, then you might get
> >>> >> better results from a subimage search than from OCR. I mean
> >>> >> searching for a rectangular subimage within the image of the card.
> >>> >> The subimages you could use would be reference images of each digit
> >>> >> 0-9.
> >>> >>
> >>> >> You would probably need to do some image processing first to convert
> >>> >> the images to black and white. Let me know if you need ideas.
> >>> >>
> >>> >> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote:
> >>> >>>
> >>> >>> Embossed cards are designed to be printed. Have you considered
> >>> >>> taking an impression and scanning the impression? Or just scanning
> >>> >>> (magnetically) the magnetic strip on the card?
> >>> >>>
> >>> >>> There have been other discussions of training Tesseract for OCR-A
> >>> >>> (and I think OCR-B). Farrington 7B is another in that set of
> >>> >>> OCR-able fonts, so the process should be the same.
> >>> >>>
> >>> >>> Tom
> >>> >>>
> >>> >>>
> >>> >>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote:
> >>> >>>>
> >>> >>>> Thank you for your reply!
> >>> >>>> Since bank card embossed characters are designed to be OCR-able
> >>> >>>> (according to the ISO 7811 spec), why are there no implementation
> >>> >>>> examples available on the internet? There is no similar question
> >>> >>>> in the tesseract forum either. I have searched a lot but found
> >>> >>>> nothing. Shouldn't this be an easy problem?
> >>> >>>>
> >>> >>>> On Tuesday, November 20, 2012 at 9:45:13 PM UTC+8, TP wrote:
> >>> >>>>>
> >>> >>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]>
> >>> >>>>> wrote:
> >>> >>>>>>
> >>> >>>>>> Dear All,
> >>> >>>>>> I need to OCR the embossed characters on bank cards. These
> >>> >>>>>> characters come in two fonts. The first is Farrington 7B, which
> >>> >>>>>> is used for the account number; the other font is unknown (maybe
> >>> >>>>>> bank-dependent) and is used for the cardholder name, card issue
> >>> >>>>>> date and card serial number.
> >>> >>>>>> The problem is that the embossed characters are very difficult
> >>> >>>>>> to OCR, since they become very bright under special light, while
> >>> >>>>>> without the extra light the card background strongly interferes
> >>> >>>>>> with the characters and causes errors.
> >>> >>>>>> I have uploaded two images. The first sample image shows that
> >>> >>>>>> improper lighting makes the characters a mix of dark and light,
> >>> >>>>>> and the OCR result is very bad. The second image shows that
> >>> >>>>>> better lighting makes the background dark and the embossed
> >>> >>>>>> characters very sharp; the OCR result is a little better, but
> >>> >>>>>> still not good enough.
> >>> >>>>>> Can anybody give me some advice on the lighting, or on image
> >>> >>>>>> pre-processing techniques, to improve the OCR result? Thank you
> >>> >>>>>> all!
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> Crazy (and expensive) idea:
> >>> >>>>>
> >>> >>>>> How about taking two or maybe four pictures of each card with the
> >>> >>>>> light coming in low from the side, on the left and right (and
> >>> >>>>> maybe also from top/bottom), then doing some sort of image
> >>> >>>>> processing combination? Hopefully, if the light is low enough,
> >>> >>>>> the background will fade out and only the various edges of the
> >>> >>>>> raised characters will be visible. Of course, this would require
> >>> >>>>> some special hardware and the ability to turn a different light
> >>> >>>>> on for each scan.
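
The message above does not say how the pictures would be combined. One simple possibility, sketched here purely as an assumption, is a per-pixel maximum: each shot lights the character edges facing its lamp, so the maximum over all shots should collect the highlights from every side while a dark background stays dark. The sketch assumes the shots are grayscale NumPy arrays of equal shape that are already aligned; `combine_side_lit` is a hypothetical name.

```python
import numpy as np


def combine_side_lit(shots):
    """Merge aligned grayscale shots by taking the per-pixel maximum.

    `shots` is a sequence of 2-D arrays of identical shape, one per
    lighting direction. The per-pixel maximum is one possible combination
    rule among many; it is an assumption of this sketch.
    """
    stack = np.stack([np.asarray(s) for s in shots], axis=0)
    return stack.max(axis=0)
```

If the card moves between shots, the images would need to be registered first, otherwise the combined edges will be doubled.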
> >>> >>>
> >>> >>> --
> >>> >>> You received this message because you are subscribed to the Google
> >>> >>> Groups "tesseract-ocr" group.
> >>> >>> To post to this group, send email to [email protected]
> >>> >>>
> >>> >>> To unsubscribe from this group, send email to
> >>> >>> [email protected]
> >>> >>>
> >>> >>> For more options, visit this group at
> >>> >>> http://groups.google.com/group/tesseract-ocr?hl=en
> >>> >>
> >>> >>
> >>
> >
> >