Hi gadv,
I used the SWT algorithm for the pre-processing. You can find the paper and
its supplementary material by searching Google. The problem now is the
hollow-style characters and broken strokes, which are very difficult to cope
with. Any good ideas?
On Saturday, December 8, 2012 at 10:17:46 AM UTC+8, gadv wrote:
>
> Hello Neo,
>
> how did you turn the original images to those results? What kind of
> image processing?
>
> Thanks
>
> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote:
> > Dear SteveP and others,
> >
> > I have managed to do some image preprocessing, and the attached images are
> > what I now send to Tesseract. However, the characters are all hollow-style,
> > which is very bad for OCR. How can I convert them to solid-style
> > characters?
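
One common way to turn a closed outline into a solid shape is hole filling: flood-fill the background from the image border and treat every pixel the fill cannot reach as part of the character. The sketch below is only an illustration under stated assumptions, not a method anyone in this thread confirmed: the image is already binarized (1 = ink, 0 = background), and `fill_holes` is a hypothetical helper name. It also shows why broken strokes are a problem, since a single gap in the outline lets the fill leak inside. Note that it would also fill the counters of digits such as 0 or 8, so it is not a complete answer.

```python
from collections import deque

def fill_holes(image):
    """Fill background regions that are fully enclosed by ink.

    image is a list of rows of 0/1 pixels (1 = ink, 0 = background).
    Returns a new image; the input is not modified.
    """
    h, w = len(image), len(image[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()

    # Seed the fill with every background pixel on the image border.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and image[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))

    # 4-connected flood fill over background pixels.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w:
                if image[nr][nc] == 0 and not outside[nr][nc]:
                    outside[nr][nc] = True
                    queue.append((nr, nc))

    # Everything the fill did not reach is ink or an enclosed hole.
    return [[0 if outside[r][c] else 1 for c in range(w)] for r in range(h)]
```

With a closed ring the interior is filled; with one pixel of the ring missing, the result is identical to the input, which is the broken-stroke failure case.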
> >
> > On Thursday, December 6, 2012 at 5:28:28 AM UTC+8, SteveP wrote:
> >>
> >> If your characters have a fixed size in terms of pixels, then you might
> >> get better results from doing a subimage search than by using OCR. I mean
> >> searching for a rectangular subimage within the image of the card. The
> >> subimages that you could use would be reference images of each digit 0-9.
> >>
> >> You would probably need to do some image processing first to convert the
> >> images to black and white. Let me know if you need ideas.
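
The subimage search described above can be sketched as a brute-force comparison of a reference digit against every position in the card image. This is a minimal illustration, assuming both images are already binarized to 0/1 pixels and stored as lists of rows; `find_subimage` and `max_mismatch` are hypothetical names, and a real implementation would more likely use an image library's template matching.

```python
def find_subimage(image, template, max_mismatch=0):
    """Return (row, col) top-left positions where template matches image.

    image and template are lists of rows of 0/1 pixels.
    max_mismatch is the number of differing pixels tolerated per position.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Count pixels that differ between the window and the template.
            mismatches = 0
            for dr in range(th):
                for dc in range(tw):
                    if image[r + dr][c + dc] != template[dr][dc]:
                        mismatches += 1
            if mismatches <= max_mismatch:
                hits.append((r, c))
    return hits
```

Running this once per reference digit and sorting the hits by column would give the digit sequence. Raising `max_mismatch` trades robustness to noise against false matches.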
> >>
> >> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote:
> >>>
> >>> Embossed cards are designed to be printed. Have you considered taking an
> >>> impression and scanning the impression? Or just scanning (magnetically)
> >>> the magnetic strip on the card?
> >>>
> >>> There have been other discussions of training Tesseract for OCR-A (and I
> >>> think OCR-B). Farrington 7B is another in that set of OCRable fonts, so
> >>> the process should be the same.
> >>>
> >>> Tom
> >>>
> >>>
> >>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote:
> >>>>
> >>>> Thank you for your reply!
> >>>> Since the embossed characters on bank cards are designed to be OCR-able
> >>>> (according to the ISO 7811 spec), why are there no implementation
> >>>> examples available on the internet? There is no similar problem in the
> >>>> Tesseract forum either. I have searched a lot, but found nothing.
> >>>> This should be an easy problem, shouldn't it?
> >>>>
> >>>> On Tuesday, November 20, 2012 at 9:45:13 PM UTC+8, TP wrote:
> >>>>>
> >>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote:
> >>>>>>
> >>>>>> Dear All,
> >>>>>> I need to OCR the embossed characters on bank cards. These
> >>>>>> characters come in two fonts. The first is Farrington 7B, which is
> >>>>>> used for the account number; the other is unknown (maybe
> >>>>>> bank-dependent) and is used for the cardholder name, card issue
> >>>>>> date, and card serial number.
> >>>>>> The problem is that the embossed characters are very difficult to
> >>>>>> OCR, since they become very bright under special lighting, while
> >>>>>> without the extra light the card background strongly affects the
> >>>>>> characters and causes errors.
> >>>>>> I have uploaded two images. The first sample image shows that
> >>>>>> improper lighting makes the characters a mix of dark and light, and
> >>>>>> the OCR result is very bad. The second image shows that better
> >>>>>> lighting makes the background dark and the embossed characters very
> >>>>>> sharp; the OCR result is a little better, but still not good enough.
> >>>>>> Can anybody give me some advice on the lighting, or on an image
> >>>>>> pre-processing technique to improve the OCR result? Thank you all!
> >>>>>
> >>>>>
> >>>>> Crazy (and expensive) idea:
> >>>>>
> >>>>> How about taking two or maybe four pictures of each card with the light
> >>>>> coming low from the side on the left and right (and maybe also from
> >>>>> top/bottom), then doing some sort of image processing combination?
> >>>>> Hopefully if the light is low enough the background will fade out and
> >>>>> only the various edges of the raised characters will be visible. Of
> >>>>> course this would require some special hardware and the ability to turn
> >>>>> a different light on for each scan.
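
The message above leaves the "image processing combination" unspecified. One simple possibility, offered here only as an assumption, is a per-pixel maximum over the side-lit shots, so that the edge highlighted by each light direction survives in the merged image, followed by a fixed threshold. The sketch assumes the shots are grayscale, equally sized, and already aligned; `combine_max`, `threshold`, and the cutoff value are hypothetical.

```python
def combine_max(images):
    """Per-pixel maximum over equally sized grayscale images (lists of rows)."""
    height, width = len(images[0]), len(images[0][0])
    return [
        [max(img[r][c] for img in images) for c in range(width)]
        for r in range(height)
    ]

def threshold(image, cutoff):
    """Binarize: 1 where the pixel is at least cutoff, else 0."""
    return [[1 if px >= cutoff else 0 for px in row] for row in image]

# Toy example: each shot lights up a different edge of a raised stroke.
left_lit = [[10, 200, 10], [10, 200, 10]]
right_lit = [[10, 10, 190], [10, 10, 190]]
merged = combine_max([left_lit, right_lit])
edges = threshold(merged, 128)
```

If the card moves between shots, the images would need to be registered before combining, which this sketch does not attempt.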
> >>>
> >>> --
> >>> You received this message because you are subscribed to the Google
> >>> Groups "tesseract-ocr" group.
> >>> To post to this group, send email to [email protected]
> >>>
> >>> To unsubscribe from this group, send email to
> >>> [email protected]
> >>>
> >>> For more options, visit this group at
> >>> http://groups.google.com/group/tesseract-ocr?hl=en
> >>
> >>
>