Hi gadv,

    I used the SWT algorithm for the pre-processing. You can find the 
paper and its supplementary material by searching Google. The remaining 
problem is the hollow-style characters and broken strokes, which are very 
difficult to cope with. Any good ideas?
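
One possible direction, shown here only as a minimal sketch: close small
gaps in the outlines with a morphological closing, then fill every region
that the background cannot reach from the image border. The sketch assumes
the pre-processed image is already binary (1 = stroke, 0 = background) and
has at least a one-pixel background margin. The function names are
hypothetical and are not part of SWT or tesseract. A known limitation: it
fills every enclosed region, so genuine holes in digits such as 0, 6, 8 and
9 would be filled as well and would need separate handling.

```python
from collections import deque


def dilate(img):
    """3x3 binary dilation: a pixel becomes 1 if it or any 8-neighbour is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        out[y][x] = 1
    return out


def erode(img):
    """3x3 binary erosion. Border pixels are always cleared, which is why
    the input is assumed to have a background margin."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def close_gaps(img):
    """Morphological closing (dilate, then erode) to bridge small stroke breaks."""
    return erode(dilate(img))


def fill_enclosed(img):
    """Flood-fill the background from the image border (4-connected) and
    turn every pixel the flood cannot reach into foreground."""
    h, w = len(img), len(img[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and not img[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not img[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]
```

Calling fill_enclosed(close_gaps(img)) applies both steps. A 3x3 closing
only bridges gaps of a pixel or two; wider breaks would need a larger
structuring element, at the risk of merging neighbouring characters.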

On Saturday, December 8, 2012 at 10:17:46 AM UTC+8, gadv wrote:
>
> Hello Neo, 
>
> how did you turn the original images to those results? What kind of 
> image processing? 
>
> Thanks 
>
> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote: 
> > Dear SteveP and others, 
> > 
> >     I have managed to do some image preprocessing, and I now have the 
> > attached images to send to tesseract. However, they are all hollow-style 
> > characters, which is very bad for OCR. How can I convert them to 
> > solid-style characters? 
> > 
> > On Thursday, December 6, 2012 at 5:28:28 AM UTC+8, SteveP wrote: 
> >> 
> >> If your characters have a fixed size in terms of pixels, then you might 
> >> get better results from doing a subimage search than by using OCR.  I 
> >> mean searching for a rectangular subimage within the image of the card. 
> >> The subimages that you could use would be reference images of each 
> >> digit 0-9. 
> >> 
> >> You would probably need to do some image processing first to convert 
> >> the images to black and white.  Let me know if you need ideas. 
> >> 
> >> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote: 
> >>> 
> >>> Embossed cards are designed to be printed.  Have you considered taking 
> >>> an impression and scanning the impression?  Or just scanning 
> >>> (magnetically) the magnetic strip on the card? 
> >>> 
> >>> There have been other discussions of training Tesseract for OCR-A (and 
> >>> I think OCR-B).  Farrington 7B is another in that set of OCRable fonts, 
> >>> so the process should be the same. 
> >>> 
> >>> Tom 
> >>> 
> >>> 
> >>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote: 
> >>>> 
> >>>> Thank you for your reply! 
> >>>> Since the embossed characters on bank cards are designed to be 
> >>>> OCR-able (according to the ISO 7811 spec), why are there no 
> >>>> implementation examples available on the internet? There is no 
> >>>> similar problem discussed in the tesseract forum either. I have 
> >>>> searched a lot but found nothing. Shouldn't this be an easy problem? 
> >>>> 
> >>>> On Tuesday, November 20, 2012 at 9:45:13 PM UTC+8, TP wrote: 
> >>>>> 
> >>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote: 
> >>>>>> 
> >>>>>> Dear All, 
> >>>>>>     I need to OCR the embossed characters on a bank card. These 
> >>>>>> characters are in two fonts. The first is Farrington 7B, which is 
> >>>>>> used for the account number. The other font is unknown (maybe 
> >>>>>> bank-dependent) and is used for the cardholder name, card issue 
> >>>>>> date and card serial number. 
> >>>>>>     The problem is that the embossed characters are very difficult 
> >>>>>> to OCR because they become very bright under special lighting. 
> >>>>>> Without the extra light, however, the card background strongly 
> >>>>>> affects the characters and causes errors. 
> >>>>>>     I have uploaded two images. The first sample image shows that 
> >>>>>> improper lighting makes the characters a mix of dark and light, and 
> >>>>>> the OCR result is very bad. The second image shows that better 
> >>>>>> lighting makes the background dark and the embossed characters very 
> >>>>>> sharp. The OCR result is a little better, but still not good enough. 
> >>>>>>     Can anybody give me some advice on the lighting, or on image 
> >>>>>> pre-processing techniques to improve the OCR result? Thank you all! 
> >>>>> 
> >>>>> 
> >>>>> Crazy (and expensive) idea: 
> >>>>> 
> >>>>> How about taking two or maybe four pictures of each card with the 
> >>>>> light coming low from the side on the left and right (and maybe also 
> >>>>> from top/bottom), then doing some sort of image processing 
> >>>>> combination? Hopefully if the light is low enough the background will 
> >>>>> fade out and only the various edges of the raised characters will be 
> >>>>> visible.  Of course this would require some special hardware and the 
> >>>>> ability to turn a different light on for each scan. 
> >>> 
> >>> -- 
> >>> You received this message because you are subscribed to the Google 
> >>> Groups "tesseract-ocr" group. 
> >>> To post to this group, send email to [email protected] 
> >>> 
> >>> To unsubscribe from this group, send email to 
> >>> [email protected] 
> >>> 
> >>> For more options, visit this group at 
> >>> http://groups.google.com/group/tesseract-ocr?hl=en 
> >> 
> >> 
>

