Hello Neo, how did you turn the original images into those results? What kind of image processing did you use?
Thanks

On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote:
> Dear SteveP and others,
>
> I have managed to do some image preprocessing and now I have got the
> attached images to send to tesseract. However they are all hollow-style
> characters, which is very bad for OCR. How can I convert them to
> solid-style characters?
>
> On Thursday, December 6, 2012 at 5:28:28 AM UTC+8, SteveP wrote:
>>
>> If your characters have a fixed size in terms of pixels, then you might
>> get better results from doing a subimage search than by using OCR. I mean
>> searching for a rectangular subimage within the image of the card. The
>> subimages that you could use would be reference images of each digit 0-9.
>>
>> You would probably need to do some image processing first to convert the
>> images to black and white. Let me know if you need ideas.
>>
>> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote:
>>>
>>> Embossed cards are designed to be printed. Have you considered taking an
>>> impression and scanning the impression? Or just scanning (magnetically) the
>>> magnetic strip on the card?
>>>
>>> There have been other discussions of training Tesseract for OCR-A (and I
>>> think OCR-B). Farrington 7B is another in that set of OCRable fonts, so the
>>> process should be the same.
>>>
>>> Tom
>>>
>>>
>>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote:
>>>>
>>>> Thank you for your reply!
>>>> Since the embossed characters on bank cards are designed to be
>>>> OCR-able (according to the ISO 7811 spec), why are there no implementation
>>>> examples available on the internet? There is no similar problem in the
>>>> tesseract forum either. I have searched a lot, but I found nothing.
>>>> Shouldn't this be an easy problem?
>>>>
>>>> On Tuesday, November 20, 2012 at 9:45:13 PM UTC+8, TP wrote:
>>>>>
>>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote:
>>>>>>
>>>>>> Dear All,
>>>>>> I now need to OCR the embossed characters on the bank card.
>>>>>> These characters are in two kinds of font. The first one is Farrington 7B,
>>>>>> which is used for the account number; the other font is unknown
>>>>>> (maybe bank-dependent) and is used for the card holder name, card
>>>>>> issue time and card serial number.
>>>>>> Now the problem is that the embossed characters are very difficult to
>>>>>> OCR, since they become very bright under special light. If no extra
>>>>>> light is applied, the card background strongly affects these
>>>>>> characters and causes errors.
>>>>>> I have uploaded two images. The first sample image shows that
>>>>>> improper lighting makes the characters a mix of dark and light, and
>>>>>> the OCR result is very bad. The second image shows that better lighting
>>>>>> makes the background dark and the embossed characters very sharp; the
>>>>>> OCR result is a little better, but still not good enough.
>>>>>> Can anybody give me some advice on the lighting, or on image
>>>>>> pre-processing techniques to improve the OCR result? Thank you all!
>>>>>
>>>>>
>>>>> Crazy (and expensive) idea:
>>>>>
>>>>> How about taking two or maybe four pictures of each card with the light
>>>>> coming low from the side on the left and right (and maybe also from
>>>>> top/bottom), then doing some sort of image processing combination?
>>>>> Hopefully if the light is low enough the background will fade out and
>>>>> only the various edges of the raised characters will be visible. Of
>>>>> course this would require some special hardware and the ability to turn
>>>>> a different light on for each scan.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>>
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>>
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
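
On the hollow-to-solid question raised in the thread: one possible approach, not something anyone in the thread confirmed using, is to flood-fill the background from the image border and then mark every pixel the fill could not reach as ink. The sketch below assumes the image is already binarized (0 = background, 1 = outline) and that each outline is a closed contour; the function name `fill_hollow` and the list-of-lists image layout are made up for illustration.

```python
from collections import deque


def fill_hollow(image):
    """Return a copy of a binary image with enclosed background filled in.

    image is a list of rows, each a list of 0 (background) or 1 (ink).
    A background pixel stays 0 only if it is 4-connected to the border.
    """
    height, width = len(image), len(image[0])
    reachable = [[False] * width for _ in range(height)]
    queue = deque()

    # Seed the flood fill with every background pixel on the border.
    for y in range(height):
        for x in range(width):
            on_border = y in (0, height - 1) or x in (0, width - 1)
            if on_border and image[y][x] == 0:
                reachable[y][x] = True
                queue.append((y, x))

    # Spread through 4-connected background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and image[ny][nx] == 0 and not reachable[ny][nx]:
                reachable[ny][nx] = True
                queue.append((ny, nx))

    # Anything the fill did not reach is outline or enclosed interior.
    return [
        [0 if reachable[y][x] else 1 for x in range(width)]
        for y in range(height)
    ]
```

Two caveats: an outline with even a one-pixel gap will not be filled, so a small morphological closing beforehand may be needed, and genuine holes in characters such as 0, 6 or 8 get filled as well, which may or may not matter for recognition.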

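SteveP's subimage search suggestion could look roughly like the sketch below. It assumes binarized images stored as lists of 0/1 rows and reference digit images that are no larger than the card image; the names `match_cost`, `find_best_match` and `best_label` are hypothetical, and the brute-force scan is only meant to show the idea, not to be fast.

```python
def match_cost(image, template, top, left):
    """Count the pixels where the template differs from the image window."""
    return sum(
        image[top + dy][left + dx] != template[dy][dx]
        for dy in range(len(template))
        for dx in range(len(template[0]))
    )


def find_best_match(image, template):
    """Return (cost, top, left) for the window that differs least.

    Returns None if the template does not fit inside the image.
    """
    t_height, t_width = len(template), len(template[0])
    best = None
    for top in range(len(image) - t_height + 1):
        for left in range(len(image[0]) - t_width + 1):
            cost = match_cost(image, template, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best


def best_label(image, templates):
    """Pick the label whose reference image matches somewhere with lowest cost.

    templates maps a label (e.g. "0".."9") to its reference image.
    Assumes every template fits inside the image.
    """
    scored = {
        label: find_best_match(image, template)
        for label, template in templates.items()
    }
    return min(scored, key=lambda label: scored[label][0])
```

In practice the search would be run per character cell rather than over the whole card, and a cost threshold would be needed to reject windows that match no digit well.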
