Hello Neo, which SWT (stroke width transform) implementation did you use? There are several out there, and I haven't found one that produces your result yet.
Thanks

On Thu, Dec 13, 2012 at 1:24 PM, Dmitri Silaev <[email protected]> wrote:
> Neo Song,
>
> There are two usual approaches to problems like yours. The first is to
> constrain the shooting conditions in order to mitigate later problems in the
> image processing. This could indeed mean requiring a slip of the card (or
> something like what others in the forum advise), allowing everything that
> follows to go much more easily. The other approach is to accept every
> possible image and struggle with all kinds of complications.
>
> If you choose the latter, I can suggest the following. Your binarized images
> need some improvement: the binarization procedure has to preserve more
> contour pixels. Then you can use, e.g., morphology (maybe a distance
> transform) to obtain an image of one-pixel-wide contours - something like an
> edge map. That way you'll be able to replace the first stage of the SWT -
> edge detection (implemented via Canny or whatever) - with your own. The rest
> of the SWT can proceed in its usual way. To generate a better edge image you
> can use various binarization techniques, keeping in mind that you are
> actually binarizing not the characters themselves but the glowing halos
> around them. Then you can apply some kind of post-processing (maybe
> 3x3-pattern-based linking) to make your contours more connected. But be
> prepared that, as with any other edge detection procedure, no perfect
> character contours can be obtained.
>
> HTH and good luck!
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
>
> On Thu, Dec 13, 2012 at 1:35 PM, Neo Song <[email protected]> wrote:
>>
>> Hi gadv,
>>
>> I have used the SWT algorithm to do the pre-processing. You can find the
>> algorithm's paper and supplementary material via Google. Now the problem is
>> the hollow-style characters and broken strokes, which are very difficult to
>> cope with. Any good ideas?
>>
>> On Saturday, December 8, 2012 at 10:17:46 AM UTC+8, gadv wrote:
>>>
>>> Hello Neo,
>>>
>>> how did you turn the original images into those results?
>>> What kind of image processing?
>>>
>>> Thanks
>>>
>>> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote:
>>>> Dear SteveP and others,
>>>>
>>>> I have managed to do some image preprocessing, and now I have the
>>>> attached images to send to tesseract. However, they are all hollow-style
>>>> characters, which is very bad for OCR. How can I convert them to
>>>> solid-style characters?
>>>>
>>>> On Thursday, December 6, 2012 at 5:28:28 AM UTC+8, SteveP wrote:
>>>>>
>>>>> If your characters have a fixed size in terms of pixels, then you might
>>>>> get better results from doing a subimage search than from OCR. I mean
>>>>> searching for a rectangular subimage within the image of the card. The
>>>>> subimages you could use would be reference images of each digit 0-9.
>>>>>
>>>>> You would probably need to do some image processing first to convert
>>>>> the images to black and white. Let me know if you need ideas.
>>>>>
>>>>> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote:
>>>>>>
>>>>>> Embossed cards are designed to be printed. Have you considered taking
>>>>>> an impression and scanning the impression? Or just scanning
>>>>>> (magnetically) the magnetic strip on the card?
>>>>>>
>>>>>> There have been other discussions of training Tesseract for OCR-A (and
>>>>>> I think OCR-B). Farrington 7B is another in that set of OCRable fonts,
>>>>>> so the process should be the same.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote:
>>>>>>>
>>>>>>> Thank you for your reply!
>>>>>>> Since the bank card's embossed characters are designed to be OCR-able
>>>>>>> (according to the ISO 7811 spec), why are there no implementation
>>>>>>> examples available on the internet? And there is no similar problem
>>>>>>> in the tesseract forum either.
>>>>>>> I have searched a lot but found nothing.
>>>>>>> Shouldn't this problem be an easy one?
>>>>>>>
>>>>>>> On Tuesday, November 20, 2012 at 9:45:13 PM UTC+8, TP wrote:
>>>>>>>>
>>>>>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Dear All,
>>>>>>>>> I now need to OCR the embossed characters on bank cards. These
>>>>>>>>> characters come in two fonts. The first is Farrington 7B, which is
>>>>>>>>> used for the account number; the other font is unknown (maybe
>>>>>>>>> bank-dependent) and is used for the card holder's name, the card
>>>>>>>>> issue date, and the card serial number.
>>>>>>>>> Now the problem is that the embossed characters are very difficult
>>>>>>>>> to OCR: they become very bright under special lighting, while if
>>>>>>>>> the extra light is not applied, the card background largely
>>>>>>>>> interferes with the characters and causes errors.
>>>>>>>>> I have uploaded two images. The first sample shows that improper
>>>>>>>>> lighting causes the characters to be a dark/light mix, and the OCR
>>>>>>>>> result is very bad. The second shows that better lighting makes the
>>>>>>>>> background dark and the embossed characters very sharp; the OCR
>>>>>>>>> result is a little better, but still not good enough.
>>>>>>>>> Can anybody give me some advice on the lighting to apply, or on
>>>>>>>>> image pre-processing techniques to improve the OCR result? Thank
>>>>>>>>> you all!
>>>>>>>>
>>>>>>>> Crazy (and expensive) idea:
>>>>>>>>
>>>>>>>> How about taking two or maybe four pictures of each card, with the
>>>>>>>> light coming in low from the side on the left and right (and maybe
>>>>>>>> also from top/bottom), then doing some sort of image-processing
>>>>>>>> combination?
>>>>>>>> Hopefully, if the light is low enough, the background will fade out
>>>>>>>> and only the various edges of the raised characters will be visible.
>>>>>>>> Of course, this would require some special hardware and the ability
>>>>>>>> to turn a different light on for each scan.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To post to this group, send email to [email protected]
>>>>>> To unsubscribe from this group, send email to [email protected]
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
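TP's multi-light idea above amounts to combining several registered shots so that a raised edge highlighted by any one light survives in the merged image. A minimal sketch, assuming already-aligned grayscale NumPy arrays and a simple per-pixel maximum as the combination rule (the function name and the choice of maximum are illustrative assumptions, not something stated in the thread):

```python
import numpy as np

def combine_side_lit(images):
    """Combine several low-angle-lit shots of the same card.

    Each lighting direction highlights a different subset of the raised
    character edges; a per-pixel maximum keeps an edge if it showed up
    brightly in any of the shots. Assumes the images are grayscale arrays
    of identical shape, already registered to one another.
    """
    stacked = np.stack([img.astype(np.float64) for img in images])
    return stacked.max(axis=0)
```

With real photographs the shots would first need registration (the card can shift between exposures), and a maximum over gradient magnitudes rather than raw intensities may suppress the background better; both choices are application-dependent.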
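Dmitri's recipe above - binarize the glowing halos, then reduce the blobs to one-pixel-wide contours to use as a hand-made edge map in place of the SWT's Canny stage - could be sketched as follows. This is illustrative only: Otsu's method stands in for his "various binarization techniques", and a simple 4-neighbour boundary test stands in for the morphology/distance-transform step; the function names are mine, not from the thread.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick a global threshold for a uint8 image by maximizing
    between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    omega = np.cumsum(hist) / total                 # class-0 probability
    mu = np.cumsum(hist * np.arange(256)) / total   # class-0 cumulative mean
    mu_t = mu[-1]                                   # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

def one_pixel_contours(binary):
    """Keep only foreground pixels that touch the background with at least
    one 4-neighbour: a one-pixel-wide contour map that can stand in for
    the edge image normally produced by Canny in the SWT's first stage."""
    padded = np.pad(binary, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return binary & ~interior
```

Usage would be `one_pixel_contours(gray > otsu_threshold(gray))`, with the result fed to the stroke-width pairing stage. As Dmitri warns, the contours will not be perfect; some linking post-processing would still be needed.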

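Neo's hollow-to-solid question (converting outline-only glyphs into filled ones) is essentially hole filling. One common way, sketched here under the assumption that the glyph outlines are closed (broken strokes would need morphological closing first, per Dmitri's warning), is to flood-fill the background from the image border and treat unreached background pixels as enclosed holes. The function name is mine:

```python
from collections import deque
import numpy as np

def fill_holes(binary):
    """Turn hollow (outline-only) glyphs into solid ones.

    Flood-fills the background from the image border with a 4-connected
    BFS; any background pixel the flood cannot reach is enclosed by
    foreground, i.e. a hole, and gets set to foreground.
    """
    h, w = binary.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed the flood with every background pixel on the border.
    for r in range(h):
        for c in (0, w - 1):
            if not binary[r, c] and not outside[r, c]:
                outside[r, c] = True
                queue.append((r, c))
    for c in range(w):
        for r in (0, h - 1):
            if not binary[r, c] and not outside[r, c]:
                outside[r, c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not binary[nr, nc] and not outside[nr, nc]:
                outside[nr, nc] = True
                queue.append((nr, nc))
    return binary | ~outside
```

In practice `scipy.ndimage.binary_fill_holes` does the same job; the explicit BFS is shown only to make the mechanism clear.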

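SteveP's subimage-search suggestion is, in effect, template matching: slide a reference image of each digit 0-9 over the card and take the best-scoring position. A brute-force sketch using normalized cross-correlation (my choice of similarity score; the thread does not specify one, and the function name is hypothetical):

```python
import numpy as np

def match_template(image, template):
    """Brute-force normalized cross-correlation of a small grayscale
    template over a grayscale image.

    Returns (best_row, best_col, best_score), where the score is in
    [-1, 1] and 1.0 means an exact (affine-equivalent) match. Only
    works well when the digits have a fixed pixel size, as SteveP notes.
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template.astype(np.float64)
    t = t - t.mean()
    t_norm = np.sqrt((t * t).sum())
    best = (-1, -1, -1.0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            w = image[r:r + th, c:c + tw].astype(np.float64)
            w = w - w.mean()
            denom = np.sqrt((w * w).sum()) * t_norm
            score = float((w * t).sum() / denom) if denom > 0 else 0.0
            if score > best[2]:
                best = (r, c, score)
    return best
```

For real use, one would run this once per digit template and keep the highest score at each location; OpenCV's `matchTemplate` does the same far faster.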