Hi gadv,

    You can get the implementation here:
https://github.com/aperrau/DetectText
    Good luck.

On Saturday, December 15, 2012 at 10:38:31 PM UTC+8, gadv wrote:
>
> Hello Neo, 
>
> Which SWT implementation did you use? There are several out there,
> and I haven't found one that produces your result yet.
>
> Thanks 
>
> On Thu, Dec 13, 2012 at 1:24 PM, Dmitri Silaev <[email protected]> wrote:
> > Neo Song, 
> > 
> > There are two usual approaches to problems like yours. The first is to
> > constrain the shooting conditions to mitigate later problems with the
> > image processing. This could indeed mean requiring a slip of the card (or
> > something else that others in the forum advise), which lets everything that
> > follows go much more easily. The other approach is to accept every possible
> > image and struggle with all kinds of complications.
> > 
> > If you choose the latter, I can suggest the following. Your binarized images
> > need some improvement: the binarization procedure has to preserve more contour
> > pixels. Then you can use e.g. morphology (a distance transform, maybe) to
> > obtain an image of one-pixel-wide contours, something like an edge map. That
> > way you'll be able to replace the first stage of the SWT, edge detection
> > (implemented via Canny or whatever), with your own. The rest of the SWT can
> > proceed as usual. To generate a better edge image you can use various
> > binarization techniques, keeping in mind that you are actually binarizing not
> > the characters themselves but the glowing halos around them. Then you can
> > employ some kind of post-processing (maybe 3x3 pattern-based linking) to make
> > your contours more connected. But be prepared: no perfect character contours
> > can be obtained, as with any other edge detection procedure.
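
To illustrate the contour step described above, here is a minimal sketch in plain Python (no image library assumed). It takes a binary image as a list of rows of 0/1 and keeps only the foreground pixels that touch the background in the 4-neighbourhood, which yields a thin inner boundary that could stand in for an edge map. The name `boundary_pixels` and the list-of-rows representation are assumptions for illustration. This is neither the SWT nor a distance transform, only the simplest morphological boundary (the image minus its erosion).

```python
def boundary_pixels(img):
    """Return the inner boundary of a binary image (list of rows of 0/1)."""
    h, w = len(img), len(img[0])

    def at(y, x):
        # Pixels outside the image count as background.
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Keep a foreground pixel only if at least one of its
            # 4-neighbours is background.
            if img[y][x] and not (at(y - 1, x) and at(y + 1, x)
                                  and at(y, x - 1) and at(y, x + 1)):
                out[y][x] = 1
    return out


# Hypothetical 3x3 solid blob on a 5x5 canvas.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edges = boundary_pixels(blob)
```

On this toy input only the interior pixel of the blob is dropped. Real contours from a photographed card would still have gaps, which is where the linking step mentioned above would come in.
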
> > 
> > HTH and good luck! 
> > 
> > Warm regards, 
> > Dmitri Silaev 
> > www.CustomOCR.com 
> > 
> > 
> > 
> > On Thu, Dec 13, 2012 at 1:35 PM, Neo Song <[email protected]> wrote:
> >> 
> >> Hi gadv, 
> >> 
> >>     I used the SWT (Stroke Width Transform) algorithm for the
> >> pre-processing. You can find the paper and its supplementary material via
> >> Google. The problem now is the hollow-style characters and broken strokes,
> >> which are very difficult to cope with. Any good ideas?
> >> 
> >> On Saturday, December 8, 2012 at 10:17:46 AM UTC+8, gadv wrote:
> >>> 
> >>> Hello Neo, 
> >>> 
> >>> How did you turn the original images into those results? What kind of
> >>> image processing did you use?
> >>> 
> >>> Thanks 
> >>> 
> >>> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote: 
> >>> > Dear SteveP and others, 
> >>> > 
> >>> >     I have managed to do some image preprocessing and now have the
> >>> > attached images to send to Tesseract. However, they all contain
> >>> > hollow-style characters, which is very bad for OCR. How can I convert
> >>> > them to solid-style characters?
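
One common way to turn a hollow (outline-only) glyph into a solid one is hole filling: flood-fill the background starting from the image border, then treat every pixel the fill could not reach as foreground. Below is a minimal sketch in plain Python, assuming a binary image stored as a list of rows of 0/1 and assuming the outlines are closed. `fill_holes` is a hypothetical helper name, not part of Tesseract.

```python
from collections import deque


def fill_holes(img):
    """Fill enclosed background regions of a binary image (rows of 0/1)."""
    h, w = len(img), len(img[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()

    # Seed the flood fill with every background pixel on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and img[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))

    # Spread through 4-connected background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and img[ny][nx] == 0 and not outside[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))

    # Everything not reachable from the border becomes foreground.
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]


# Hypothetical hollow square: a closed outline with an empty centre.
hollow = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
solid = fill_holes(hollow)
```

If a stroke is broken, the fill leaks into the glyph through the gap and nothing gets filled, so gaps in the outline would have to be closed first.
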
> >>> > 
> >>> > On Thursday, December 6, 2012 at 5:28:28 AM UTC+8, SteveP wrote:
> >>> >> 
> >>> >> If your characters have a fixed size in terms of pixels, then you might
> >>> >> get better results from doing a subimage search than by using OCR. I mean
> >>> >> searching for a rectangular subimage within the image of the card. The
> >>> >> subimages that you could use would be reference images of each digit 0-9.
> >>> >>
> >>> >> You would probably need to do some image processing first to convert the
> >>> >> images to black and white. Let me know if you need ideas.
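
A minimal sketch of the subimage search idea in plain Python: slide a reference image over the card image and keep the offset with the smallest sum of absolute differences (SAD). The names `find_subimage`, `card`, and `digit`, and the tiny grayscale arrays, are made up for illustration. Real use would repeat the search for each of the ten digit templates and would probably rely on an optimized library routine instead of nested loops.

```python
def find_subimage(image, template):
    """Return (row, col, score) of the best match by sum of absolute differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    # Try every offset at which the template fits entirely inside the image.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                abs(image[r + dy][c + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


# Hypothetical data: a 2x2 pattern embedded in a small "card" image.
card = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
digit = [
    [9, 1],
    [1, 9],
]
match = find_subimage(card, digit)
```

A score of 0 means an exact match. On real photographs the best score is never exactly zero, so a threshold on the score would be needed to reject false matches.
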
> >>> >> 
> >>> >> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote:
> >>> >>> 
> >>> >>> Embossed cards are designed to be printed. Have you considered taking an
> >>> >>> impression and scanning the impression? Or just scanning (magnetically) the
> >>> >>> magnetic strip on the card?
> >>> >>>
> >>> >>> There have been other discussions of training Tesseract for OCR-A (and I
> >>> >>> think OCR-B). Farrington 7B is another in that set of OCRable fonts, so the
> >>> >>> process should be the same.
> >>> >>> 
> >>> >>> Tom 
> >>> >>> 
> >>> >>> 
> >>> >>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote: 
> >>> >>>> 
> >>> >>>> Thank you for your reply! 
> >>> >>>> Since the embossed characters on bank cards are designed to be
> >>> >>>> OCR-able (according to the ISO 7811 spec), why are there no
> >>> >>>> implementation examples available on the internet? There is no similar
> >>> >>>> problem in the Tesseract forum either. I have searched a lot but found
> >>> >>>> nothing. Shouldn't this problem be an easy one?
> >>> >>>> 
> >>> >>>> On Tuesday, November 20, 2012 at 9:45:13 PM UTC+8, TP wrote:
> >>> >>>>> 
> >>> >>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote:
> >>> >>>>>> 
> >>> >>>>>> Dear All, 
> >>> >>>>>>     I need to OCR the embossed characters on bank cards. These
> >>> >>>>>> characters are in two kinds of font. The first is Farrington 7B, which
> >>> >>>>>> is used for the account number; the other font is unknown (maybe
> >>> >>>>>> bank-dependent) and is used for the cardholder name, card issue date,
> >>> >>>>>> and card serial number.
> >>> >>>>>>     The problem is that the embossed characters are very difficult to
> >>> >>>>>> OCR, since they become very bright under special lighting. If no extra
> >>> >>>>>> light is applied, the card background strongly affects these
> >>> >>>>>> characters and causes errors.
> >>> >>>>>>     I have uploaded two images. The first sample image shows that
> >>> >>>>>> improper lighting makes the characters a mix of dark and light, and
> >>> >>>>>> the OCR result is very bad. The second image shows that better lighting
> >>> >>>>>> makes the background dark and the embossed characters very sharp; the
> >>> >>>>>> OCR result is a little better, but still not good enough.
> >>> >>>>>>     Can anybody give me some advice on the lighting, or on image
> >>> >>>>>> pre-processing techniques to improve the OCR result? Thank you all!
> >>> >>>>> 
> >>> >>>>> 
> >>> >>>>> Crazy (and expensive) idea: 
> >>> >>>>> 
> >>> >>>>> How about taking two or maybe four pictures of each card with the light
> >>> >>>>> coming low from the side on the left and right (and maybe also from
> >>> >>>>> top/bottom), then doing some sort of image processing combination?
> >>> >>>>> Hopefully, if the light is low enough, the background will fade out and
> >>> >>>>> only the various edges of the raised characters will be visible. Of
> >>> >>>>> course this would require some special hardware and the ability to turn
> >>> >>>>> a different light on for each scan.
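
One possible form of the "image processing combination" mentioned above, as a hedged sketch in plain Python: given several aligned grayscale shots of the same card, each lit from a different side, take the per-pixel difference between the brightest and darkest value. Flat background that looks the same in every shot cancels out, while embossed edges that light up in only some of the shots stand out. The name `combine_shots` and the sample values are assumptions; the post does not specify which combination to use.

```python
def combine_shots(shots):
    """Per-pixel max minus min across aligned grayscale images (rows of ints)."""
    h, w = len(shots[0]), len(shots[0][0])
    return [
        [max(s[y][x] for s in shots) - min(s[y][x] for s in shots)
         for x in range(w)]
        for y in range(h)
    ]


# Hypothetical shots: the middle column is a raised edge that is bright
# only when lit from the left; the background is the same in both.
lit_from_left = [[20, 200, 20], [20, 200, 20]]
lit_from_right = [[20, 30, 20], [20, 30, 20]]
edge_map = combine_shots([lit_from_left, lit_from_right])
```

This only works if the shots are pixel-aligned, which is why the card and camera would have to stay fixed while the lights are switched.
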
> >>> >>> 
> >>> >>> -- 
> >>> >>> You received this message because you are subscribed to the Google 
> >>> >>> Groups "tesseract-ocr" group. 
> >>> >>> To post to this group, send email to [email protected] 
> >>> >>> 
> >>> >>> To unsubscribe from this group, send email to 
> >>> >>> [email protected] 
> >>> >>> 
> >>> >>> For more options, visit this group at 
> >>> >>> http://groups.google.com/group/tesseract-ocr?hl=en 
> >>> >> 
> >>> >> 
> >> 
> > 
> > 
>

