I think that multiple light sources would help you get more homogeneous shadows and prevent broken characters. It's just an idea.
Regards

On Monday, 17 December 2012 05:02:08 UTC+1, Neo Song wrote:
>
> Dear Dmitri,
>
> There is one thing that confuses me a lot. With a coaxial light source,
> I get solid-stroke characters. With a ring light source, I get
> hollow-style stroke characters. Tesseract recognizes solid characters
> much better than hollow ones. However, we currently get very bad results
> when we use coaxial light on a card with a light background or on a
> UV-printed card, since both the characters and the background are very
> bright. The ring light source yields much better results in such
> situations.
> Note that with both solid and hollow strokes we still get broken
> strokes, so simply filling the hollow strokes to form solid strokes is
> very difficult to achieve.
> Can you give me some advice on this situation?
>
> On Thursday, 13 December 2012 at 7:24:59 PM UTC+8, Dmitri Silaev wrote:
>>
>> Neo Song,
>>
>> There are two usual approaches to problems like yours. The first is to
>> constrain the shooting conditions to mitigate later problems with the
>> image processing. This could indeed mean requiring a slip of the card
>> (or something else that others in the forum advise), which lets
>> everything that follows go much more easily. The other approach is to
>> accept every possible image and struggle with all kinds of
>> complications.
>>
>> If you choose the latter, I can suggest the following. Your binarized
>> images need some improvement. The binarization procedure has to
>> preserve more contour pixels. Then you can use e.g. morphology (a
>> distance transform, maybe) to obtain an image of one-pixel-wide
>> contours - something like an edge map. That way you'll be able to
>> replace the first stage of the SWT - edge detection (implemented via
>> Canny or whatever) - with your own. The rest of the SWT can proceed as
>> usual.
>> To generate a better edge image you can use various binarization
>> techniques, keeping in mind that you are actually binarizing not the
>> characters themselves but the glowing halos around them. Then you can
>> employ some kind of post-processing (maybe 3x3 pattern-based linking)
>> to make your contours more connected. But be prepared that no perfect
>> character contours can be obtained, as with any other edge detection
>> procedure.
>>
>> HTH and good luck!
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Thu, Dec 13, 2012 at 1:35 PM, Neo Song <[email protected]> wrote:
>>
>>> Hi gadv,
>>>
>>> I have used the SWT algorithm for the pre-processing. You can find
>>> the paper on this algorithm and its supplementary material via Google.
>>> The problem now is the hollow-style characters and broken strokes,
>>> which are very difficult to cope with. Any good ideas?
>>>
>>> On Saturday, 8 December 2012 at 10:17:46 AM UTC+8, gadv wrote:
>>>>
>>>> Hello Neo,
>>>>
>>>> How did you turn the original images into those results? What kind
>>>> of image processing did you use?
>>>>
>>>> Thanks
>>>>
>>>> On Thu, Dec 6, 2012 at 4:41 AM, Neo Song <[email protected]> wrote:
>>>> > Dear SteveP and others,
>>>> >
>>>> > I have managed to do some image preprocessing and now have the
>>>> > attached images to send to Tesseract. However, they are all
>>>> > hollow-style characters, which is very bad for OCR. How can I
>>>> > convert them to solid-style characters?
>>>> >
>>>> > On Thursday, 6 December 2012 at 5:28:28 AM UTC+8, SteveP wrote:
>>>> >>
>>>> >> If your characters have a fixed size in pixels, then you might
>>>> >> get better results from a subimage search than from OCR. I mean
>>>> >> searching for a rectangular subimage within the image of the
>>>> >> card. The subimages you could use would be reference images of
>>>> >> each digit 0-9.
>>>> >>
>>>> >> You would probably need to do some image processing first to
>>>> >> convert the images to black and white.
>>>> >> Let me know if you need ideas.
>>>> >>
>>>> >> On Thu, Nov 22, 2012 at 9:04 AM, Tom Morris <[email protected]> wrote:
>>>> >>>
>>>> >>> Embossed cards are designed to be printed. Have you considered
>>>> >>> taking an impression and scanning the impression? Or just
>>>> >>> scanning (magnetically) the magnetic strip on the card?
>>>> >>>
>>>> >>> There have been other discussions of training Tesseract for
>>>> >>> OCR-A (and I think OCR-B). Farrington 7B is another in that set
>>>> >>> of OCRable fonts, so the process should be the same.
>>>> >>>
>>>> >>> Tom
>>>> >>>
>>>> >>> On Wednesday, November 21, 2012 9:41:02 AM UTC-5, Neo Song wrote:
>>>> >>>>
>>>> >>>> Thank you for your reply!
>>>> >>>> Since the embossed characters on bank cards are designed to be
>>>> >>>> OCR-able (according to the ISO 7811 spec), why are there no
>>>> >>>> implementation examples available on the internet? There is no
>>>> >>>> similar problem in the Tesseract forum either. I have searched
>>>> >>>> a lot but found nothing. Shouldn't this be an easy problem?
>>>> >>>>
>>>> >>>> On Tuesday, 20 November 2012 at 9:45:13 PM UTC+8, TP wrote:
>>>> >>>>>
>>>> >>>>> On Mon, Nov 19, 2012 at 9:07 AM, Neo Song <[email protected]> wrote:
>>>> >>>>>>
>>>> >>>>>> Dear All,
>>>> >>>>>> I need to OCR the embossed characters on bank cards. These
>>>> >>>>>> characters come in two kinds of font. The first is Farrington
>>>> >>>>>> 7B, which is used for the account number. The other font is
>>>> >>>>>> unknown (maybe bank-dependent) and is used for the card
>>>> >>>>>> holder name, card issue time and card serial number.
>>>> >>>>>> The problem is that the embossed characters are very
>>>> >>>>>> difficult to OCR, since they become very bright under special
>>>> >>>>>> light.
>>>> >>>>>> If the extra light is not applied, however, the card
>>>> >>>>>> background strongly affects these characters and causes
>>>> >>>>>> errors.
>>>> >>>>>> I have uploaded two images. The first sample image shows that
>>>> >>>>>> improper lighting makes the characters a mix of dark and
>>>> >>>>>> light, and the OCR result is very bad. The second image shows
>>>> >>>>>> that better lighting makes the background dark and the
>>>> >>>>>> embossed characters very sharp. The OCR result is then a
>>>> >>>>>> little better, but still not good enough.
>>>> >>>>>> Can anybody give me some advice on the lighting, or on an
>>>> >>>>>> image pre-processing technique to improve the OCR result?
>>>> >>>>>> Thank you all!
>>>> >>>>>
>>>> >>>>> Crazy (and expensive) idea:
>>>> >>>>>
>>>> >>>>> How about taking two or maybe four pictures of each card with
>>>> >>>>> the light coming in low from the side on the left and right
>>>> >>>>> (and maybe also from top/bottom), then doing some sort of
>>>> >>>>> image processing combination? Hopefully, if the light is low
>>>> >>>>> enough, the background will fade out and only the various
>>>> >>>>> edges of the raised characters will be visible. Of course this
>>>> >>>>> would require some special hardware and the ability to turn on
>>>> >>>>> a different light for each scan.
>>>> >>>
>>>> >>> --
>>>> >>> You received this message because you are subscribed to the Google
>>>> >>> Groups "tesseract-ocr" group.
>>>> >>> To post to this group, send email to [email protected]
>>>> >>> To unsubscribe from this group, send email to
>>>> >>> tesseract-oc...@googlegroups.com
>>>> >>> For more options, visit this group at
>>>> >>> http://groups.google.com/group/tesseract-ocr?hl=en
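
To make the multi-light idea from this thread a bit more concrete, here is a minimal sketch of one possible "image processing combination": a per-pixel maximum over several shots of the same card, each lit from a different side. This is only an assumption about what such a combination could look like; nobody in the thread specified one. The function name `combine_max` is hypothetical, images are plain lists of rows of grayscale values, and the shots are assumed to be already aligned.

```python
def combine_max(images):
    """Merge aligned grayscale images by taking the brightest value per pixel.

    Each image is a list of rows; each row is a list of ints (0-255).
    With low side lighting, each shot highlights different edges of the
    embossed characters, so the maximum keeps the edges from every shot.
    """
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same size")
    return [
        [max(img[y][x] for img in images) for x in range(width)]
        for y in range(height)
    ]


# Toy example: one shot lit from the left, one lit from the right.
lit_from_left = [[0, 200, 0], [0, 200, 0]]
lit_from_right = [[0, 0, 180], [0, 0, 180]]
merged = combine_max([lit_from_left, lit_from_right])
```

Whether a maximum, a sum, or something else works best would have to be tested on real card images.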

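
The broken strokes discussed in this thread, and the suggestion of 3x3 post-processing to make contours more connected, can be illustrated with a morphological closing (dilation followed by erosion) using a 3x3 neighbourhood. Note that this is a swapped-in, simpler technique and not the pattern-based linking mentioned above. It is a sketch in plain Python, assuming a binary image stored as lists of 0/1 values; the names `dilate`, `erode` and `close3x3` are hypothetical, and a real pipeline would more likely use an image-processing library.

```python
def _morph(img, combine):
    """Apply `combine` (max or min) over each pixel's 3x3 neighbourhood.

    Pixels outside the image are treated as background (0).
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [
        [combine(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
         for x in range(w)]
        for y in range(h)
    ]


def dilate(img):
    return _morph(img, max)


def erode(img):
    return _morph(img, min)


def close3x3(img):
    """Closing: dilate, then erode. Bridges gaps of one or two pixels."""
    w = len(img[0])
    # Pad with a one-pixel background border so the image edge itself
    # does not distort the result, then crop the border off again.
    padded = [[0] * (w + 2)] + [[0] + list(r) + [0] for r in img] + [[0] * (w + 2)]
    closed = erode(dilate(padded))
    return [row[1:-1] for row in closed[1:-1]]


# A horizontal stroke with a one-pixel break in the middle.
stroke = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
repaired = close3x3(stroke)
```

Closing only bridges small gaps, so it would not by itself turn hollow strokes with larger breaks into solid ones.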
