> 2010/7/30 Jimmy O'Regan <[email protected]>
>
> On 30 July 2010 19:26, Andres <[email protected]> wrote:
>> > Hello Jimmy,
>> >
>> > Thank you for your message.
>> >
>> > I'm writing between your lines:
>> >
>> > 2010/7/29 Jimmy O'Regan <[email protected]>
>> >>
>> >> On 29 July 2010 03:23, Andres <[email protected]> wrote:
>> >> > Hello,
>> >> >
>> >> > I'm working on the same thing as you, for licence plates from Argentina, as I live in Argentina.
>> >> >
>> >> > As you described, the problem was locating the licence plate.
>> >> >
>> >> > Now I'm working on the OCR, and then I will work on horizontalizing the images, because if they are not completely horizontal, the OCR fails. For example, today I was getting a 5 instead of a 6. When I horizontalized the image with Photoshop, everything turned out OK.
>> >> >
>> >> > I don't know the layout of letters and numbers in California plates; are they mixed? If you know whether a character should be a number or a letter according to its position, you have two options (as far as I know):
>> >> >
>> >> > - when recognizing char by char, tell Tesseract that you expect a number or a letter. I saw that somewhere inside the source code, but I don't remember where.
>> >>
>> >> You were probably looking at the code that guesses among 1, l and i.
>> >
>> > I think I saw somewhere that it was possible to configure whether you expect numbers or letters, but I'm not sure anymore.
>> >
>>
>> Yeah, there's that too.
>>
>> >>
>> >> Most of the code in the dict/ directory does some variation on this, by 'permuting' the character possibilities.
>> >>
>> >> > - make your own conversion, e.g., if you are expecting a number and you get a G, map it to a 6; if you are expecting a letter and you get a 2, map it to a Z.
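The second option above (force each character to the class its position requires, e.g. G becomes 6 where a digit is expected) can be sketched as a small post-processing step. This is a minimal sketch, assuming the Argentinian AAA 000 template discussed in this thread; the confusion tables `TO_DIGIT` and `TO_LETTER` and the function `correct_plate` are hypothetical examples, not anything taken from Tesseract, and should be tuned to the errors you actually observe.

```python
from string import ascii_uppercase, digits
from typing import Optional

# Hypothetical confusion tables (tune them to your own OCR errors).
# Where a digit is expected, letters that look like digits become digits...
TO_DIGIT = {"O": "0", "D": "0", "Q": "0", "I": "1", "L": "1", "Z": "2",
            "S": "5", "G": "6", "T": "7", "B": "8"}
# ...and where a letter is expected, digits that look like letters become letters.
TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "7": "T", "8": "B"}


def correct_plate(raw: str, template: str = "LLLDDD") -> Optional[str]:
    """Force each OCR character to the class its position requires.

    `template` uses "L" for a letter and "D" for a digit; "LLLDDD" is the
    AAA 000 layout with the space removed. Returns None if the text
    cannot be made to fit the template.
    """
    chars = [c for c in raw.upper() if not c.isspace()]
    if len(chars) != len(template):
        return None
    fixed = []
    for ch, kind in zip(chars, template):
        if kind == "D":
            ch = TO_DIGIT.get(ch, ch)
            if ch not in digits:
                return None
        else:
            ch = TO_LETTER.get(ch, ch)
            if ch not in ascii_uppercase:
                return None
        fixed.append(ch)
    return "".join(fixed)


print(correct_plate("A8C 1G5"))              # ABC165
print(correct_plate("1ABC2S4", "DLLLDDD"))   # 1ABC254 (California-style layout)
```

Passing a different template string is all it takes to reuse the same function for another layout, such as the California one-digit, three-letter, three-digit scheme.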
>> >>
>> >> Patrick may have more details on this approach.
>> >>
>> >> According to Wikipedia (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina), the normal Argentinian license plates follow the template AAA 000, so you could just generate the possible combinations and use them in a dawg.
>> >>
>> >> perl -e 'for $a (65..90){for $b (65..90){for $c (65..90){printf "%c%c%c\n", $a, $b, $c;}}}'
>> >> perl -e 'for $a (0..9){for $b (0..9){for $c (0..9){printf "%d%d%d\n", $a, $b, $c;}}}'
>> >>
>> >> Will get you the two lists you want.
>> >>
>> > Thank you very much for this idea.
>> > The resulting set of words (in the case of the six characters) would have a size of 17,576,000 lines.
>> > How does Tesseract access this? Isn't it too big for that?
>> >
>>
>> It'll probably hit the dawg size limit, but you can change it.
>
> Do you know anything about the access time? I can't figure out whether Tesseract would access this using a constant-time algorithm or not.
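On the access-time question, the sketch below uses a generic trie (a DAWG is a trie in which identical suffixes are merged). It is not Tesseract's dawg code. Looking a word up walks one edge per character, so the number of steps depends on the word length (6 for AAA000), not on how many plates are stored. The cost of each step depends on how a node stores its outgoing edges, which is bounded here by the alphabet size. The helpers `insert`, `lookup` and `build` are illustrative names.

```python
from itertools import product
from string import digits

END = None  # sentinel key meaning "a word ends at this node"


def insert(root: dict, word: str) -> None:
    """Add `word` to the trie, creating one child node per new character."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = True


def lookup(root: dict, word: str) -> tuple:
    """Return (found, steps), where steps counts the edges followed."""
    node = root
    steps = 0
    for ch in word:
        child = node.get(ch)
        if child is None:
            return False, steps
        node = child
        steps += 1
    return END in node, steps


def build(letters: str) -> dict:
    """Trie of every AAA000 plate whose three letters come from `letters`."""
    root: dict = {}
    for lets in product(letters, repeat=3):
        for nums in product(digits, repeat=3):
            insert(root, "".join(lets + nums))
    return root


small = build("AB")     # 2**3 * 1000 = 8,000 plates
large = build("ABCD")   # 4**3 * 1000 = 64,000 plates
print(lookup(small, "ABA042"))   # (True, 6)
print(lookup(large, "DCB987"))   # (True, 6)
print(26**3 * 10**3)             # 17576000 plates in the full AAA 000 scheme
```

Both lookups take six steps, even though `large` holds eight times as many plates. For a plain trie, the practical problem with 17,576,000 words is memory rather than lookup time. In principle a fully minimized DAWG for this fixed pattern is tiny, because every letter prefix shares the same 1,000 digit suffixes, but whether a particular dawg builder reaches that minimum is a separate question. For the California scheme (one digit, three letters, three digits), the same arithmetic gives 10 × 26³ × 10³ = 175,760,000 strings.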
>
>> >>
>> >> For the original question, according to http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California this is the California scheme:
>> >> perl -e 'for $a (0..9){for $b (65..90){for $c (65..90){for $d (65..90){for $e (0..9){for $f (0..9){for $g (0..9){printf "%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'
>> >>
>> >> > I think that I'll use the last one; I'm not on that part yet. I'm getting good results on images where the characters are big because of the camera distance, but with small letters (13 pixels high) things are not good.
>> >> >
>> >> > So I have a couple of ideas to test; perhaps somebody from the group could give me opinions on them:
>> >> > - following the contour, with polygon approximation of the chars, making an image with those contours and running Tesseract on that image (trained for that)
>> >>
>> >> Seems reasonable. Something like autotrace or potrace might be useful.
>> >>
>> > Glad to read that. Since I use OpenCV, I usually use the cvFindContours() function and then cvApproxPoly().
>> >
>> >>
>> >> > - make an image with my font (one of each letter of the alphabet), repeating the alphabet at different threshold levels. I think that Tesseract thresholds the images internally. It's hard to explain, but I think it may improve the quality.
>> >>
>> >> Yes, Tesseract internally thresholds the image. I think Google did something like this in the Tesseract 3 language packs, so it might be worth doing.
>> >>
>> > Do you know if it uses automatic threshold levels or if there is some place to configure it?
>> >
>>
>> The preset is in a variable. I'll dig around for it when I get a chance.
>>
> That's great. Thank you.
>> >>
>> >> > If you want to continue speaking about specifics of licence plate recognition, we can continue privately because it's off topic. I'm
>> >>
>> >> Well, you've earned my applause for recognising that, but if your conversation turns up information that will save someone some time later on, I'm all for it.
>> >>
>> > Great, I will be glad to share if something good turns up.
>> >
>>
>> --
>> <Leftmost> jimregan, that's because deep inside you, you are evil.
>> <Leftmost> Also not-so-deep inside you.
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to [email protected].
> For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

