On 30 July 2010 19:26, Andres <[email protected]> wrote:
> Hello Jimmy,
>
> Thank you for your message.
>
> I'm writing between your lines:
>
> 2010/7/29 Jimmy O'Regan <[email protected]>
>>
>> On 29 July 2010 03:23, Andres <[email protected]> wrote:
>> > Hello,
>> >
>> > I'm working on the same as you, for the licence plates from
>> > Argentina, as I live in Argentina.
>> >
>> > Same as you described, the problem was to locate the licence plate.
>> >
>> > Now I'm working with the OCR and then I will work on horizontalizing
>> > the images, because if they are not completely horizontal, the OCR
>> > fails, for example today I was getting a 5 instead of a 6. When I
>> > horizontalized the image with photoshop, everything turned out ok.
>> >
>> > I don't know how the layout of the positions of letters and numbers
>> > is in California plates, are they assorted ? ...if you know if the
>> > character should be a number or a letter according to its position,
>> > you have two options (as far as I know):
>> >
>> > - when recognizing char by char, tell Tesseract that you expect a
>> > number or a letter. I saw that somewhere inside the source code,
>> > don't remember where.
>>
>> You were probably looking at the code that guesses among 1, l and i
>
> I think that I saw somewhere that it was possible to configure that you
> expect numbers or letters, but I'm not sure anymore.
>

Yeah, there's that too.

>>
>> Most of the code in the dict/ directory does some variation on this,
>> by 'permuting' the character possibilities.
>>
>> > - make your own conversion, e.g., if you are expecting a number and
>> > you get a G, map it to a 6, if you expect a letter and get a 2, map
>> > it to a Z.
>> >
>>
>> Patrick may have more details on this approach.
>>
>> According to Wikipedia
>> (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
>> the normal Argentinian license plates follow the template AAA 000, so
>> you could just generate the possible combinations, and use them in a
>> dawg.
>>
>> perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf "%c%c%c\n", $a, $b, $c;}}}'
>> perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf "%d%d%d\n", $a, $b, $c;}}}'
>>
>> Will get you the two lists you want.
>>
> Thank you very much for this idea.
> The resulting set of words (in the case of the six characters) would
> have a size of 17,576,000 lines.
> How does Tesseract access this ? Isn't it too big for that ?
>

It'll probably hit the dawg size limit, but you can change it.

>>
>> (For the original question, according to
>> http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California
>> this is the California scheme:
>> perl -e 'for $a (0..9){for $b (65..90){for $c (65..90) {for $d (65..90) {for $e (0..9){for $f (0..9) {for $g (0..9) {printf "%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'
>>
>> > I think that I'll use the last one, I'm not on that part yet. I'm
>> > getting good results on images where the characters are big because
>> > of the distance of the camera, but in small letters (13 pixels
>> > height) things are not good.
>> >
>> > So I have a pair of ideas to test, perhaps somebody from the group
>> > could give me opinions regarding them:
>> > - following the contour, with polygon approximation of the chars,
>> > making an image with those contours and running Tesseract on that
>> > image (trained for that)
>>
>> Seems reasonable. Something like autotrace or potrace might be useful.
>>
> Glad to read that. Since I use OpenCV I usually use the cvFindContours()
> function and then cvApproxPoly()
>
>>
>> > - make an image with my font (one of each from the alphabet), and
>> > repeating the alphabet with different levels of threshold. I think
>> > that internally Tesseract thresholds the images. Hard to explain
>> > this, but I think that it may improve the quality.
>>
>> Yes, Tesseract internally thresholds the image. I think Google did
>> something like this in the Tesseract 3 language packs, so it might be
>> worth doing.
>>
> Do you know if it uses automatic threshold levels or if there is some
> place to configure it ?
>

The preset is in a variable. I'll dig around for it when I get a chance.

>>
>> >
>> > If you want to continue speaking about specifics of licence plate
>> > recognition, we can continue privately because it's off topic. I'm
>>
>> Well, you've earned my applause for recognising that, but if your
>> conversation turns up information that will save someone some time
>> later on, I'm all for it.
>>
> great, I will be glad to share if something good appears.
>

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
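
PS: The "make your own conversion" option from the thread can be sketched in a few lines. This is only a rough illustration in Python, assuming the AAA 000 template for Argentinian plates mentioned above. The names TEMPLATE, TO_DIGIT, TO_LETTER and fix_plate are hypothetical, and apart from G/6 and Z/2 (which come from the thread) the confusion pairs are guesses that would need checking against real OCR output.

```python
# Position-aware cleanup of an OCR result for a plate that follows a
# fixed template such as "AAA 000" (A = letter, 0 = digit).
# Assumption: the confusion pairs below are illustrative only; they are
# not taken from Tesseract and should be tuned on real misreads.

TEMPLATE = "AAA000"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"

# Letters that may be misread in place of a digit (G->6 and Z->2 are from
# the thread, the rest are assumed).
TO_DIGIT = {"G": "6", "Z": "2", "O": "0", "I": "1", "S": "5", "B": "8"}
# The same pairs in the other direction, for positions that expect a letter.
TO_LETTER = {digit: letter for letter, digit in TO_DIGIT.items()}


def fix_plate(text, template=TEMPLATE):
    """Return text corrected against template, or None if it cannot fit."""
    chars = [c for c in text.upper() if not c.isspace()]
    if len(chars) != len(template):
        return None  # wrong number of characters, nothing sensible to do
    fixed = []
    for c, kind in zip(chars, template):
        if kind == "0":
            # A digit is expected here: keep digits, remap known lookalikes.
            c = c if c in DIGITS else TO_DIGIT.get(c)
        else:
            # A letter is expected here: keep letters, remap known lookalikes.
            c = c if c in LETTERS else TO_LETTER.get(c)
        if c is None:
            return None  # no known mapping for this character
        fixed.append(c)
    return "".join(fixed)
```

With these tables, fix_plate("ABC 12G") gives "ABC126" and fix_plate("A2C 123") gives "AZC123". A result that has no known mapping returns None rather than a guess, so the caller can decide whether to re-run the OCR or reject the plate.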

