By the way, the font used on licence plates in Argentina is not commercially available, so I had to build my training image from pictures that I took with my own camera on the street. If that's your case too, prepare yourself for a lot of Photoshop work to make the size of the characters uniform. Some tips:

- Paste each character, press Ctrl+T (transform) and drag the edges while holding Shift to keep the proportions.
- Use the rulers to guide you vertically.
- When you finish with all the characters, merge the visible layers (Shift+Ctrl+E) to avoid ending up with a multilayer TIFF file.
- Finally, decide whether you want to threshold the image.

Question to the list: the images that I use have a black background and white letters, and I trained Tesseract on them as they are. Does that make any difference? Would I get better results by inverting the images (both the training image and the captured image)?

Regards,

Andres

2010/7/30 Andres <[email protected]>

> Hello,
>
> What's the height of the characters that you are having problems with?
> But if you have not identified the font, I assume that you never trained
> Tesseract for it, so your problem is there. I think that you won't have
> good results without training.
> As Giuseppe suggested, whatthefont is the right place to go, and almost
> the only one. There is another one, but it's like a guided tree: you
> have to answer questions about your font's shape and you never upload
> it. Something similar to the guides used by botanists to identify plants
> based on their leaves and so on. I don't remember the name of the site.
> This site: http://www.fontyukle.com/en/index.php doesn't charge you for
> the fonts. I've found fonts there that other sites wanted to charge for.
>
> Regarding LP, out of curiosity: have you measured the detection time of
> the plate? ...and with what image resolution?
>
> Regards,
>
> Andres
>
> 2010/7/29 ZIA <[email protected]>
>
>> Hello,
>>
>> Permuting may work, but I haven't tried it. I am also looking for a font
>> sample of the CA license plate, which would help me train my own OCR.
>> I really don't know where I can get a sample file with A to Z and 0 to 9
>> in the CA license plate font.
>>
>> For LP extraction, I am trying to implement some kind of rectangle
>> window (a concept from SCW, in one paper). What I did was apply an
>> edge filter, which shows me the license plates clearly; I just need to
>> extract them. A simple histogram approach works if there is not a lot
>> of noise, but even reflections in the images cause problems.
>>
>> On Jul 29, 5:38 am, "Jimmy O'Regan" <[email protected]> wrote:
>> > On 29 July 2010 03:23, Andres <[email protected]> wrote:
>> >
>> > > Hello,
>> > >
>> > > I'm working on the same thing as you, for the licence plates from
>> > > Argentina, as I live in Argentina.
>> > >
>> > > Same as you described, the problem was to locate the licence plate.
>> > >
>> > > Now I'm working with the OCR and then I will work on horizontalizing
>> > > the images, because if they are not completely horizontal, the OCR
>> > > fails; for example, today I was getting a 5 instead of a 6. When I
>> > > horizontalized the image with Photoshop, everything turned out OK.
>> > >
>> > > I don't know the layout of the positions of letters and numbers in
>> > > California plates. Are they assorted? ...If you know whether the
>> > > character should be a number or a letter according to its position,
>> > > you have two options (as far as I know):
>> > >
>> > > - when recognizing char by char, tell Tesseract that you expect a
>> > > number or a letter. I saw that somewhere inside the source code,
>> > > I don't remember where.
>> >
>> > You were probably looking at the code that guesses among 1, l and i
>> >
>> > Most of the code in the dict/ directory does some variation on this,
>> > by 'permuting' the character possibilities.
>> >
>> > > - make your own conversion, e.g., if you are expecting a number and
>> > > you get a G, map it to a 6; if you are expecting a letter and you
>> > > get a 2, map it to a Z.
>> >
>> > Patrick may have more details on this approach.
>> >
>> > According to Wikipedia
>> > (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
>> > the normal Argentinian license plates follow the template AAA 000, so
>> > you could just generate the possible combinations, and use them in a
>> > dawg.
>> >
>> > perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf "%c%c%c\n", $a, $b, $c;}}}'
>> > perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf "%d%d%d\n", $a, $b, $c;}}}'
>> >
>> > Will get you the two lists you want.
>> >
>> > (For the original question, according to
>> > http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California
>> > this is the California scheme:
>> > perl -e 'for $a (0..9){for $b (65..90){for $c (65..90) {for $d (65..90) {for $e (0..9){for $f (0..9) {for $g (0..9) {printf "%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'
>> > )
>> >
>> > > I think that I'll use the last one, I'm not on that part yet. I'm
>> > > getting good results on images where the characters are big because
>> > > of the distance of the camera, but in small letters (13 pixels
>> > > height) things are not good.
>> > >
>> > > So I have a pair of ideas to test, perhaps somebody from the group
>> > > could give me opinions regarding them:
>> > > - following the contour, with polygon approximation of the chars,
>> > > making an image with that contours and running Tesseract on that
>> > > image (trained for that)
>> >
>> > Seems reasonable. Something like autotrace or potrace might be useful.
>> >
>> > > - make an image with my font (one of each from the alphabet), and
>> > > repeating the alphabet with different levels of threshold. I think
>> > > that internally Tesseract thresholds the images. Hard to explain
>> > > this, but I think that it may improve the quality.
>> >
>> > Yes, Tesseract internally thresholds the image. I think Google did
>> > something like this in the Tesseract 3 language packs, so it might be
>> > worth doing.
>> >
>> > > If you want to continue speaking about specifics of licence plate
>> > > recognition, we can continue privately because it's off topic.
>> > > I'm
>> >
>> > Well, you've earned my applause for recognising that, but if your
>> > conversation turns up information that will save someone some time
>> > later on, I'm all for it.
>> >
>> > > interested in continuing. There are many things to speak about, for
>> > > example, the prices of the cameras, light filters, times of
>> > > execution, etc.
>> > >
>> > > You can write me to andrej100 at gmail
>> > >
>> > > Regards,
>> > >
>> > > Andres
>> > >
>> > > 2010/7/28 ZIA <[email protected]>
>> > >
>> > >> I am writing a license plate recognition application in C#. I am
>> > >> almost done. I had started work on my own OCR, but then I decided
>> > >> to use tesseract-ocr, which now partially works. I provide the
>> > >> California license plate to the OCR, but it doesn't recognize some
>> > >> of the characters; for example, "Z" becomes the number 2, the
>> > >> letter "O" becomes "U", and the number 4 becomes something else.
>> > >> Any suggestions? Is there any language file or font file that will
>> > >> solve this issue? Besides that, in complex images I am having a
>> > >> hard time locating the license plate. But my concern is now the
>> > >> OCR, since I thought I would save time by using Tesseract rather
>> > >> than writing my own neural network. I would really appreciate any
>> > >> ideas or suggestions.
>> > >>
>> > >> --
>> > >> You received this message because you are subscribed to the Google
>> > >> Groups "tesseract-ocr" group.
>> > >> To post to this group, send email to [email protected].
>> > >> To unsubscribe from this group, send email to
>> > >> [email protected].
>> > >> For more options, visit this group at
>> > >> http://groups.google.com/group/tesseract-ocr?hl=en.
>> >
>> > --
>> > <Leftmost> jimregan, that's because deep inside you, you are evil.
>> > <Leftmost> Also not-so-deep inside you.
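
For anyone who wants to try the "make your own conversion" idea from the quoted thread, here is a minimal sketch in Python, not taken from any message in the thread. It assumes the AAA 000 template that Jimmy cites for Argentinian plates. The function name correct_plate and the two lookup tables are hypothetical; only the G/6 and Z/2 pairs come from the discussion, and the O/0 pair is an added assumption that you should adjust to the confusions your own OCR actually makes.

```python
# Position-based correction of an OCR result against a plate template.
# "A" in the template means a letter is expected, "0" means a digit.
# Only G/6 and Z/2 come from the thread; O/0 is an illustrative assumption.
TO_DIGIT = {"G": "6", "Z": "2", "O": "0"}
TO_LETTER = {"6": "G", "2": "Z", "0": "O"}

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"


def correct_plate(raw, template="AAA000"):
    """Return the corrected plate string, or None if it cannot fit the template."""
    chars = [c for c in raw.upper() if not c.isspace()]
    if len(chars) != len(template):
        return None
    fixed = []
    for ch, kind in zip(chars, template):
        if kind == "A":
            # A letter is expected here: swap look-alike digits for letters.
            ch = TO_LETTER.get(ch, ch)
            if ch not in LETTERS:
                return None
        else:
            # A digit is expected here: swap look-alike letters for digits.
            ch = TO_DIGIT.get(ch, ch)
            if ch not in DIGITS:
                return None
        fixed.append(ch)
    return "".join(fixed)


if __name__ == "__main__":
    print(correct_plate("6BC 1Z3"))
```

Returning None for results that still do not fit the template lets the caller reject the reading or retry with another frame, instead of accepting a plate that cannot exist. For California, the same function could be called with template="0AAA000", following the scheme in Jimmy's Perl one-liner.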

