Re: California License Plate font issues with OCR

Jimmy O'Regan Thu, 29 Jul 2010 05:38:34 -0700

On 29 July 2010 03:23, Andres <[email protected]> wrote:
> Hello,
>
> I'm working on the same thing as you, for licence plates from Argentina,
> where I live.
>
> As you described, the first problem was locating the licence plate.
>
> Now I'm working on the OCR, and then I will work on deskewing the
> images, because if they are not completely horizontal the OCR fails. For
> example, today I was getting a 5 instead of a 6. When I straightened the
> image in Photoshop, everything turned out OK.
>
> I don't know the layout of letters and numbers in California plates. Is
> it fixed? If you know whether each character should be a number or a
> letter according to its position, you have two options (as far as I know):
>
> - when recognizing char by char, tell Tesseract that you expect a number or
> a letter. I saw that somewhere inside the source code, but I don't remember
> where.


You were probably looking at the code that guesses among 1, l and i.

Most of the code in the dict/ directory does some variation on this,
by 'permuting' the character possibilities.

> - make your own conversion, e.g., if you are expecting a number and you get
> a G, map it to a 6; if you are expecting a letter and you get a 2, map it
> to a Z.
>

Patrick may have more details on this approach.
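
In the meantime, here is a minimal sketch of that kind of post-correction,
assuming the plate layout is known in advance. The confusion tables and the
function name fix_by_template are made up for illustration (they are not part
of Tesseract), and the template uses 'A' for a letter slot and '0' for a
digit slot.

```python
# Hypothetical confusion tables; tune them to the errors you actually see.
TO_DIGIT = {"O": "0", "I": "1", "Z": "2", "S": "5", "G": "6", "B": "8"}
TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "8": "B"}


def fix_by_template(text, template):
    """Correct OCR output using a template: 'A' = letter, '0' = digit."""
    if len(text) != len(template):
        return None  # wrong length: let the caller reject or retry
    out = []
    for ch, slot in zip(text.upper(), template):
        if slot == "0" and not ch.isdigit():
            ch = TO_DIGIT.get(ch, ch)   # a letter where a digit belongs
        elif slot == "A" and not ch.isalpha():
            ch = TO_LETTER.get(ch, ch)  # a digit where a letter belongs
        out.append(ch)
    return "".join(out)


print(fix_by_template("2AB1G3", "AAA000"))
```

Characters that are not in the tables are passed through unchanged, so the
result should still be checked against the template before it is trusted.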

According to Wikipedia
(http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
the normal Argentinian license plates follow the template AAA 000, so
you could just generate the possible combinations, and use them in a
dawg.

 perl -e 'for $a (65..90) { for $b (65..90) { for $c (65..90) {
   printf "%c%c%c\n", $a, $b, $c; } } }'
 perl -e 'for $a (0..9) { for $b (0..9) { for $c (0..9) {
   printf "%d%d%d\n", $a, $b, $c; } } }'

Will get you the two lists you want.

For the original question, according to
http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California
this is the California scheme:

 perl -e 'for $a (0..9) { for $b (65..90) { for $c (65..90) {
   for $d (65..90) { for $e (0..9) { for $f (0..9) { for $g (0..9) {
   printf "%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g; } } } } } } }'
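
Full-plate lists get large quickly, so it may be worth checking the size
before writing one out. A rough sketch, using the same idea as the loops
above; the 'A'/'0' template notation and the names plates and count are
only for illustration:

```python
from itertools import islice, product
from string import ascii_uppercase, digits


def plates(template):
    """Lazily yield every string matching a template ('A' letter, '0' digit)."""
    pools = [ascii_uppercase if slot == "A" else digits for slot in template]
    for combo in product(*pools):
        yield "".join(combo)


def count(template):
    """Number of strings the template can produce, without generating them."""
    total = 1
    for slot in template:
        total *= 26 if slot == "A" else 10
    return total


print(count("AAA000"))    # Argentina: 26**3 * 10**3 = 17576000
print(count("0AAA000"))   # California: 10 * 26**3 * 10**3 = 175760000
print(list(islice(plates("0AAA000"), 3)))
```

Because plates() is a generator, you can stream it to a file or take a
sample with islice() without holding the whole list in memory.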

> I think that I'll use the last one, I'm not on that part yet. I'm getting
> good results on images where the characters are big because of the distance
> of the camera, but with small letters (13 pixels high) things are not good.
>
> So I have a couple of ideas to test; perhaps somebody from the group could
> give me opinions on them:
> - following the contour, with polygon approximation of the chars, making an
> image from those contours and running Tesseract on that image (trained for
> that)

Seems reasonable. Something like autotrace or potrace might be useful.
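
If you end up doing the approximation yourself, one common choice is
Ramer-Douglas-Peucker simplification. That is my substitution, since you
did not say which method you have in mind. A minimal sketch for an open
polyline given as (x, y) tuples; a closed contour would first have to be
split into two open halves:

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)  # a and b coincide
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord joining the two endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]  # everything in between is noise
    # Otherwise keep that point and simplify both halves recursively.
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right


outline = [(0, 0), (1, 0.05), (2, 0), (2, 1), (2, 2)]
print(simplify(outline, 0.1))
```

A larger epsilon gives coarser polygons; at 13 pixels of character height
the tolerance probably has to stay well under a pixel or two.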

> - make an image with my font (one of each character from the alphabet), and
> repeat the alphabet at different threshold levels. I think that internally
> Tesseract thresholds the images. This is hard to explain, but I think that
> it may improve the quality.

Yes, Tesseract internally thresholds the image. I think Google did
something like this in the Tesseract 3 language packs, so it might be
worth doing.
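
A toy sketch of the idea, assuming a glyph is available as rows of 0-255
grayscale values. The helper names and the three levels are arbitrary
choices for illustration, not anything Tesseract provides:

```python
def threshold(gray, level):
    """Binarize a grayscale image (rows of 0-255 ints): 1 = ink, 0 = paper."""
    return [[1 if px < level else 0 for px in row] for row in gray]


def threshold_variants(gray, levels=(96, 128, 160)):
    """Return one binarized copy of the glyph per threshold level."""
    return {level: threshold(gray, level) for level in levels}


# A tiny vertical stroke with soft edges, standing in for a scanned glyph.
glyph = [
    [250, 150, 40, 150, 250],
    [250, 120, 20, 120, 250],
    [250, 150, 40, 150, 250],
]

for level, bitmap in threshold_variants(glyph).items():
    rows = ["".join("#" if b else "." for b in row) for row in bitmap]
    print(level, rows)
```

The stroke gets thicker as the level rises, which is the variation the
training image would then contain.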

>
> If you want to continue speaking about specifics of licence plate
> recognition, we can continue privately because it's off topic. I'm

Well, you've earned my applause for recognising that, but if your
conversation turns up information that will save someone some time
later on, I'm all for it.

> interested in continuing. There are many things to speak about, for example,
> the prices of the cameras, light filters, times of execution, etc.
>
> You can write me to andrej100 at gmail
>
> Regards,
>
> Andres
>
>
>
> 2010/7/28 ZIA <[email protected]>
>>
>> I am writing a license plate recognition application in C#. I am
>> almost done. I had started work on my own OCR, but then I decided to
>> use tesseract-ocr, which now partially works. I give the California
>> license plate to the OCR, but it doesn't recognize some of the font's
>> characters: for example, "Z" becomes the number 2, the letter "O"
>> becomes "U", and the number 4 becomes something else. Any suggestions?
>> Is there a language file or font file that will solve this issue?
>> Besides that, in complex images I am having a hard time locating the
>> license plate, but my concern is now the OCR, since I thought I would
>> save time by using Tesseract rather than writing my own neural network.
>> I would really appreciate any ideas or suggestions.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>



-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
