Re: California License Plate font issues with OCR

Jimmy O'Regan Fri, 30 Jul 2010 12:34:22 -0700

On 30 July 2010 19:26, Andres <[email protected]> wrote:
> Hello Jimmy,
>
> Thank you for your message.
>
> I'm writing between your lines:
>
> 2010/7/29 Jimmy O'Regan <[email protected]>
>>
>> On 29 July 2010 03:23, Andres <[email protected]> wrote:
>> > Hello,
>> >
>> > I'm working on the same thing as you, but for licence plates from
>> > Argentina, as I live in Argentina.
>> >
>> > Same as you described, the problem was to locate the licence plate.
>> >
>> > Now I'm working on the OCR, and then I will work on horizontalizing
>> > (deskewing) the images, because if they are not completely horizontal,
>> > the OCR fails. For example, today I was getting a 5 instead of a 6. When
>> > I horizontalized the image with Photoshop, everything was OK.
>> >
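
The deskewing step does not need Photoshop. As a rough sketch (plain Python, not anything from Tesseract or OpenCV), the tilt can be estimated by fitting a straight line through the centres of the character boxes. The function name and the input format (a list of (x, y) centres from whatever locates the characters) are assumptions made for the example.

```python
import math

def skew_angle_degrees(centers):
    """Estimate the tilt of a row of characters, in degrees.

    centers is a list of (x, y) character centres in image coordinates.
    A least-squares line is fitted through them and its slope is
    converted to an angle. With y growing downwards, a positive result
    means the text runs downhill from left to right on screen.
    """
    n = len(centers)
    if n < 2:
        raise ValueError("need at least two character centres")
    mean_x = sum(x for x, _ in centers) / n
    mean_y = sum(y for _, y in centers) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in centers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in centers)
    if sxx == 0:
        raise ValueError("centres are vertically aligned")
    # atan2(sxy, sxx) equals atan(slope) because sxx is positive here.
    return math.degrees(math.atan2(sxy, sxx))
```

Rotating the image by the opposite angle should level the text; check the sign convention of whichever rotation routine you use.
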
>> > I don't know how the letters and numbers are laid out in California
>> > plates. Are they mixed? If you know whether a character should be a
>> > number or a letter according to its position, you have two options (as
>> > far as I know):
>> >
>> > - when recognizing char by char, tell Tesseract that you expect a number
>> > or a letter. I saw that somewhere inside the source code, but I don't
>> > remember where.
>>
>> You were probably looking at the code that guesses among 1, l and i
>
> I think that I saw somewhere that it was possible to configure that you
> expect numbers or letters, but I'm not sure anymore.
>


Yeah, there's that too.

>>
>> Most of the code in the dict/ directory does some variation on this,
>> by 'permuting' the character possibilities.
>>
>> > - make your own conversion, e.g., if you are expecting a number and you
>> > get a G, map it to a 6; if you are expecting a letter and you get a 2,
>> > map it to a Z.
>> >
>>
>> Patrick may have more details on this approach.
>>
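
A minimal sketch of the "make your own conversion" option, in plain Python. The confusion tables are hypothetical (only G/6 and 2/Z come from this thread; the rest are guesses to be replaced with the errors you actually see), and the template notation ('A' for a letter slot, '0' for a digit slot) is made up for the example.

```python
# Hypothetical confusion tables; extend them from observed OCR errors.
TO_DIGIT = {"O": "0", "Q": "0", "I": "1", "L": "1",
            "Z": "2", "S": "5", "G": "6", "B": "8"}
TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "8": "B"}

def correct_plate(text, template="AAA000"):
    """Coerce each character to the class its position requires.

    template uses 'A' for a letter slot and '0' for a digit slot.
    Returns None when the length does not match the template or a
    character has no known substitute.
    """
    chars = [c for c in text.upper() if not c.isspace()]
    if len(chars) != len(template):
        return None
    out = []
    for ch, slot in zip(chars, template):
        if slot == "0":
            ch = ch if ch.isdigit() else TO_DIGIT.get(ch)
        else:
            ch = ch if ch.isalpha() else TO_LETTER.get(ch)
        if ch is None:
            return None
        out.append(ch)
    return "".join(out)

print(correct_plate("A8C 12G"))             # ABC126
print(correct_plate("IA8C234", "0AAA000"))  # 1ABC234
```

Returning None instead of guessing keeps unreadable plates visible, so they can be rejected or re-scanned rather than silently "corrected".
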
>> According to Wikipedia
>> (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
>> the normal Argentinian license plates follow the template AAA 000, so
>> you could just generate the possible combinations, and use them in a
>> dawg.
>>
>>  perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf "%c%c%c\n", $a, $b, $c;}}}'
>>  perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf "%d%d%d\n", $a, $b, $c;}}}'
>>
>> Will get you the two lists you want.
>>
> Thank you very much for this idea.
> The resulting set of words (in the case of the six characters) would have
> 17,576,000 lines.
> How does Tesseract access a list like this? Isn't it too big for that?
>

It'll probably hit the dawg size limit, but you can change it.
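
For a sense of scale, here is a small Python sketch that counts the combinations for a template and generates them lazily, one word at a time. The template notation ('A' for a letter, '0' for a digit) and the function names are invented for the example; how the list is then fed into a dawg is not shown.

```python
from itertools import islice, product
from string import ascii_uppercase, digits

def plates(template="AAA000"):
    """Lazily yield every string matching the template.

    'A' stands for an uppercase letter, '0' for a digit.
    """
    pools = [ascii_uppercase if slot == "A" else digits for slot in template]
    for combo in product(*pools):
        yield "".join(combo)

def plate_count(template="AAA000"):
    """Number of strings the template produces, without generating them."""
    total = 1
    for slot in template:
        total *= 26 if slot == "A" else 10
    return total

print(plate_count("AAA000"))              # 26**3 * 10**3 = 17576000
print(plate_count("0AAA000"))             # 175760000, the California scheme
print(list(islice(plates("AAA000"), 3)))  # ['AAA000', 'AAA001', 'AAA002']
```

Because plates() is a generator, the full list never has to sit in memory; it can be written to a word-list file line by line.
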

>>
>> For the original question, according to
>> http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California
>> this is the California scheme:
>> perl -e 'for $a (0..9){for $b (65..90){for $c (65..90) {for $d (65..90) {for $e (0..9){for $f (0..9) {for $g (0..9) {printf "%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'
>>
>> > I think that I'll use the last one; I'm not on that part yet. I'm
>> > getting good results on images where the characters are big because of
>> > the distance of the camera, but with small letters (13 pixels high)
>> > things are not good.
>> >
>> > So I have a couple of ideas to test; perhaps somebody from the group
>> > could give me opinions on them:
>> > - following the contour, with polygon approximation of the chars, making
>> > an image from those contours and running Tesseract on that image (trained
>> > for that)
>>
>> Seems reasonable. Something like autotrace or potrace might be useful.
>>
> Glad to read that. Since I use OpenCV, I usually use the cvFindContours()
> function and then cvApproxPoly().
>
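
For anyone without OpenCV to hand: as far as I know, the polygon approximation in cvApproxPoly() is the Douglas-Peucker algorithm, which is short enough to sketch in plain Python. This is an illustration of the technique on an open polyline, not OpenCV's code; the function names and the epsilon value are assumptions.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide, so fall back to the distance to that point.
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def simplify(points, epsilon):
    """Douglas-Peucker simplification of an open polyline.

    Points closer than epsilon to the chord between the end points are
    dropped; otherwise the curve is split at the farthest point and each
    half is simplified recursively.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves

outline = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
print(simplify(outline, 0.5))  # [(0, 0), (2, 0), (3, 5), (4, 0)]
```

With 13-pixel characters, epsilon probably needs to stay well below one pixel, or the approximation will eat the very details that separate a 5 from a 6.
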
>>
>> > - make an image with my font (one of each character from the alphabet),
>> > repeating the alphabet at different threshold levels. I think that
>> > Tesseract thresholds the images internally. This is hard to explain, but
>> > I think that it may improve the quality.
>>
>> Yes, Tesseract internally thresholds the image. I think Google did
>> something like this in the Tesseract 3 language packs, so it might be
>> worth doing.
>>
> Do you know if it uses automatic threshold levels, or if there is some
> place to configure it?
>

The preset is in a variable. I'll dig around for it when I get a chance.
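
In the meantime, the multi-threshold training idea can be tried outside Tesseract. A minimal sketch in plain Python, assuming the glyph is already a grid of 0-255 grey values; the function names and the three threshold levels are arbitrary choices for the example, not Tesseract settings.

```python
def binarize(gray, threshold):
    """Turn a grayscale grid (rows of 0-255 ints) into 0/1 pixels.

    A pixel becomes 1 (ink) when it is darker than the threshold.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def threshold_variants(gray, thresholds=(96, 128, 160)):
    """Return {threshold: binary image}, one variant per threshold level."""
    return {t: binarize(gray, t) for t in thresholds}

# A 3x3 blob with a dark core and a mid-grey fringe, like a blurred stroke.
glyph = [
    [250, 140, 250],
    [140,  20, 140],
    [250, 140, 250],
]
for level, image in threshold_variants(glyph).items():
    print(level, image)
```

At 96 and 128 only the core survives; at 160 the grey fringe turns to ink as well, so the same glyph gives both a thin and a thick training sample.
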

>>
>> >
>> > If you want to continue speaking about specifics of licence plate
>> > recognition, we can continue privately because it's off topic. I'm
>>
>> Well, you've earned my applause for recognising that, but if your
>> conversation turns up information that will save someone some time
>> later on, I'm all for it.
>>
> great, I will be glad to share if something good appears.
>



-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
