Re: California License Plate font issues with OCR

Andres Fri, 30 Jul 2010 11:27:09 -0700

Hello Jimmy,

Thank you for your message.


I'm writing between your lines:

2010/7/29 Jimmy O'Regan <[email protected]>

> On 29 July 2010 03:23, Andres <[email protected]> wrote:
> > Hello,
> >
> > I'm working on the same thing as you, but for licence plates from
> > Argentina, since I live in Argentina.
> >
> > As you described, the first problem was locating the licence plate.
> >
> > Now I'm working on the OCR, and then I will work on horizontalizing
> > (deskewing) the images, because if they are not completely horizontal the
> > OCR fails. For example, today I was getting a 5 instead of a 6. When I
> > horizontalized the image with Photoshop, everything came out OK.
> >
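
A minimal sketch of one way to measure how far a plate is from horizontal: fit a straight line through the bottom-centre points of the located characters and take its angle. The function name and the input format (a list of (x, y) pixel points) are assumptions for illustration, not anything Tesseract or Photoshop provides.

```python
import math

def estimate_skew_degrees(baseline_points):
    """Least-squares line fit through (x, y) points; returns the line's angle in degrees."""
    n = len(baseline_points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in baseline_points) / n
    mean_y = sum(y for _, y in baseline_points) / n
    # Sums of squares and cross-products around the mean give the fitted slope.
    sxx = sum((x - mean_x) ** 2 for x, _ in baseline_points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in baseline_points)
    if sxx == 0:
        raise ValueError("points are vertically aligned")
    return math.degrees(math.atan2(sxy, sxx))
```

Rotating the image by the opposite of this angle should bring the characters back to horizontal; the sign convention depends on the image library, so check it on a known example.
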
> > I don't know the layout of letters and numbers in California plates. Are
> > they mixed? If you know whether a character should be a number or a
> > letter according to its position, you have two options (as far as I know):
> >
> > - when recognizing char by char, tell Tesseract that you expect a number or
> > a letter. I saw that somewhere inside the source code, but I don't remember
> > where.
>
> You were probably looking at the code that guesses among 1, l and i
>

I think I saw somewhere that it is possible to configure whether you
expect numbers or letters, but I'm not sure anymore.
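
For reference, Tesseract has a variable called tessedit_char_whitelist that restricts which characters it may output. The sketch below only builds a command line for it. It assumes a command-line build that accepts `-c VAR=VALUE` and `--psm` (3.x builds spell the latter `-psm`); the helper name and the file names are hypothetical.

```python
import string

def tesseract_char_command(image_path, output_base, expect):
    """Build a tesseract CLI invocation restricted to digits or uppercase letters."""
    allowed = {"digit": string.digits, "letter": string.ascii_uppercase}
    if expect not in allowed:
        raise ValueError("expect must be 'digit' or 'letter'")
    return [
        "tesseract", image_path, output_base,
        "--psm", "10",  # page segmentation mode 10: treat the image as a single character
        "-c", "tessedit_char_whitelist=" + allowed[expect],
    ]
```

The returned list can be passed to subprocess.run() once per character crop, choosing "digit" or "letter" from the character's position on the plate.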


>
> Most of the code in the dict/ directory does some variation on this,
> by 'permuting' the character possibilities.
>
> > - make your own conversion, e.g., if you are expecting a number and you get
> > a G, map it to a 6; if you are expecting a letter and you get a 2, map it
> > to a Z.
> >
>
> Patrick may have more details on this approach.
>
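
The conversion option could look roughly like this. It is a sketch, assuming the plate text has had its spaces removed and that a template string marks each position as a letter ("A") or a digit ("0"). The confusion pairs beyond G/6 and Z/2 are illustrative guesses and would need tuning for the real font.

```python
# Confusion pairs are illustrative assumptions; tune them for the actual font.
TO_DIGIT = {"G": "6", "Z": "2", "O": "0", "I": "1", "S": "5", "B": "8"}
TO_LETTER = {"6": "G", "2": "Z", "0": "O", "1": "I", "5": "S", "8": "B"}

def correct_by_position(text, template="AAA000"):
    """Coerce each character to the class (letter or digit) its position requires."""
    if len(text) != len(template):
        raise ValueError("text and template lengths differ")
    out = []
    for ch, kind in zip(text.upper(), template):
        if kind == "A" and ch.isdigit():
            ch = TO_LETTER.get(ch, ch)   # a digit where a letter is expected
        elif kind == "0" and ch.isalpha():
            ch = TO_DIGIT.get(ch, ch)    # a letter where a digit is expected
        out.append(ch)
    return "".join(out)
```

With the Argentinian template, correct_by_position("A2C1G3") gives "AZC163"; passing template="0AAA000" applies the same idea to the California layout.
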
> According to Wikipedia
> (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
> the normal Argentinian license plates follow the template AAA 000, so
> you could just generate the possible combinations, and use them in a
> dawg.
>
>  perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf "%c%c%c\n", $a, $b, $c;}}}'
>  perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf "%d%d%d\n", $a, $b, $c;}}}'
>
> Will get you the two lists you want.
>
Thank you very much for this idea.
The resulting set of words (in the case of all six characters combined)
would have 17,576,000 lines.
How does Tesseract access this list? Isn't it too big for that?
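
The arithmetic behind that figure, as a small Python sketch (the variable and function names are just for illustration). It also shows that the two separate lists are far smaller than the combined one; whether Tesseract copes well with either size is not something this establishes.

```python
import itertools
import string

LETTERS = string.ascii_uppercase
DIGITS = string.digits

def plates():
    """Lazily yield every AAA000 combination, from AAA000 to ZZZ999."""
    for letters in itertools.product(LETTERS, repeat=3):
        for digits in itertools.product(DIGITS, repeat=3):
            yield "".join(letters) + "".join(digits)

letter_words = len(LETTERS) ** 3        # 17,576 three-letter groups
digit_words = len(DIGITS) ** 3          # 1,000 three-digit groups
combined = letter_words * digit_words   # 17,576,000 six-character plates
california = len(DIGITS) * combined     # 175,760,000 for the 0AAA000 scheme
```

Kept as two lists, the word count is 17,576 + 1,000 = 18,576 lines instead of 17,576,000.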


> (For the original question, according to
> http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California
> this is the California scheme:
> perl -e 'for $a (0..9){for $b (65..90){for $c (65..90) {for $d (65..90) {for $e (0..9){for $f (0..9) {for $g (0..9) {printf "%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'
>
> > I think that I'll use the last one; I'm not on that part yet. I'm getting
> > good results on images where the characters are big because of the camera
> > distance, but with small letters (13 pixels high) things are not good.
> >
> > So I have a couple of ideas to test; perhaps somebody from the group could
> > give me opinions on them:
> > - following the contour, with polygon approximation of the chars, making an
> > image from those contours and running Tesseract on that image (trained for
> > that)
>
> Seems reasonable. Something like autotrace or potrace might be useful.
>
Glad to read that. Since I use OpenCV, I usually call the cvFindContours()
function and then cvApproxPoly().
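
For readers without OpenCV: cvApproxPoly() with CV_POLY_APPROX_DP applies the Douglas-Peucker algorithm. Below is a plain-Python sketch of that algorithm for an open polyline. It is not OpenCV's implementation, and the function names are hypothetical.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def approx_poly(points, epsilon):
    """Simplify an open polyline, dropping points within epsilon of the chord."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord joining the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Keep that point and simplify the two halves recursively.
    left = approx_poly(points[: index + 1], epsilon)
    right = approx_poly(points[index:], epsilon)
    return left[:-1] + right
```

A larger epsilon gives coarser polygons, so for 13-pixel characters it probably has to stay around a pixel or less to keep the character shapes recognizable.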


> > - make an image with my font (one of each character of the alphabet),
> > repeating the alphabet at different threshold levels. I think that
> > internally Tesseract thresholds the images. This is hard to explain, but I
> > think that it may improve the quality.
>
> Yes, Tesseract internally thresholds the image. I think Google did
> something like this in the Tesseract 3 language packs, so it might be
> worth doing.
>
Do you know whether it uses automatic threshold levels, or whether there is
some place to configure them?
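
To make the training-image idea concrete, here is a minimal sketch that binarizes the same grayscale glyph image at several fixed thresholds, so that thinner and thicker versions of each stroke end up in the training set. The image is assumed to be a list of rows of 0-255 values, and the three levels are arbitrary examples.

```python
def binarize(gray, threshold):
    """Return a 0/255 image: pixels darker than the threshold become black (0)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]

def threshold_variants(gray, levels=(96, 128, 160)):
    """One binarized copy of the glyph image per threshold level."""
    return {level: binarize(gray, level) for level in levels}
```

Each variant would then be placed side by side in the training image, one repetition of the alphabet per level.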


> >
> > If you want to continue speaking about specifics of licence plate
> > recognition, we can continue privately because it's off topic. I'm
>
> Well, you've earned my applause for recognising that, but if your
> conversation turns up information that will save someone some time
> later on, I'm all for it.
>
Great, I will be glad to share if something good appears.


> > interested in continuing. There are many things to speak about, for
> > example, the prices of the cameras, light filters, execution times, etc.
> >
> > You can write to me at andrej100 at gmail
> >
> > Regards,
> >
> > Andres
> >
> >
> >
> > 2010/7/28 ZIA <[email protected]>
> >>
> >> I am writing a license plate recognition application in C#. I am
> >> almost done. I had started work on my own OCR, but then I decided to
> >> use tesseract-ocr, which now partially works. I feed California
> >> license plates to the OCR, but it doesn't recognize some of the
> >> characters: for example, "Z" becomes the number 2, the letter "O"
> >> becomes "U", and the number 4 becomes something else. Any suggestions?
> >> Is there a language file or font file that would solve this issue?
> >> Besides that, in complex images I am having a hard time locating the
> >> license plate, but my concern is now the OCR, since I thought I would
> >> save time by using Tesseract rather than writing my own neural
> >> network. I would really appreciate any ideas or suggestions.
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups
> >> "tesseract-ocr" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> >> [email protected]<tesseract-ocr%[email protected]>
> .
> >> For more options, visit this group at
> >> http://groups.google.com/group/tesseract-ocr?hl=en.
> >>
> >
>
>
>
> --
> <Leftmost> jimregan, that's because deep inside you, you are evil.
> <Leftmost> Also not-so-deep inside you.
>
