Re: California License Plate font issues with OCR

Andres Fri, 30 Jul 2010 12:17:36 -0700

Hello,

What is the height of the characters you are having problems with?
If you have not identified the font, I assume you have never trained
Tesseract on it, and that is most likely the source of your problem. I don't
think you will get good results without training.
As Giuseppe suggested, whatthefont is the right place to go, and almost the
only one. There is another site, but it works like a guided tree: you answer
questions about the shapes in your font and never upload an image. It is
similar to the keys botanists use to identify plants from their leaves and
other features. I don't remember the name of the site.
This site, http://www.fontyukle.com/en/index.php, doesn't charge for its
fonts. I have found fonts there that other sites wanted to charge for.


Regarding LP detection, out of curiosity: have you measured how long it takes
to detect the plate, and at what image resolution?

Regards,

Andres



2010/7/29 ZIA <[email protected]>

> Hello,
>
> Permuting may work, but I haven't tried it. I am also looking for a font
> sample of the CA license plate, which would let me train my own OCR. I
> really don't know where I can get a sample file with A to Z and 0 to 9 in
> the CA license plate font.
>
> For LP extraction, I am trying to implement a kind of rectangular
> window (a concept from SCW, described in one paper). I applied an
> edge filter, which shows the license plate clearly; I just need to
> extract it. A simple histogram approach works if there is not a lot
> of noise, but even reflections in the images cause problems.
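
One way to read the histogram approach above is as a row projection of the
edge image: count edge pixels per row and look for the horizontal band with
the highest total. The following is only a minimal sketch under that
assumption, not ZIA's actual method. The function name `densest_band`, the
0/1 list-of-lists image format, and the fixed band height are all
hypothetical.

```python
def densest_band(edge_map, band_height):
    """Return (top_row, score) of the horizontal band with the most edge pixels.

    edge_map is a list of rows, each a list of 0/1 values (1 = edge pixel).
    This input format is an assumption made for the sketch.
    """
    row_sums = [sum(row) for row in edge_map]  # the row histogram
    if band_height <= 0 or band_height > len(row_sums):
        raise ValueError("band_height must be between 1 and the image height")

    # Sliding-window sum over the row histogram.
    window = sum(row_sums[:band_height])
    best_top, best_score = 0, window
    for top in range(1, len(row_sums) - band_height + 1):
        window += row_sums[top + band_height - 1] - row_sums[top - 1]
        if window > best_score:
            best_top, best_score = top, window
    return best_top, best_score
```

The same function applied to the transposed band would give the left and
right limits. As noted above, noise and reflections add edge pixels outside
the plate, so a raw maximum like this can easily pick the wrong band.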
>
> On Jul 29, 5:38 am, "Jimmy O'Regan" <[email protected]> wrote:
> > On 29 July 2010 03:23, Andres <[email protected]> wrote:
> >
> >
> >
> > > Hello,
> >
> > > I'm working on the same thing as you, for licence plates from
> > > Argentina, as I live in Argentina.
> >
> > > Same as you described, the problem was to locate the licence plate.
> >
> > > Now I'm working on the OCR, and then I will work on straightening the
> > > images to horizontal, because if they are not completely horizontal the
> > > OCR fails. For example, today I was getting a 5 instead of a 6. When I
> > > straightened the image with Photoshop, everything came out OK.
> >
> > > I don't know the layout of letters and numbers on California plates.
> > > Are they mixed? If you know whether a character should be a number or a
> > > letter according to its position, you have two options (as far as I
> > > know):
> >
> > > - when recognizing char by char, tell Tesseract that you expect a number
> > > or a letter. I saw that somewhere inside the source code; I don't
> > > remember where.
> >
> > You were probably looking at the code that guesses among 1, l and i
> >
> > Most of the code in the dict/ directory does some variation on this,
> > by 'permuting' the character possibilities.
> >
> > > - make your own conversion, e.g., if you are expecting a number and you
> > > get a G, map it to a 6; if you are expecting a letter and you get a 2,
> > > map it to a Z.
> >
> > Patrick may have more details on this approach.
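
The "make your own conversion" option can be sketched as a position-aware
correction pass over the OCR output. This is only an illustration: of the
look-alike pairs below, only G to 6 and 2 to Z come from this thread; the
remaining pairs, the function name `correct_by_template`, and the template
notation ('A' for a letter slot, '0' for a digit slot) are assumptions.

```python
# Look-alike substitutions. Only G->6 and 2->Z are taken from the thread;
# the other pairs are assumed examples and should be tuned on real output.
TO_DIGIT = {"G": "6", "Z": "2", "O": "0", "I": "1", "S": "5", "B": "8"}
TO_LETTER = {"6": "G", "2": "Z", "0": "O", "1": "I", "5": "S", "8": "B"}


def correct_by_template(text, template):
    """Fix OCR output using a template where 'A' = letter and '0' = digit."""
    if len(text) != len(template):
        raise ValueError("text and template must have the same length")
    out = []
    for ch, slot in zip(text.upper(), template):
        if slot == "0" and not ch.isdigit():
            ch = TO_DIGIT.get(ch, ch)    # digit expected, letter seen
        elif slot == "A" and not ch.isalpha():
            ch = TO_LETTER.get(ch, ch)   # letter expected, digit seen
        out.append(ch)
    return "".join(out)


print(correct_by_template("A8C1G3", "AAA000"))    # Argentine-style template
print(correct_by_template("62AB12S", "0AAA000"))  # California-style template
```

Characters with no entry in the tables are left unchanged, so a result that
still violates the template can be rejected or sent back for another pass.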
> >
> > According to Wikipedia
> > (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
> > the normal Argentinian license plates follow the template AAA 000, so
> > you could just generate the possible combinations, and use them in a
> > dawg.
> >
> >  perl -e 'for $a (65..90){for $b (65..90) {for $c (65..90) {printf
> > "%c%c%c\n", $a, $b, $c;}}}'
> >  perl -e 'for $a (0..9){for $b (0..9) {for $c (0..9) {printf
> > "%d%d%d\n", $a, $b, $c;}}}'
> >
> > Will get you the two lists you want.
> >
> > For the original question, according to
> > http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California
> > this is the California scheme:
> > perl -e 'for $a (0..9){for $b (65..90){for $c (65..90) {for $d
> > (65..90) {for $e (0..9){for $f (0..9) {for $g (0..9) {printf
> > "%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'
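
For comparison, here is a minimal Python sketch of the same enumeration as
the Perl one-liners, written once for any template. The template notation
('A' for a letter, '0' for a digit) and the names `plates` and `count` are
assumptions made for this sketch. Note that the full California list has
10 * 26^3 * 10^3 = 175,760,000 entries, so the generator below yields them
lazily instead of building the list in memory.

```python
import itertools
import string

# Template symbols (an assumed notation): 'A' = uppercase letter, '0' = digit.
ALPHABETS = {"A": string.ascii_uppercase, "0": string.digits}


def plates(template):
    """Yield every string matching the template, e.g. 'AAA' or '0AAA000'."""
    pools = [ALPHABETS[slot] for slot in template]
    for combo in itertools.product(*pools):
        yield "".join(combo)


def count(template):
    """Number of strings the template generates, without enumerating them."""
    total = 1
    for slot in template:
        total *= len(ALPHABETS[slot])
    return total


if __name__ == "__main__":
    print(count("AAA"), count("000"), count("0AAA000"))
    for plate in itertools.islice(plates("0AAA000"), 3):
        print(plate)
```

The output order matches the nested Perl loops: the last position varies
fastest.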
> >
> > > I think I'll use the last one; I'm not at that part yet. I'm getting
> > > good results on images where the characters are big because of the
> > > camera distance, but with small letters (13 pixels high) the results
> > > are not good.
> >
> > > So I have a couple of ideas to test; perhaps somebody from the group
> > > could give me an opinion on them:
> > > - following the contour of each character with a polygon approximation,
> > > making an image from those contours and running Tesseract (trained for
> > > that) on the image
> >
> > Seems reasonable. Something like autotrace or potrace might be useful.
> >
> > > - make an image with my font (one of each character in the alphabet),
> > > repeating the alphabet at different threshold levels. I think that
> > > Tesseract thresholds the images internally. This is hard to explain,
> > > but I think it may improve the quality.
> >
> > Yes, Tesseract internally thresholds the image. I think Google did
> > something like this in the Tesseract 3 language packs, so it might be
> > worth doing.
> >
> >
> >
> > > If you want to continue speaking about specifics of licence plate
> > > recognition, we can continue privately because it's off topic. I'm
> >
> > Well, you've earned my applause for recognising that, but if your
> > conversation turns up information that will save someone some time
> > later on, I'm all for it.
> >
> >
> >
> > > interested in continuing. There are many things to speak about, for
> > > example, the prices of the cameras, light filters, execution times, etc.
> >
> > > You can write me to andrej100 at gmail
> >
> > > Regards,
> >
> > > Andres
> >
> > > 2010/7/28 ZIA <[email protected]>
> >
> > >> I am writing a license plate recognition application in C#. I am
> > >> almost done. I had started work on my own OCR, but then I decided to
> > >> use tesseract-ocr, which now partially works. I pass the California
> > >> license plate to the OCR, but it doesn't recognize some of the
> > >> characters: for example, "Z" becomes the number 2, the letter "O"
> > >> becomes "U", and the number 4 becomes something else. Any suggestions?
> > >> Is there a language file or font file that will solve this issue?
> > >> Besides that, in complex images I am having a hard time locating the
> > >> license plate, but my concern now is the OCR, since I thought I would
> > >> save time by using Tesseract rather than writing my own neural
> > >> network. I would really appreciate any ideas or suggestions.
> >
> > >> --
> > >> You received this message because you are subscribed to the Google
> > >> Groups "tesseract-ocr" group.
> > >> To post to this group, send email to [email protected].
> > >> To unsubscribe from this group, send email to
> > >> [email protected].
> > >> For more options, visit this group at
> > >> http://groups.google.com/group/tesseract-ocr?hl=en.
> >
> > --
> > <Leftmost> jimregan, that's because deep inside you, you are evil.
> > <Leftmost> Also not-so-deep inside you.
>

