Re: California License Plate font issues with OCR

Andres Fri, 30 Jul 2010 12:45:23 -0700

By the way, the fonts used on licence plates in Argentina are not
commercially available, so I had to build my training image from pictures
I took with my own camera on the street. If that's your case too, prepare
yourself for a lot of Photoshop work to make the character sizes uniform.
Tips:
- paste the character, press Ctrl+T (transform) and drag the edges while
  holding Shift to keep the proportions;
- when you have finished with all the characters, merge the visible layers
  (Shift+Ctrl+E) to avoid ending up with a multilayer TIFF file;
- use the rulers to guide you vertically;
- finally, decide whether you want to threshold the image.
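The same normalisation can be sketched in code instead of Photoshop. This is a minimal sketch, assuming each character has already been cropped into a 2-D list of 8-bit grey values (0 = black, 255 = white); the helper names are hypothetical and nearest-neighbour scaling is just one possible choice.

```python
# Minimal sketch: glyphs are 2-D lists of 8-bit grey values (0 = black).
# All helper names here are hypothetical, not part of any library.

def scale_nearest(glyph, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    old_h, old_w = len(glyph), len(glyph[0])
    return [
        [glyph[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def normalize_height(glyph, target_h):
    """Scale a glyph to target_h rows while keeping its aspect ratio
    (the code equivalent of Shift-dragging in Free Transform)."""
    old_h, old_w = len(glyph), len(glyph[0])
    new_w = max(1, round(old_w * target_h / old_h))
    return scale_nearest(glyph, target_h, new_w)


def compose_row(glyphs, gap=2, background=0):
    """Paste equal-height glyphs side by side onto one flat (single-layer)
    canvas, separated by `gap` columns of background (black by default)."""
    height = len(glyphs[0])
    if any(len(g) != height for g in glyphs):
        raise ValueError("normalize the glyph heights first")
    canvas = []
    for r in range(height):
        row = []
        for i, g in enumerate(glyphs):
            if i:
                row.extend([background] * gap)
            row.extend(g[r])
        canvas.append(row)
    return canvas


def threshold(image, level=128):
    """Optional final step: map every pixel to pure black or pure white."""
    return [[255 if v >= level else 0 for v in row] for row in image]
```

Saving the resulting canvas as a single-layer TIFF is left to whatever imaging library you use.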


Question to the list:
The images that I use have a black background with white letters, and I
trained Tesseract for that. Does that make any difference? Would I get
better results by inverting the images (both the training image and the
captured images)?
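Whichever polarity you settle on, the key point is to apply the same transform to the training image and to every captured image. A minimal sketch, assuming 8-bit greyscale images stored as 2-D lists; choosing dark-on-light as the target and estimating the background from the border pixels are both assumptions for illustration, not claims about what Tesseract prefers.

```python
# Minimal sketch: 8-bit greyscale images as 2-D lists. The border-mean
# heuristic and the helper names are hypothetical.

def invert(image):
    """Swap polarity: white-on-black becomes black-on-white and vice versa."""
    return [[255 - v for v in row] for row in image]


def border_mean(image):
    """Average grey level of the outermost pixels, used as a background estimate."""
    h, w = len(image), len(image[0])
    values = [image[r][c] for r in range(h) for c in range(w)
              if r in (0, h - 1) or c in (0, w - 1)]
    return sum(values) / len(values)


def to_dark_on_light(image, level=128):
    """Return the image with a light background, inverting it only if needed."""
    return invert(image) if border_mean(image) < level else image
```

Running both sets of images through the same function keeps training and recognition consistent, so you can compare accuracy with and without inversion.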

Regards,

Andres


2010/7/30 Andres <[email protected]>

> Hello,
>
> What's the height of the characters that you are having problems with?
> If you have not identified the font, I assume that you never trained
> Tesseract for it, so that is where your problem is. I don't think you will
> get good results without training.
> As Giuseppe suggested, WhatTheFont is the right place to go, and almost the
> only one. There is another site, but it works like a guided tree: you
> answer questions about the shape of your font and never upload it, similar
> to the keys botanists use to identify plants from their leaves and so on.
> I don't remember the name of the site.
> This site: http://www.fontyukle.com/en/index.php doesn't charge you for
> the fonts; I've found fonts there that other sites wanted to charge for.
>
> Regarding LP, out of curiosity: have you measured how long your plate
> detection takes? ...and at what image resolution?
>
> Regards,
>
> Andres
>
>
>
> 2010/7/29 ZIA <[email protected]>
>
>> Hello,
>>
>> Permuting may work, but I haven't tried it. I am also looking for a font
>> sample of the CA license plate, which would let me train my own
>> OCR. I really don't know where I can get a sample file with A to Z and 0
>> to 9 in the CA license plate font.
>>
>> For LP extraction, I am trying to implement some kind of rectangular
>> window (the SCW, sliding concentric windows, concept from a paper). What I
>> did was apply an edge filter, which shows the license plate clearly; I just
>> need to extract it. A simple histogram approach works if there is not a
>> lot of noise, but even reflections in the images cause problems.
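The simple histogram approach can be sketched as horizontal and vertical projection profiles of the edge map. This is the common projection technique, not SCW, and it assumes `edges` is a binary edge map (2-D list, 1 = edge pixel); `row_frac` and `col_frac` are hypothetical tuning knobs. As noted above, noise and reflections will break it.

```python
# Minimal sketch of projection-histogram plate localisation.
# `edges` is a 2-D list with 1 for edge pixels; the fractions are hypothetical.

def projection_profiles(edges):
    """Row sums and column sums of the edge map."""
    rows = [sum(r) for r in edges]
    cols = [sum(c) for c in zip(*edges)]
    return rows, cols


def longest_band(profile, min_value):
    """(start, end) of the longest run with profile >= min_value; end is exclusive."""
    best = (0, 0)
    start = None
    # A trailing sentinel below min_value closes a run that reaches the end.
    for i, v in enumerate(list(profile) + [min_value - 1]):
        if v >= min_value and start is None:
            start = i
        elif v < min_value and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def locate_plate(edges, row_frac=0.5, col_frac=0.5):
    """Keep the longest run of rows whose edge count is at least row_frac of
    the maximum, then do the same for the columns inside that band."""
    rows, _ = projection_profiles(edges)
    top, bottom = longest_band(rows, row_frac * max(rows))
    _, cols = projection_profiles(edges[top:bottom])
    left, right = longest_band(cols, col_frac * max(cols))
    return top, bottom, left, right
```

The returned (top, bottom, left, right) box can then be cropped from the original image and passed to the OCR.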
>>
>> On Jul 29, 5:38 am, "Jimmy O'Regan" <[email protected]> wrote:
>> > On 29 July 2010 03:23, Andres <[email protected]> wrote:
>> >
>> > > Hello,
>> >
>> > > I'm working on the same thing as you, for licence plates from
>> > > Argentina, as I live in Argentina.
>> >
>> > > As you described, the hard part was locating the licence plate.
>> >
>> > > Now I'm working on the OCR, and after that I will work on straightening
>> > > (deskewing) the images, because if they are not completely horizontal
>> > > the OCR fails; for example, today I was getting a 5 instead of a 6. When
>> > > I straightened the image in Photoshop, everything came out right.
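One common way to straighten a plate is to fit a line through one reference point per character and rotate by the opposite angle. A minimal sketch, assuming you already have those points (for example the bottom centre of each character's bounding box); the helper names are hypothetical, and resampling the actual pixel grid is left to your imaging library.

```python
import math

# Minimal sketch: least-squares skew estimate from per-character points.
# Helper names are hypothetical.

def skew_angle(points):
    """Angle in degrees of the least-squares line through (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return math.degrees(math.atan2(sxy, sxx))


def rotate_point(x, y, angle_deg, cx=0.0, cy=0.0):
    """Rotate (x, y) by -angle_deg around (cx, cy), undoing a measured skew."""
    a = math.radians(-angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))
```

Because the angle and the rotation are computed in the same coordinate frame, this works whether your image y axis points up or down.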
>> >
>> > > I don't know the layout of letters and numbers on California plates;
>> > > are they mixed? ...If you know whether each character should be a
>> > > number or a letter according to its position, you have two options (as
>> > > far as I know):
>> >
>> > > - when recognizing character by character, tell Tesseract whether you
>> > > expect a number or a letter. I saw that somewhere in the source code,
>> > > but I don't remember where.
>> >
>> > You were probably looking at the code that guesses among 1, l and i
>> >
>> > Most of the code in the dict/ directory does some variation on this,
>> > by 'permuting' the character possibilities.
>> >
>> > > - do your own conversion, e.g., if you are expecting a number and you
>> > > get a G, map it to a 6; if you are expecting a letter and you get a 2,
>> > > map it to a Z.
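The second option can be sketched as a position-based correction pass over the OCR result. The confusion tables below are hypothetical starting points to be extended from your own errors, and the template notation ('A' = letter, '9' = digit) is an assumption: 'AAA999' matches the Argentinian AAA 000 format and '9AAA999' the California pattern in the Perl one-liner.

```python
# Hypothetical confusion tables; extend them with the substitutions your own
# OCR output actually shows.
TO_DIGIT = {"O": "0", "D": "0", "Q": "0", "I": "1", "L": "1",
            "Z": "2", "S": "5", "G": "6", "B": "8"}
TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "6": "G", "8": "B"}


def correct_by_position(text, template):
    """Fix characters whose class contradicts the plate template.

    template uses 'A' where a letter is expected and '9' where a digit is
    expected, e.g. 'AAA999' or '9AAA999'.
    """
    if len(text) != len(template):
        raise ValueError("OCR result and template differ in length")
    fixed = []
    for ch, kind in zip(text.upper(), template):
        if kind == "9" and not ch.isdigit():
            ch = TO_DIGIT.get(ch, ch)
        elif kind == "A" and not ch.isalpha():
            ch = TO_LETTER.get(ch, ch)
        fixed.append(ch)
    return "".join(fixed)
```

This only helps when the confusion crosses the letter/digit boundary (Z read as 2, G read as 6); a letter misread as another letter still needs better training.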
>> >
>> > Patrick may have more details on this approach.
>> >
>> > According to Wikipedia
>> > (http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Argentina),
>> > the normal Argentinian license plates follow the template AAA 000, so
>> > you could just generate the possible combinations and use them in a
>> > DAWG.
>> >
>> >  perl -e 'for $a (65..90){for $b (65..90){for $c (65..90){printf "%c%c%c\n", $a, $b, $c;}}}'
>> >  perl -e 'for $a (0..9){for $b (0..9){for $c (0..9){printf "%d%d%d\n", $a, $b, $c;}}}'
>> >
>> > These will get you the two lists you want.
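For readers not using Perl, the same two lists can be produced with `itertools.product` (ASCII codes 65..90 are the letters A..Z). This is a sketch of an equivalent, not part of the original suggestion; the function names are hypothetical. Turning the word list into a DAWG is a separate step done with Tesseract's training tools (e.g. `wordlist2dawg`); check its usage for your Tesseract version.

```python
from itertools import product
from string import ascii_uppercase, digits


def letter_triples():
    """AAA .. ZZZ, same order as the first Perl one-liner."""
    return ["".join(p) for p in product(ascii_uppercase, repeat=3)]


def digit_triples():
    """000 .. 999, same order as the second Perl one-liner."""
    return ["".join(p) for p in product(digits, repeat=3)]


def write_wordlist(path, words):
    """Write one word per line, the plain-text form a word list usually takes."""
    with open(path, "w", encoding="ascii") as f:
        for w in words:
            f.write(w + "\n")
```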
>> >
>> > (For the original question, this is the California scheme according to
>> > http://en.wikipedia.org/wiki/Vehicle_registration_plates_of_California )
>> > perl -e 'for $a (0..9){for $b (65..90){for $c (65..90){for $d (65..90){for $e (0..9){for $f (0..9){for $g (0..9){printf "%d%c%c%c%d%d%d\n", $a, $b, $c, $d, $e, $f, $g;}}}}}}}'
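It is worth checking the size of that list before generating it: one digit, three letters and three digits give 10 · 26³ · 10³ = 175,760,000 lines, about 1.4 GB at 8 bytes per line (7 characters plus a newline). As a lighter alternative, a pattern check on the OCR output may be enough. A minimal sketch, covering only the pattern encoded in the one-liner above; the names are hypothetical.

```python
import re

# Only the 1AAA000 pattern from the Perl one-liner; other plate formats
# would need their own patterns.
CA_PLATE = re.compile(r"[0-9][A-Z]{3}[0-9]{3}")

# Number of lines the one-liner prints: 10 * 26**3 * 10**3.
CA_PLATE_COUNT = 10 * 26**3 * 10**3


def is_ca_plate(text):
    """True if the whole string matches the 1AAA000 pattern."""
    return CA_PLATE.fullmatch(text) is not None
```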
>> >
>> > > I think I'll use the last option, but I'm not at that part yet. I'm
>> > > getting good results on images where the characters are big because the
>> > > camera is close, but with small letters (13 pixels high) things are not
>> > > good.
>> >
>> > > So I have a couple of ideas to test; perhaps somebody in the group could
>> > > give me opinions on them:
>> > > - following the contour with a polygon approximation of the characters,
>> > > making an image from those contours and running Tesseract (trained for
>> > > that) on that image
>> >
>> > Seems reasonable. Something like autotrace or potrace might be useful.
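For the polygon approximation step, Ramer-Douglas-Peucker is one standard choice; it is named here as an example, not as what autotrace or potrace do internally. A minimal sketch, assuming the contour has already been traced into an ordered list of (x, y) points; for a closed contour you would split it at two distant points and simplify each half.

```python
import math

# Minimal Ramer-Douglas-Peucker sketch for an open polyline.
# Helper names are hypothetical.

def _point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def rdp(points, epsilon):
    """Drop points that lie within epsilon of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]
    # Split at the farthest point and simplify both halves recursively.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right
```

A larger epsilon gives blockier, more uniform outlines; a smaller one keeps more of the original (possibly noisy) contour.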
>> >
>> > > - make an image with my font (one of each character of the alphabet),
>> > > repeating the alphabet at different threshold levels. I think that
>> > > Tesseract thresholds the images internally. It's hard to explain, but I
>> > > think it may improve the quality.
>> >
>> > Yes, Tesseract internally thresholds the image. I think Google did
>> > something like this in the Tesseract 3 language packs, so it might be
>> > worth doing.
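The threshold-variants idea can be sketched as binarising every glyph at several levels and laying out one alphabet row per level. A minimal sketch, assuming each glyph is a 2-D list of 8-bit grey values; the levels are hypothetical. Each level yields slightly thicker or thinner strokes, which roughly mimics the variation between real captures.

```python
# Minimal sketch: glyphs are 2-D lists of 8-bit grey values.
# The default levels are hypothetical, not values from the thread.

def threshold(image, level):
    """Binarise: pixels at or above `level` become 255, the rest 0."""
    return [[255 if v >= level else 0 for v in row] for row in image]


def training_rows(glyphs, levels=(96, 128, 160)):
    """One row per threshold level; each row holds the whole alphabet
    binarised at that level, ready to be laid out on the training image."""
    return [[threshold(g, level) for g in glyphs] for level in levels]
```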
>> >
>> > > If you want to keep talking about the specifics of licence plate
>> > > recognition, we can continue privately, because it's off topic. I'm
>> >
>> > Well, you've earned my applause for recognising that, but if your
>> > conversation turns up information that will save someone some time
>> > later on, I'm all for it.
>> >
>> > > interested in continuing. There are many things to talk about, for
>> > > example camera prices, light filters, execution times, etc.
>> >
>> > > You can write to me at andrej100 at gmail
>> >
>> > > Regards,
>> >
>> > > Andres
>> >
>> > > 2010/7/28 ZIA <[email protected]>
>> >
>> > >> I am writing a license plate recognition application in C#. I am
>> > >> almost done. I had started work on my own OCR, but then I decided to
>> > >> use tesseract-ocr, which now partially works. I feed the California
>> > >> license plate to the OCR, but it doesn't recognize some characters:
>> > >> for example, "Z" becomes the number 2, the letter "O" becomes "U",
>> > >> and the number 4 becomes something else. Any suggestions? Is there a
>> > >> language file or font file that will solve this issue? Besides that,
>> > >> in complex images I am having a hard time locating the license plate,
>> > >> but my concern now is the OCR, since I thought I would save time by
>> > >> using Tesseract rather than writing my own neural network. I would
>> > >> really appreciate any ideas or suggestions.
>> >
>> > >> --
>> > >> You received this message because you are subscribed to the Google Groups
>> > >> "tesseract-ocr" group.
>> > >> To post to this group, send email to [email protected].
>> > >> To unsubscribe from this group, send email to
>> > >> [email protected].
>> > >> For more options, visit this group at
>> > >> http://groups.google.com/group/tesseract-ocr?hl=en.
>> >
>> >
>> > --
>> > <Leftmost> jimregan, that's because deep inside you, you are evil.
>> > <Leftmost> Also not-so-deep inside you.
>>
>
