Thank you for your advice. I am actually working with OpenCV for my project.
a: Could you give more detail on how to take advantage of the symbols and
their specific shapes?
I have used a whitelist in the Tesseract options to eliminate some
impossible results.
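For reference, a whitelist can be passed on the Tesseract command line through the tessedit_char_whitelist config variable. The sketch below only builds and runs that command from Python. The character set (digits plus capital letters without 'I', 'O' and 'Q') is an assumption about Hong Kong plates, and the helper names are hypothetical. Tesseract 3.x spells the segmentation option -psm, while 4.x and later use --psm.

```python
import shutil
import subprocess

# Assumption: digits plus capital letters minus 'I', 'O' and 'Q', which we
# believe are not used on Hong Kong plates. Adjust to your own plate set.
PLATE_WHITELIST = "0123456789ABCDEFGHJKLMNPRSTUVWXYZ"


def tesseract_argv(image_path, out_base, psm=9, whitelist=PLATE_WHITELIST):
    """Build a Tesseract 3.x command line that restricts output characters.

    '-c name=value' sets a Tesseract config variable; here it is the
    character whitelist. (Tesseract 4.x+ would need '--psm' instead.)
    """
    return [
        "tesseract", image_path, out_base,
        "-psm", str(psm),
        "-c", "tessedit_char_whitelist=" + whitelist,
    ]


def run_tesseract(image_path, out_base, **kwargs):
    """Run Tesseract if it is installed and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_argv(image_path, out_base, **kwargs), check=True)
    # Tesseract writes its result to <out_base>.txt
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read().strip()
```

Note that a whitelist only removes characters that can never occur; 'U' stays in it because it is a legitimate plate letter, so it cannot stop '0' from being read as 'U'.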
Recently I used OpenCV to make the font thinner (closer to a normal-weight
font), and the result improved for characters like '8'. However, '0' still
has a 50% chance of being read as 'U'. I really have no clue why it gets a
'U' instead of a 'D' ('O' is eliminated).
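One common way to thin strokes is morphological erosion (cv2.erode in OpenCV); the thread does not say which calls were actually used, so this is only an illustration. The pure-Python sketch below erodes a binary image stored as lists of 0/1 rows, with 1 marking ink. It assumes white text on a black background, so dark characters on a light plate would first be inverted (for example with cv2.threshold and THRESH_BINARY_INV). Unlike OpenCV's default border handling, pixels outside the image are treated as background here.

```python
def erode(img, iterations=1):
    """Binary erosion with a 3x3 square structuring element.

    img is a list of rows of 0/1 values, 1 = ink. A pixel stays 1 only if
    it and all 8 neighbours are 1; out-of-bounds pixels count as 0.
    """
    h, w = len(img), len(img[0])
    for _ in range(iterations):
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # Check the full 3x3 neighbourhood around (y, x).
                out[y][x] = int(all(
                    0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                ))
        img = out
    return img
```

Each iteration strips one pixel from every edge of a stroke, so a 3-pixel-wide vertical stroke becomes 1 pixel wide after one pass; too many iterations can break glyphs apart.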
b: Unfortunately, in my case, Hong Kong license plates have no fixed ordering
of letters and digits, so no prior knowledge like this can be used.
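Without a fixed ordering, the position of a character cannot decide which member of a confusable pair is correct. One possibility (an assumption, not something proposed in this thread) is to enumerate every reading obtained by swapping the pairs reported here, and let another source of evidence choose among them, such as per-character confidences or a list of valid registrations. The table and function names below are hypothetical.

```python
from itertools import product

# Confusion pairs reported in this thread: '0'/'U', '5'/'S', '8'/'B'.
# Each entry lists the OCR output first, then its alternative.
CONFUSIONS = {
    "0": "0U", "U": "U0",
    "5": "5S", "S": "S5",
    "8": "8B", "B": "B8",
}


def candidate_readings(ocr_text):
    """Yield every string obtained by swapping confusable characters.

    The first candidate is always the original OCR output.
    """
    options = [CONFUSIONS.get(ch, ch) for ch in ocr_text]
    for combo in product(*options):
        yield "".join(combo)
```

For SXUSBBB this yields 64 candidates, one of which is 5X0S888. The count doubles with every ambiguous character, so this only stays practical for short strings such as plates.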
On Thursday, March 31, 2016 at 6:43:11 AM UTC+8, Tom Morris wrote:
>
> On Wednesday, March 30, 2016 at 11:34:14 AM UTC-4, Alex Szeto wrote:
>>
>> I am working on a license plate recognition project, and I have trouble
>> improving the accuracy of OCR.
>> Attached is one of the images I used; the result is very poor.
>>
>> Tesseract version: 3.0.3
>> The command I used: tesseract Untitled.jpg out -psm 9
>> The result is SXUSBBB, while I expect 5X0S888.
>> I have done some experiments and found that some character pairs are
>> easily confused by Tesseract, for example: '0' becomes 'U'; '5' and 'S';
>> 'B' and '8'.
>>
>> Are there any methods or parameters I can set to improve the result?
>>
>
> Looking at the image and result, it's pretty easy to see what the
> confusion is, particularly for a recognizer tuned to deal with a wide
> variety of fonts, and given the fact that you're not attempting to
> recognize actual words, but arbitrary strings of symbols.
>
> Have you considered building something on OpenCV or a similar tool where
> you could take advantage of a) the very small number of symbols and their
> specific shapes and b) knowledge of the specific ordering of numbers and
> letters plus any other domain knowledge that's available?
>
> Tom
>
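Point (a) could, for instance, be approached with template matching: since the plates use a single typeface and a small symbol set, each segmented character can be compared against one reference bitmap per allowed symbol. The pure-Python sketch below assumes characters are already segmented, binarized and scaled to the template size; the tiny 3x5 templates are made up for illustration. In OpenCV the comparison step corresponds to cv2.matchTemplate; on 0/1 images of equal size its TM_SQDIFF score equals the mismatch count used here.

```python
def parse(rows):
    """Turn strings of '#' (ink) and '.' (background) into a 0/1 grid."""
    return [[int(c == "#") for c in row] for row in rows]


def mismatch(a, b):
    """Count pixels that differ between two same-sized binary glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph, templates):
    """Return the label whose template differs from glyph in fewest pixels."""
    return min(templates, key=lambda label: mismatch(glyph, templates[label]))


# Hypothetical 3x5 templates; real ones would be cut from clean plate
# images at a much higher resolution, one per allowed symbol.
TEMPLATES = {
    "0": parse(["###", "#.#", "#.#", "#.#", "###"]),
    "U": parse(["#.#", "#.#", "#.#", "#.#", "###"]),
    "8": parse(["###", "#.#", "###", "#.#", "###"]),
    "B": parse(["##.", "#.#", "##.", "#.#", "##."]),
}
```

A '0' with one missing corner pixel still differs from the '0' template in only 1 pixel but from 'U' and '8' in 2, so it is classified as '0'. Because the templates come from the plate font itself, the classifier is tuned to exactly the shapes that confuse a general-purpose recognizer.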
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/929a7069-653a-46e1-a4a7-64418c2b41eb%40googlegroups.com.