[tesseract-ocr] Re: High Error rate even if good quality image and low noise

Alex Szeto Fri, 01 Apr 2016 01:57:39 -0700

Thank you for your advice. I am actually using OpenCV for my project.


a: Can you give more detail on how to take advantage of the symbols and 
their specific shapes?
   I have used a whitelist in the tesseract options to eliminate some 
impossible results.
    Recently I used OpenCV to make the font thinner (more like a normal 
font), and the result improved for characters like '8'. However, '0' 
still has a 50% chance of being read as 'U'. I really have no clue why it 
gets a 'U' instead of 'D' ('O' is eliminated).
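
One way to use a small symbol set with known shapes (point a) is 
nearest-template matching: compare each segmented, binarized character 
against one reference glyph per allowed symbol and pick the closest. Below 
is a minimal sketch in plain Python. The 5x7 bitmaps in TEMPLATES and the 
helper names hamming and classify are hypothetical, made up for 
illustration; a real system would presumably use glyphs cropped from the 
actual plate font at a higher resolution.

```python
# Hypothetical 5x7 reference glyphs ('#' = ink, '.' = background).
# These are illustrative only, not the real license plate font.
TEMPLATES = {
    "0": [".###.", "#...#", "#...#", "#...#", "#...#", "#...#", ".###."],
    "U": ["#...#", "#...#", "#...#", "#...#", "#...#", "#...#", ".###."],
    "D": ["####.", "#...#", "#...#", "#...#", "#...#", "#...#", "####."],
    "8": [".###.", "#...#", "#...#", ".###.", "#...#", "#...#", ".###."],
    "B": ["####.", "#...#", "#...#", "####.", "#...#", "#...#", "####."],
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def classify(glyph, templates=TEMPLATES):
    """Return (best_label, ranked) where ranked is a sorted list of
    (distance, label) pairs, closest template first."""
    ranked = sorted((hamming(glyph, t), label) for label, t in templates.items())
    return ranked[0][1], ranked


if __name__ == "__main__":
    # A '0' with one noisy pixel in the middle row.
    noisy_zero = [".###.", "#...#", "#...#", "#.#.#", "#...#", "#...#", ".###."]
    label, ranked = classify(noisy_zero)
    print(label, ranked)
```

In this toy font '0' and 'U' differ only in the top row, so a character 
whose top edge is cropped or eroded would land close to 'U'. The ranked 
distances make that kind of near-tie visible, which a single OCR output 
string does not.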

b: Unfortunately, in my case, Hong Kong license plates have no fixed 
ordering of letters and numbers, so no prior knowledge like this can be used. 

On Thursday, March 31, 2016 at 6:43:11 AM UTC+8, Tom Morris wrote:
>
> On Wednesday, March 30, 2016 at 11:34:14 AM UTC-4, Alex Szeto wrote:
>>
>> I am working on a license plate recognition project, and I am having 
>> trouble improving the accuracy of the OCR.
>> Attached is one of the images I used; the result is very poor.
>>
>> version of tesseract : 3.0.3
>> The command that I used : tesseract Untitled.jpg out -psm 9
>> The result is : SXUSBBB  while I am expecting for 5X0S888
>> I have done some experiments and found that some character pairs are 
>> easily confused by tesseract.
>> For example: '0' becomes 'U'; '5' and 'S'; 'B' and '8'
>>
>> Are there any methods or parameters I can set to improve the 
>> result? 
>>
>
> Looking at the image and result, it's pretty easy to see what the 
> confusion is, particularly for a recognizer tuned to deal with a wide 
> variety of fonts, and given the fact that you're not attempting to 
> recognize actual words, but arbitrary strings of symbols.
>
> Have you considered building something on OpenCV or a similar tool where 
> you could take advantage of a) the very small number of symbols and their 
> specific shapes and b) knowledge of the specific ordering of numbers and 
> letters plus any other domain knowledge that's available?
>
> Tom 
>
