On Wednesday, March 30, 2016 at 11:34:14 AM UTC-4, Alex Szeto wrote:
>
> I am working on a license plate recognition project, and I am having
> trouble improving the accuracy of the OCR.
> Attached is one of the images I used; the result is very poor.
>
> Tesseract version: 3.0.3
> Command used: tesseract Untitled.jpg out -psm 9
> Result: SXUSBBB, while I am expecting 5X0S888
> I did some experiments and found that Tesseract easily confuses
> some character pairs,
> for example: '0' becomes 'U'; '5' and 'S'; 'B' and '8'.
>
> Are there any methods or parameters I can set to improve the result?
>
Looking at the image and the result, it's easy to see where the confusion comes from, particularly for a recognizer tuned to handle a wide variety of fonts, and given that you're recognizing arbitrary strings of symbols rather than actual words.

Have you considered building something on OpenCV or a similar tool, where you could take advantage of (a) the very small number of symbols and their specific shapes and (b) knowledge of the specific ordering of numbers and letters, plus any other domain knowledge that's available?

Tom
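
To illustrate point (b), here is a minimal sketch of position-based correction applied to the recognizer's output. It assumes the plate follows a fixed digit/letter layout. The pattern "DLDLDDD" is only inferred from the one expected result in the question, and the look-alike tables and the function name `correct_plate` are hypothetical, so adjust them to the real plate format and font.

```python
# Hypothetical look-alike tables (assumptions, not taken from Tesseract);
# extend them for the font actually used on the plates.
TO_DIGIT = {"O": "0", "U": "0", "D": "0", "Q": "0",
            "S": "5", "B": "8", "I": "1", "Z": "2"}
TO_LETTER = {"0": "O", "5": "S", "8": "B", "1": "I", "2": "Z"}


def correct_plate(text, pattern):
    """Fix look-alike characters using the expected plate layout.

    pattern uses 'D' for a digit position and 'L' for a letter position.
    Text whose length does not match the pattern is returned unchanged.
    """
    if len(text) != len(pattern):
        return text
    corrected = []
    for ch, kind in zip(text.upper(), pattern):
        if kind == "D" and not ch.isdigit():
            # A letter in a digit slot: swap in its look-alike digit, if known.
            ch = TO_DIGIT.get(ch, ch)
        elif kind == "L" and not ch.isalpha():
            # A digit in a letter slot: swap in its look-alike letter, if known.
            ch = TO_LETTER.get(ch, ch)
        corrected.append(ch)
    return "".join(corrected)


# The misread string from the question, with the assumed layout.
print(correct_plate("SXUSBBB", "DLDLDDD"))  # prints 5X0S888
```

This only helps when the layout is known in advance. Characters that are already of the right kind, such as the 'S' in a letter position, are left alone.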

